Journal of Complexity 19 (2003) 474–510
http://www.elsevier.com/locate/jco
Probabilistic analysis of a differential equation for linear programming
Asa Ben-Hur,a,b Joshua Feinberg,c,d,* Shmuel Fishman,d,e and Hava T. Siegelmann f

a Biochemistry Department, Stanford University, Stanford, CA 94305, USA
b Faculty of Industrial Engineering and Management, Technion, Haifa 32000, Israel
c Physics Department, University of Haifa at Oranim, Tivon 36006, Israel
d Physics Department, Technion, Israel Institute of Technology, Haifa 32000, Israel
e Institute for Theoretical Physics, University of California, Santa Barbara, CA 93106, USA
f Laboratory of Bio-computation, Department of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01003, USA
Received 29 October 2001; revised 1 August 2002; accepted 12 March 2003
Abstract
In this paper we address the complexity of solving linear programming problems with a set
of differential equations that converge to a fixed point that represents the optimal solution.
Assuming a probabilistic model, where the inputs are i.i.d. Gaussian variables, we compute the
distribution of the convergence rate to the attracting fixed point. Using the framework of
Random Matrix Theory, we derive a simple expression for this distribution in the asymptotic
limit of large problem size. In this limit, we find the surprising result that the distribution of
the convergence rate is a scaling function of a single variable. This scaling variable combines
the convergence rate with the problem size (i.e., the number of variables and the number of
constraints). We also estimate numerically the distribution of the computation time to an
approximate solution, which is the time required to reach a vicinity of the attracting fixed
point. We find that it is also a scaling function. Using the problem size dependence of the
distribution functions, we derive high probability bounds on the convergence rates and on the
computation times to the approximate solution.
© 2003 Elsevier Science (USA). All rights reserved.
Keywords: Theory of Analog Computation; Dynamical systems; Linear programming; Scaling; Random
Matrix Theory
*Corresponding author. Physics Department, University of Haifa at Oranim, 36006 Tivon, Israel.
E-mail addresses: [email protected] (A. Ben-Hur), [email protected] (J. Feinberg), [email protected] (S. Fishman), [email protected] (H.T. Siegelmann).
0885-064X/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S0885-064X(03)00032-3
1. Introduction
In recent years scientists have developed new approaches to computation, some of them based on continuous-time analog systems. Analog VLSI devices, which are often described by differential equations, have applications in the fields of signal processing and optimization. Many of these devices are implementations of neural networks [12,19,20] or of so-called neuromorphic systems [21], hardware devices whose structure is directly motivated by the workings of the brain. In addition, there is an increasing number of algorithms based on differential equations that solve problems such as sorting [10], linear programming [14], and algebraic problems such as singular value decomposition and finding eigenvectors (see [18] and references therein). On a more theoretical level, differential equations are known to simulate Turing machines [8]. The standard theory of computation and computational complexity [24] deals with computation in discrete time and in a discrete configuration space, and is inadequate for the description of such systems. This work may prove useful in the analysis and comparison of analog computational devices (see e.g. [11,20]).

In a recent paper we proposed a framework of analog computation based on ODEs that converge exponentially to fixed points [5]. In such systems it is natural to consider the attracting fixed point as the output. The input can be modeled in various ways. One possible choice is the initial condition. This is appropriate when the aim of the computation is to decide to which attractor, out of many possible ones, the system flows (see [28]). The main problem within this approach is related to initial conditions in the vicinity of basin boundaries. The flow in the vicinity of the boundary is slow, resulting in very long computation times. Here, as in [5], the parameters on which the vector field depends are the input, and the initial condition is part of the algorithm. This modeling is natural for optimization problems, where one wishes to find extrema of some function E(x), e.g. by a gradient flow ẋ = −grad E(x). An instance of the optimization problem is specified by the parameters of E(x), i.e. by the parameters of the vector field.

The basic entity in our model of analog computation is a set of ODEs
dx/dt = F(x),   (1)
where x is an n-dimensional vector and F is an n-dimensional smooth vector field which converges exponentially to a fixed point. Eq. (1) solves a computational problem as follows: given an instance of the problem, the parameters of the vector field F are set, and the system is started from some pre-determined initial condition. The result of the computation is then deduced from the fixed point that the system approaches.

Even though the computation happens in a real configuration space, this model can be considered either as a model with real inputs, as for example the BSS model [7], or as a model with integer or rational inputs, depending on what types of values the initial conditions are given. In [5] it was argued that the time complexity in a large class of ODEs is the physical time, that is, the time parameter of the system. The initial condition there was assumed to be integer or rational. In the present paper, on the other hand, we consider real inputs. More specifically, we will analyze the complexity of a flow for linear programming (LP) introduced in [14]. In the real number model the complexity of solving LP with interior point methods is unbounded [31], and a similar phenomenon occurs for the flow we analyze here. To obtain finite computation times one can either measure the computation time in terms of a condition number, as in [25], or impose a distribution over the set of LP instances. Many of the probabilistic models used to study the performance of the simplex algorithm and interior point methods assume a Gaussian distribution of the data [27,29,30], and we adopt this assumption for our model. Recall that the worst-case bound for the simplex algorithm is exponential, whereas some of the probabilistic bounds are quadratic [27].
Two types of probabilistic analysis have been carried out in the LP literature: average-case and "high probability" behavior [4,33,34]. A high probability analysis provides a bound on the computation time that holds with probability 1 as the problem size goes to infinity [34]. In a worst-case analysis, interior point methods generally require O(√n |log ε|) iterations to compute the cost function with ε-precision, where n is the number of variables [33]. The high probability analysis essentially sets a limit on the required precision and yields O(√n log n) behavior [34]. However, the number of iterations has to be multiplied by the complexity of each iteration, which is O(n³), resulting in an overall complexity O(n^3.5 log n) in the high probability model [33]. The same factor per iteration appears in the average-case analysis as well [4].
In contrast, in our model of analog computation the computation time is the physical time required by a hardware implementation of the vector field F(x) to converge to the attracting fixed point. We need neither to follow the flow step-wise nor to calculate the vector field F(x), since it is assumed to be realized in hardware and does not require repetitive digital approximations. As a result, the complexity of analog processes does not include the O(n³) term above, and in particular it is lower than the digital complexity of interior point methods. In this set-up we conjecture, based on numerical calculations, that the flow analyzed in this paper has complexity O(n log n) on average and with high probability. This is higher than the number of iterations of state-of-the-art interior point methods, but lower than the overall complexity O(n^3.5 log n) of the high probability estimate mentioned above, which includes the complexity of an individual operation.
In this paper we consider a flow for linear programming proposed by Faybusovich [14], for which F(x) is given by (4). Substituting (4) into the general equation (1) we obtain (5), which realizes the Faybusovich algorithm for LP. We consider real inputs that are drawn from a Gaussian probability distribution. For any feasible instance of the LP problem, the flow converges to the solution. We consider the question: given the probability distribution of LP instances, what is the probability distribution of the convergence rates to the solution? The convergence rate measures the asymptotic computation time: the time to reach an ε-vicinity of the attractor, where ε is arbitrarily small. The main result of this paper, as stated in Theorem 4.1, is that with high probability and on the average, the asymptotic computation time is O(√n |log ε|), where n is the problem size and ε is the required precision (see also Corollary 5.1).

In practice, the solution to arbitrary precision is not always required, and one may need to know only whether the flow (1) or (5) has reached the vicinity of the optimal vertex, or which vertex out of a given set of vertices will be approached by the system. Thus, the non-asymptotic behavior of the flow needs to be considered [5]. In this case, only a heuristic estimate of the computation time is presented, and in Section 6 we conjecture that the associated complexity is O(n log n), as mentioned above.

The rest of the paper is organized as follows. In Section 2, the Faybusovich flow is presented along with an expression for its convergence rate. The probabilistic ensemble of LP instances is presented in Section 3. The distribution of the convergence rate of this flow is calculated analytically in the framework of random matrix theory (RMT) in Section 4. In Section 5, we introduce the concept of "high-probability behavior" and use the results of Section 4 to quantify the high-probability behavior of our probabilistic model. In Section 6, we provide measures of complexity when asymptotic precision in ε is not required. Some of the results in Sections 6.2–8 are heuristic, supported by numerical evidence. The structure of the distribution functions of parameters that control the convergence is described in Section 7, and its numerical verification is presented in Section 8. Finally, the results of this work and their possible implications are discussed in Section 9. Some technical details are relegated to the appendices. Appendix A contains more details of the Faybusovich flow. Appendix B exposes the details of the analytical calculation of the results presented in Section 4, and Appendix C contains the necessary details of random matrix theory relevant for that calculation.
2. A flow for linear programming
We begin with the definition of the linear programming problem (LP) and a vector field for solving it, introduced by Faybusovich in [14]. The standard form of LP is to find

max{c^T x : x ∈ R^n, Ax = b, x ≥ 0},   (2)

where c ∈ R^n, b ∈ R^m, A ∈ R^{m×n} and m ≤ n. The set generated by the constraints in (2) is a polyhedron. If a bounded optimal solution exists, it is obtained at one of its vertices. Let B ⊂ {1, …, n}, |B| = m, and N = {1, …, n}\B, and denote by x_B the coordinates with indices from B, and by A_B the m × m matrix whose columns are the columns of A with indices from B. A vertex of the LP problem is defined by a set of indices B, called a basic set, if

x_B = A_B^{-1} b ≥ 0.   (3)

The components of a vertex are the x_B satisfying (3), together with x_N = 0. The set N is then called a non-basic set. Given a vector field that converges to an optimal solution represented by basic and non-basic sets B and N, its solution x(t) can be decomposed as (x_N(t), x_B(t)), where x_N(t) converges to 0 and x_B(t) converges to A_B^{-1} b.
In the following we consider the non-basic set N = {1, …, n − m}, and for notational convenience denote the m × m matrix A_B by B and denote A_N by N, i.e. A = (N, B).

The Faybusovich vector field is a projection of the gradient of the linear cost function onto the constraint set, relative to a Riemannian metric which enforces the positivity constraints x ≥ 0 [14]. Let h(x) = c^T x. We denote this projection by grad h. The explicit form of the gradient is

grad h(x) = [X − X A^T (A X A^T)^{-1} A X] c,   (4)

where X is the diagonal matrix Diag(x_1, …, x_n). It is clear from (4) that

A grad h(x) = 0.
Thus, the dynamics

dx/dt = grad h(x)   (5)

preserves the constraint Ax = b in (2), and the faces of the polyhedron are invariant sets of the dynamics induced by grad h. Furthermore, it is shown in [14] that the fixed points of grad h coincide with the vertices of the polyhedron, and that the dynamics converges exponentially to the maximal vertex of the LP problem. Since the formal solution of the Faybusovich vector field is the basis of our analysis, we give its derivation in Appendix A.

Solving (5) requires an appropriate initial condition, an interior point in this case.
This can be addressed either by using the "big-M" method [26], which has essentially the same convergence rate, or by solving an auxiliary linear programming problem [34]. We stress that here the initial interior point is not an input for the computation, but rather a part of the algorithm. In the analog implementation the initial point should be found by the same device used to solve the LP problem.

The linear programming problem (2) has n − m independent variables. The formal solution shown below describes the time evolution of the n − m variables x_N(t) in terms of the variables x_B(t). When N is the non-basic set of an optimal vertex of the LP problem, x_N(t) converges to 0, and x_B(t) converges to A_B^{-1} b. Denote by e_1, …, e_n the standard basis of R^n, and define the n − m vectors

μ_i = −e_i + Σ_{j=1}^m α_{ji} e_{j+n−m},   (6)

where

α_{ji} = (B^{-1} N)_{ji}   (7)

is an m × (n − m) matrix. The vectors μ_i are perpendicular to the rows of A and are parallel to the faces of the polyhedron defined by the constraints. In this notation the analytical solution is (see Appendix A):

x_i(t) = x_i(0) exp(−Δ_i t − Σ_{j=1}^m α_{ji} log[x_{j+n−m}(t)/x_{j+n−m}(0)]),  i ∈ N = {1, …, n − m},   (8)

where x_i(0) and x_{j+n−m}(0) are components of the initial condition, x_{j+n−m}(t) are the x_B components of the solution, and

Δ_i = ⟨μ_i, c⟩ = −c_i + Σ_{j=1}^m c_{j+n−m} α_{ji}   (9)

(where ⟨·, ·⟩ is the Euclidean inner product).

An important property, which relates the signs of the Δ_i to the optimality of the partition of A (into (B, N)) relative to which they were computed, is now stated:
Lemma 2.1 (Faybusovich [14]). For a polyhedron with {n − m + 1, …, n} a basic set of a maximum vertex,

Δ_i ≥ 0,  i = 1, …, n − m.
The converse statement does not necessarily hold. The Δ_i are independent of b; thus we may have that all Δ_i are positive and yet the constraint set is empty.
Remark 2.1. Note that the analytical solution is only a formal one, and does not provide an answer to the LP instance, since the Δ_i depend on the partition of A, and only relative to a partition corresponding to a maximum vertex are all the Δ_i positive.
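As a concrete illustration of Eqs. (7) and (9), the Δ_i of a small instance can be computed directly. The following sketch (assuming NumPy; the instance is invented for illustration, with 0-indexed basic sets) checks that the basic set of the maximum vertex yields all Δ_i > 0, in accordance with Lemma 2.1, while a non-optimal basic set does not:

```python
import numpy as np

def deltas(A, c, basic):
    # Eqs. (7) and (9): alpha = B^{-1} N and Delta_i = -c_i + sum_j c_{basic[j]} alpha_{ji}
    nonbasic = [i for i in range(A.shape[1]) if i not in basic]
    B, N = A[:, basic], A[:, nonbasic]
    alpha = np.linalg.solve(B, N)
    return -c[nonbasic] + c[basic] @ alpha

# illustrative instance: max c^T x on {x >= 0, x1 + x2 + x3 = 1}; optimum at (0, 0, 1)
A = np.ones((1, 3)); c = np.array([1.0, 2.0, 3.0])
print(deltas(A, c, [2]))   # optimal basic set {3}: Delta = [2. 1.], all positive
print(deltas(A, c, [0]))   # basic set {1} is not optimal: some Delta_i < 0
assert np.all(deltas(A, c, [2]) > 0)
assert np.any(deltas(A, c, [0]) < 0)
```

For this instance the rates are simply Δ_p = c_3 − c_p, since B and N are all-ones blocks.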
The quantities Δ_i are the convergence rates of the Faybusovich flow, and thus measure the time required to reach the ε-vicinity of the optimal vertex, where ε is arbitrarily small:

T_ε ≈ |log ε| / Δ_min,   (10)

where

Δ_min = min_i Δ_i.   (11)

Therefore, if the optimal vertex is required with arbitrary precision ε, then the computation time (or complexity) is O(Δ_min^{-1} |log ε|).
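The estimate (10) can be checked numerically. The sketch below (NumPy assumed; the toy instance is invented for illustration, with Δ_min = 1 obtained from Eq. (9)) integrates the flow (5) with a forward-Euler step and verifies that the slowest non-basic coordinate decays at the rate Δ_min, so that the time to enter an ε-vicinity scales like |log ε|/Δ_min:

```python
import numpy as np

def grad_h(x, A, c):
    # Eq. (4): grad h(x) = [X - X A^T (A X A^T)^{-1} A X] c
    X = np.diag(x)
    AXA = A @ X @ A.T
    return (X - X @ A.T @ np.linalg.solve(AXA, A @ X)) @ c

# toy instance: max c^T x on {x >= 0, x1 + x2 + x3 = 1}; optimal vertex (0, 0, 1),
# convergence rates Delta = (2, 1) from Eq. (9), hence Delta_min = 1
A = np.ones((1, 3)); c = np.array([1.0, 2.0, 3.0])
Delta_min = 1.0
x = np.array([1/3, 1/3, 1/3]); dt = 0.01
traj = [x.copy()]
for _ in range(1600):                       # forward-Euler integration of Eq. (5) to t = 16
    x = x + dt * grad_h(x, A, c)
    traj.append(x.copy())

ratio = traj[1600][1] / traj[1500][1]       # x_2(16)/x_2(15): decay over one time unit
assert np.allclose(A @ x, [1.0])            # the constraint A x = b is preserved by the flow
assert np.allclose(x, [0, 0, 1], atol=1e-4)
assert abs(ratio - np.exp(-Delta_min)) < 0.02   # late-time decay ~ exp(-Delta_min t)
```

With Δ_min = 1, reaching an ε-vicinity of the vertex takes a time of order |log ε|, in line with (10).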
In summary, if the Δ_i are small, then large computation times will be required. The Δ_i can be arbitrarily small when the inputs are real numbers, resulting in an unbounded computation time. However, we will show that in the probabilistic model, which we define in the next section, "bad" instances are rare, and the flow performs well "with high probability" (see Theorem 4.1 and Corollary 5.1).
3. The probabilistic model
We now define the ensemble of LP problems for which we analyze the complexity of the Faybusovich flow. Denote by N(0, σ²) the Gaussian distribution with mean 0 and variance σ². Consider an ensemble in which the components of (A, b, c) are i.i.d. (independent, identically distributed) random variables with the distribution N(0, σ²). The model consists of the following set of problems:

LP_M = {(A, b, c) | the components of (A, b, c) are i.i.d. variables with the distribution N(0, σ²), and the LP problem has a bounded optimal solution}.   (12)
Therefore, we use matrices A with the distribution

f(A) = (1/Z_A) exp[−(1/(2σ²)) tr(A^T A)],   (13)

with normalization

Z_A = ∫ d^{mn}A exp[−(1/(2σ²)) tr(A^T A)] = (2πσ²)^{mn/2}.   (14)
Ensemble (13) factorizes into mn i.i.d. Gaussian random variables, one for each of the components of A. The distributions of the vectors c and b are defined by

f(c) = (1/Z_c) exp[−(1/(2σ²)) c^T c],   (15)

with normalization

Z_c = ∫ d^n c exp[−(1/(2σ²)) c^T c] = (2πσ²)^{n/2},   (16)

and

f(b) = (1/Z_b) exp[−(1/(2σ²)) b^T b],   (17)

with normalization

Z_b = ∫ d^m b exp[−(1/(2σ²)) b^T b] = (2πσ²)^{m/2}.   (18)
With the introduction of a probabilistic model of LP instances, Δ_min becomes a random variable. We wish to compute the probability distribution of Δ_min for instances with a bounded solution, when Δ_min > 0. We reduce this problem to the simpler task of computing P(Δ_min > Δ | Δ_min > 0), in which the condition Δ_min > 0 is much easier to impose than the condition that an instance produces an LP problem with a bounded solution. This reduction is justified by the following lemma:
Lemma 3.1.

P(Δ_min > Δ | LP instance has a bounded maximum vertex) = P(Δ_min > Δ | Δ_min > 0).   (19)
Proof. Let (A, b, c) be an LP instance chosen according to the probability distributions (13), (15) and (17). There is a unique orthant (out of the 2^n orthants) in which the constraint set Ax = b defines a nonempty polyhedron. This orthant is not necessarily the positive orthant, as in the standard formulation of LP.

Let us now consider any vertex of this polyhedron, with basic and non-basic sets B and N. Its m non-vanishing coordinates x_B are given by solving A_B x_B = b. The matrix A_B is full rank with probability 1; also, the components of x_B are non-zero and finite with probability 1. Therefore, in the probabilistic analysis we can assume that x_B is well defined and non-zero. With this vertex we associate the n − m quantities Δ_i = −(c_N)_i + (c_B^T A_B^{-1} A_N)_i, from (9).

We now show that there is a set of 2^m equiprobable instances, containing the instance (A, b, c), which share the same vector b and the same values of {Δ_i} when computed according to the given partition. This set contains a unique instance with x_B in the positive orthant. Thus, if Δ_min > 0, the latter instance will be the unique member of the set which has a bounded optimal solution.

To this end, consider the set R(x_B) of the 2^m reflections Q_l x_B of x_B, where Q_l is an m × m diagonal matrix with diagonal entries ±1 and l = 1, 2, …, 2^m. Given the instance (A, b, c) and a particular partition into basic and non-basic sets, we split A columnwise into (A_B, A_N) and c into (c_B, c_N). Let S be the set of 2^m instances ((A_B Q_l, A_N), b, (Q_l c_B, c_N)), where l = 1, …, 2^m. The vertices Q_l x_B of these instances, which correspond to the prescribed partition, comprise the set R(x_B), since (A_B Q_l)(Q_l x_B) = b. Furthermore, all elements of R(x_B) (each of which corresponds to a different instance) have the same set of Δ's, since Δ_i = −(c_N)_i + [(Q_l c_B)^T (A_B Q_l)^{-1} A_N]_i. Because of the symmetry of the ensemble under the reflections Q_l, the probability of all instances in S is the same.

All the vertices belonging to R(x_B) have the same Δ_i's with the same probability, and exactly one is in the positive orthant. Thus, if Δ_min > 0, the latter vertex is the unique element of S which is the optimal vertex of an LP problem with a bounded solution. Consequently, the probability of having any prescribed set of Δ_i's, and in particular the probability distribution of the Δ_i's given Δ_min > 0, is not affected by the event that the LP instance has a bounded optimal solution (i.e., that the vertex is in the positive orthant). In other words, these are independent events. Integrating over all instances, thereby taking into account all possible sets S while imposing the requirement {Δ_min > Δ | Δ_min > 0}, results in (19). □
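The reflection argument can be spot-checked numerically. The following sketch (NumPy assumed; the random instance is for illustration only) verifies that every reflection Q_l leaves the Δ's of Lemma 3.1's proof unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n)); c = rng.standard_normal(n)
AB, AN = A[:, n-m:], A[:, :n-m]
cB, cN = c[n-m:], c[:n-m]

def deltas(AB, AN, cB, cN):
    # Delta_i = -(c_N)_i + (c_B^T A_B^{-1} A_N)_i, as in the proof of Lemma 3.1
    return -cN + cB @ np.linalg.solve(AB, AN)

D0 = deltas(AB, AN, cB, cN)
# reflect the basic block: A_B -> A_B Q_l, c_B -> Q_l c_B with Q_l = diag(+-1)
for signs in ([1, -1, 1], [-1, -1, 1], [-1, -1, -1]):
    Q = np.diag(signs).astype(float)
    D = deltas(AB @ Q, AN, Q @ cB, cN)
    assert np.allclose(D, D0)   # the reflections leave the Delta's unchanged
print("reflections preserve the Delta's")
```

This is just the algebraic identity (Q_l c_B)^T (A_B Q_l)^{-1} = c_B^T A_B^{-1}, which holds because Q_l² = I.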
The event Δ_min > 0 corresponds to a specific partition of A into basic and non-basic sets B and N, respectively. It turns out to be much easier to calculate analytically the probability distribution of Δ_min for a given partition of the matrix A. It will be shown in what follows that, in the probabilistic model we defined, P(Δ_min > Δ | Δ_min > 0) is proportional to the probability that Δ_min > Δ for a fixed partition. Let W_j be the event that a partition j of the matrix A is an optimal partition, i.e. that all the Δ_i are positive (j is an index with range 1, …, (n choose m)). Let the index 1 stand for the partition where B is taken from the last m columns of A. We now show:
Lemma 3.2. Let Δ > 0. Then

P(Δ_min > Δ | Δ_min > 0) = P(Δ_min > Δ | W_1).
Proof. Given that Δ_min > 0, there is a unique optimal partition, since a non-unique optimal partition occurs only if c is orthogonal to some face of the polyhedron, in which case Δ_i = 0 for some i. Thus we can write

P(Δ_min > Δ | Δ_min > 0) = Σ_j P(Δ_min > Δ | Δ_min > 0, W_j) P(W_j)   (20)
                         = Σ_j P(Δ_min > Δ | W_j) P(W_j),   (21)

where the second equality holds since the event W_j is contained in the event Δ_min > 0. The probability distribution of (A, c) is invariant under permutations of the columns of A and c, and under permutations of the rows of A. Therefore the probabilities P(W_j) are all equal, and so are the P(Δ_min > Δ | Δ_min > 0, W_j), and the result follows. □
We define

Δ_min1 = min{Δ_i | Δ_i computed relative to partition 1}.   (22)
Note that the definition of Δ_min in Eq. (11) is relative to the optimal partition. To show that all computations can be carried out for a fixed partition of A, we need the next lemma:
Lemma 3.3. Let Δ > 0. Then

P(Δ_min > Δ | Δ_min > 0) = P(Δ_min1 > Δ) / P(Δ_min1 > 0).
Proof. The result follows from

P(Δ_min > Δ | W_1) = P(Δ_min1 > Δ | Δ_min1 > 0),   (23)

combined with the result of the previous lemma and the definition of conditional probability. □
In view of the symmetry of the joint probability distribution (j.p.d.) of Δ_1, …, Δ_{n−m}, given by (28) and (32), the normalization constant P(Δ_min1 > 0) satisfies

P(Δ_min1 > 0) = 1/2^{n−m}.   (24)
Remark 3.1. Note that we assume throughout this work that the optimal vertex is unique, i.e., given a partition (N, B) of A that corresponds to an optimal vertex, the basic components are all non-zero. The reason is that if one of the components of the optimal vertex vanishes, all of its permutations with the n − m components of the non-basic set result in the same value of c^T x. Vanishing of one of the components of the optimal vertex requires that b be a linear combination of fewer than m columns of A, which is an event of zero measure in our probabilistic ensemble. Therefore, this case will not be considered in the present work.
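Eq. (24) follows from the sign-flip symmetry of the j.p.d. and is easy to verify by Monte Carlo. In this sketch (NumPy assumed; problem sizes chosen for illustration), the Δ's are computed relative to partition 1, i.e. Δ = −c_N + c_B^T B^{-1} N from Eq. (9):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 4, 2, 20000
hits = 0
for _ in range(trials):
    A = rng.standard_normal((m, n)); c = rng.standard_normal(n)
    B, N = A[:, n-m:], A[:, :n-m]      # partition 1: basic set = last m columns of A
    z, y = c[n-m:], c[:n-m]
    Delta = -y + z @ np.linalg.solve(B, N)
    hits += np.all(Delta > 0)
frac = hits / trials
print(frac)                             # close to 1/2^(n-m) = 0.25, cf. Eq. (24)
assert abs(frac - 0.25) < 0.02
```

Note that the estimate is insensitive to the overall sign convention of the Δ's, since the j.p.d. is symmetric under Δ_p → −Δ_p.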
4. Computing the distributions of Δ_min1 and of Δ_min
In the following we first compute the distribution of Δ_min1 and then use it to obtain the distribution of Δ_min via Lemma 3.3. We denote the first n − m components of c by y, and its last m components by z. In this notation, Eq. (9) for the Δ_i takes the form

Δ_p = −y_p + (z^T B^{-1} N)_p,  p = 1, …, n − m.   (25)
Our notation will be such that indices i, j, k, … range over 1, 2, …, m, and indices p, q, … range over 1, 2, …, n − m. In this notation, ensembles (13) and (15) may be written as

f(A) = f(N, B) = (1/Z_A) exp[−(1/(2σ²)) (Σ_{ij} B_{ij}² + Σ_{ip} N_{ip}²)],

f(c) = f(y, z) = (1/Z_c) exp[−(1/(2σ²)) (Σ_i z_i² + Σ_p y_p²)].   (26)
We first compute the joint probability distribution (j.p.d.) of Δ_1, …, Δ_{n−m} relative to partition 1, denoted by f_1(Δ_1, …, Δ_{n−m}). Using (25), we write

f_1(Δ_1, …, Δ_{n−m}) = ∫ d^{m²}B d^{m(n−m)}N d^m z d^{n−m}y f(N, B) f(y, z) Π_{q=1}^{n−m} δ(Δ_q + y_q − Σ_{i,j=1}^m z_j (B^{-1})_{ji} N_{iq}),   (27)
where δ(x) is the Dirac delta function. We note that this j.p.d. is not only completely symmetric under permutations of the Δ_p's, but is also independent of the partition relative to which it is computed.

We would now like to perform the integrals in (27) and obtain a more explicit expression for f_1(Δ_1, …, Δ_{n−m}). It turns out that direct integration over the y_q's, using the δ functions, is not the most efficient way to proceed. Instead, we represent each of the δ functions as a Fourier integral. Thus,

f_1(Δ_1, …, Δ_{n−m}) = ∫ d^{m²}B d^{m(n−m)}N d^m z d^{n−m}y [d^{n−m}λ/(2π)^{n−m}] f(N, B) f(y, z) exp[i Σ_q λ_q (Δ_q + y_q − Σ_{i,j=1}^m z_j (B^{-1})_{ji} N_{iq})].
Integration over N_{ip}, λ_q and y_p is straightforward and yields

f_1(Δ_1, …, Δ_{n−m}) = (1/(2πσ²))^{(m²+n)/2} ∫ d^{m²}B d^m z [z^T (B^T B)^{-1} z + 1]^{−(n−m)/2} exp[−(1/(2σ²)) (Σ_{ij} B_{ij}² + Σ_i z_i² + (Σ_p Δ_p²)/(z^T (B^T B)^{-1} z + 1))].   (28)
Here the complete symmetry of f_1(Δ_1, …, Δ_{n−m}) under permutations of the Δ_p's is explicit, since it is a function of Σ_p Δ_p².
The integrand in (28) contains the combination

u(B, z) = 1/(z^T (B^T B)^{-1} z + 1).   (29)

Obviously, 0 ≤ u(B, z) ≤ 1. It will turn out to be very useful to consider the distribution function P(u) of the random variable u = u(B, z), namely

P(u) = (1/(2πσ²))^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−(1/(2σ²))(tr B^T B + z^T z)} δ(u − 1/(z^T (B^T B)^{-1} z + 1)).   (30)
Note from (29) that u(λB, λz) = u(B, z). Thus, in fact, P(u) is independent of the (common) width σ of the Gaussian variables B and z, and we may as well rewrite (30) as

P(u) = (λ/π)^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−λ(tr B^T B + z^T z)} δ(u − 1/(z^T (B^T B)^{-1} z + 1)),   (31)

with λ > 0 an arbitrary parameter.
Thus, if we could calculate P(u) explicitly, we would be able to express the j.p.d. f_1(Δ_1, …, Δ_{n−m}) in (28) in terms of the one-dimensional integral

f_1(Δ_1, …, Δ_{n−m}) = (1/(2πσ²))^{(n−m)/2} ∫_0^∞ du P(u) u^{(n−m)/2} exp[−(u/(2σ²)) Σ_{p=1}^{n−m} Δ_p²],   (32)

as can be seen by comparing (28) and (30).

In this paper we are interested mainly in the minimal Δ. Thus, we need f_min1(Δ), the probability density of Δ_min1. Due to the symmetry of f_1(Δ_1, …, Δ_{n−m}), which is explicit in (32), we can express f_min1(Δ) simply as

f_min1(Δ) = (n − m) ∫_Δ^∞ dΔ_2 ⋯ dΔ_{n−m} f_1(Δ, Δ_2, …, Δ_{n−m}).   (33)
It will be more convenient to consider the complementary cumulative distribution (c.c.d.)

Q(Δ) = P(Δ_min1 > Δ) = ∫_Δ^∞ f_min1(u) du,   (34)

in terms of which

f_min1(Δ) = −∂Q(Δ)/∂Δ.   (35)
The c.c.d. Q(Δ) may be expressed as a symmetric integral

Q(Δ) = ∫_Δ^∞ dΔ_1 ⋯ dΔ_{n−m} f_1(Δ_1, Δ_2, …, Δ_{n−m})   (36)

over the Δ's, and thus it is computationally a more convenient object to consider than f_min1(Δ). From (36) and (32) we obtain
Q(Δ) = (1/(2πσ²))^{(n−m)/2} ∫_0^∞ du P(u) [√u ∫_Δ^∞ dv e^{−(u/(2σ²)) v²}]^{n−m},   (37)

and from (37) one readily finds that

Q(0) = 1/2^{n−m}   (38)
(as well as Q(−∞) = 1, by the definition of Q). Then, use of the integral representation

1 − erf(x) = erfc(x) = (2/√π) ∫_x^∞ dv e^{−v²}  (x > 0)   (39)
and (38) leads (for Δ > 0) to

Q(Δ) = Q(0) ∫_0^∞ du P(u) [erfc(Δ √(u/(2σ²)))]^{n−m}.   (40)
This expression is an exact integral representation of Q(Δ), in terms of the yet undetermined probability distribution P(u).

In order to proceed, we have to determine P(u). Determining P(u) in closed form for an arbitrary pair of integers (n, m) in (31) is a difficult task. However, since we are interested mainly in the asymptotic behavior of computation times, we will content ourselves with analyzing the behavior of P(u) as n, m → ∞, with

r ≡ m/n < 1   (41)

held fixed. We were able to determine the large-n, m behavior of P(u) (and thus of f_1(Δ_1, Δ_2, …, Δ_{n−m}) and Q(Δ)) using standard methods [9] of random matrix theory [22] (for papers that treat random real rectangular matrices, such as the matrices relevant for this work, see [3]; for earlier work see G.M. Cicuta et al.). This calculation is presented in detail in Appendix B. We show there (see Eq. (B.26)) that the leading asymptotic behavior of P(u) is

P(u) = √(m/(2πu)) e^{−mu/2},   (42)

namely, √u is simply a Gaussian variable, with variance proportional to 1/m. Note that (42) is independent of the width σ, which is consistent with the remark preceding (31). Substituting (42) in (32), we obtain, with the help of the integral representation
Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt   (43)

of the Γ function, the large-n, m behavior of the j.p.d. f_1(Δ_1, …, Δ_{n−m}) as

f_1(Δ_1, …, Δ_{n−m}) = √m σ Γ((n − m + 1)/2) [1/(π(mσ² + Σ_p Δ_p²))]^{(n−m+1)/2}.   (44)
Thus, the Δ's asymptotically follow a multi-dimensional Cauchy distribution. It can be checked that (44) is properly normalized to 1.
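The 1/m scale in (42) can be spot-checked by sampling u(B, z) of Eq. (29) directly. The sketch below (NumPy assumed; m and the sample size are illustrative) checks only that u concentrates at the 1/m scale, not the full shape of P(u):

```python
import numpy as np

rng = np.random.default_rng(2)
m, trials = 50, 4000
u = np.empty(trials)
for t in range(trials):
    B = rng.standard_normal((m, m)); z = rng.standard_normal(m)
    w = z @ np.linalg.solve(B.T @ B, z)   # the quadratic form z^T (B^T B)^{-1} z
    u[t] = 1.0 / (w + 1.0)                # Eq. (29)
print(m * u.mean())                        # ~1, i.e. u = O(1/m), consistent with Eq. (42)
assert 0.8 < m * u.mean() < 1.2
```

Under (42) one has E[u] ≈ 1/m, so m·E[u] should be close to 1 already at moderate m.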
Similarly, by substituting (42) in (40) and changing the variable to y = √(mu/2), we obtain the large-n, m behavior of Q(Δ) as

Q(Δ) = (2Q(0)/√π) ∫_0^∞ dy e^{−y²} [erfc(Δy/(√m σ))]^{n−m}.   (45)
As a consistency check of our large-n, m asymptotic expressions, we have verified, with the help of (43), that substituting (44) into (36) leads to (40), with P(u) there given by the asymptotic expression (42).

We are interested in the scaling behavior of Q(Δ) in (45) in the limit n, m → ∞. In this limit, the factor [erfc(Δy/(√m σ))]^{n−m} in (45) decays rapidly to zero. Thus,
the integral in (45) will be appreciably different from zero only in a small region around Δ = 0, where the erfc function is very close to 1. More precisely, using erfc(x) = 1 − 2x/√π + O(x²), we may expand the erfc term in (45) as

[erfc(Δy/(√m σ))]^{n−m} = [1 − 2yΔ/√(πmσ²) + ⋯]^{n−m}   (46)

(due to the Gaussian damping factor in (45), this expansion is uniform in y). Thus, we see that Q(Δ)/Q(0) will be appreciably different from zero only for values of Δ/σ of order up to 1/√m, for which (46) exponentiates into a quantity of O(1), and
thus

Q(Δ) ≈ (2Q(0)/√π) ∫_0^∞ dy e^{−y²} exp[−(2/√π)(n/m − 1) y δ],   (47)

where

δ = √m Δ/σ   (48)

is O(m⁰). Note that m/n is kept finite and fixed. The integral in (47) can be done, and thus we arrive at the explicit scaling behavior of the c.c.d.:
Q(Δ) = Q(0) e^{x_Δ²} erfc(x_Δ),   (49)

where

x_Δ = η_Δ(n, m) Δ,   (50)

with

η_Δ(n, m) = (1/√π)(n/m − 1) √m/σ.   (51)
The c.c.d. Q(Δ) depends, in principle, on all three variables n, m and Δ. The result (49) demonstrates that in the limit (n, m) → ∞ (with r = m/n held finite and fixed), Q(Δ) is a function of only one scaling variable: the x_Δ defined in (50). We have compared (49) and (50) against the results of numerical simulations, for various values of n/m. The results are shown in Figs. 2 and 3 in Section 8.

Establishing the explicit scaling expression for the probability distribution of the convergence rate constitutes the main result of our paper, which we summarize in the following theorem:
Theorem 4.1. Assume that LP problems of the form (2), with the instances distributed according to (13)–(18), are solved by the Faybusovich algorithm (5). Then, in the asymptotic limit n → ∞, m → ∞ with 0 < r = m/n < 1 kept fixed, the convergence rate Δ_min defined by (11) is distributed according to

P(Δ_min > Δ | bounded optimal solution) = e^{x_Δ²} erfc(x_Δ),   (52)

where x_Δ is given by (50).
Proof. Q(Δ) = P(Δ_min1 > Δ) by (34). Therefore, use of (24) and (38), namely

P(Δ_min1 > 0) = Q(0) = 1/2^{n−m},

and of (49) implies

P(Δ_min1 > Δ) = (1/2^{n−m}) e^{x_Δ²} erfc(x_Δ),   (53)

but according to Lemma 3.3,

P(Δ_min > Δ | Δ_min > 0) = P(Δ_min1 > Δ)/P(Δ_min1 > 0).

Finally, substituting (53) and (24) into the last equation, and using Lemma 3.1, leads to the statement of the theorem. □
From (49) and (50), we can obtain the probability density f_{min−1}(Δ) of Δ_{min−1}, using (35). In particular, we find

f_{min−1}(0) = (2√m/(πσ)) (n/m − 1) Q(0),   (54)

which coincides with the expression one obtains for f_{min−1}(0) by directly substituting the large-(n,m) expression (45) into (35), without first going to the scaling regime Δ ∼ 1/√m, where (49) holds.
5. High-probability behavior
In this paper we show that the Faybusovich vector field performs well with high probability, a term that is explained in what follows. Such an analysis was carried out for interior point methods, e.g., in [23,34]. When the inputs of an algorithm have a probability distribution, Δ_min becomes a random variable. High probability behavior is defined as follows:
Definition 5.1. Let T_n be a random variable associated with problems of size n. We say that T(n) is a high probability bound on T_n if, for n → ∞, T_n ≤ T(n) with probability one.
To show that 1/Δ_min < Z(m) with high probability is the same as showing that Δ_min > 1/Z(m) with high probability. Let f^{(m)}_min(Δ | Δ_min > 0) denote the probability density of Δ_min given Δ_min > 0. The m superscript is a mnemonic for its dependence on the problem size. We make the following observation:
Lemma 5.1. Let P(Δ_min > x | Δ_min > 0) be analytic in x around x = 0. Then Δ_min > [f^{(m)}_min(0 | Δ_min > 0) g(m)]^{−1} with high probability, where g(m) is any function such that lim_{m→∞} g(m) = ∞.
Proof. For very small x we have

P(Δ_min > x | Δ_min > 0) ≈ 1 − f^{(m)}_min(0 | Δ_min > 0) x.   (55)

We look for x = x(m) such that P(Δ_min > x(m) | Δ_min > 0) = 1 with high probability. For this it is sufficient that

lim_{m→∞} f^{(m)}_min(0 | Δ_min > 0) x(m) = 0.   (56)

This holds if

x(m) = [f^{(m)}_min(0 | Δ_min > 0) g(m)]^{−1},   (57)

where g(m) is any function such that lim_{m→∞} g(m) = ∞. □
The growth of g(m) can be arbitrarily slow, so from this point on we will ignore this factor.
As a corollary to Theorem 4.1 and (54) we now obtain:
Corollary 5.1. Let (A, b, c) be linear programming instances distributed according to (12). Then

1/Δ_min = O(m^{1/2}) and T_ε = O(m^{1/2})   (58)

with high probability.
Proof. According to the results of Section 4 (and, more explicitly, from the derivation of (86) in Section 7), f^{(m)}_min(0 | Δ_min > 0) ∼ m^{1/2}, and the result follows from Lemma 5.1 and the definition of T_ε (Eq. (10)). □
Remark 5.1. Note that the bounds obtained in this method are tight, since they are based on the actual distribution of the data.
Remark 5.2. Note that f^{(m)}_min(0 | Δ_min > 0) ≠ 0. Therefore, the 1/Δ moment of the probability density function f^{(m)}_min(Δ | Δ_min > 0) does not exist.
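To make Lemma 5.1 concrete, here is a small self-check (ours, not from the paper) that, under the limiting law (52), P(Δ_min > x(m) | Δ_min > 0) tends to 1 when x(m) = [√m g(m)]⁻¹ for a divergent g(m); the choices g(m) = log m, n = 2m and σ = 1 are illustrative assumptions.

```python
import math

def tail_prob(x, n, m, sigma=1.0):
    """P(Delta_min > x | Delta_min > 0) in the scaling limit, Eq. (52)."""
    xd = (1.0 / math.sqrt(math.pi)) * (n / m - 1.0) * math.sqrt(m) / sigma * x
    return math.exp(xd * xd) * math.erfc(xd)

# x(m) = 1 / (sqrt(m) * g(m)) with g(m) = log m -> infinity:
probs = [tail_prob(1.0 / (math.sqrt(m) * math.log(m)), 2 * m, m)
         for m in (10, 100, 10000)]
# the probabilities increase toward 1, since x_delta ~ 1/g(m) -> 0
```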
6. Measures of complexity in the non-asymptotic regime
In some situations one wants to identify the optimal vertex with limited precision. The term

b_i(t) = −Σ_{j=1}^m a_{ji} log( x_{j+n−m}(t) / x_{j+n−m}(0) )   (59)

in (8), when it is positive, is a kind of "barrier": Δ_i t in Eq. (8) must be larger than the barrier before x_i can decrease to zero.
In this section we discuss heuristically the behavior of the barrier b_i(t) as the dynamical system flows to the optimal vertex. To this end, we first discuss in the following subsection some relevant probabilistic properties of the vertices of the polyhedra in our ensemble.
6.1. The typical magnitude of the coordinates of vertices
Flow (5) conserves the constraint Ax = b in (2). Let us split these equations according to the basic and non-basic sets corresponding to an arbitrary vertex as

A_B x_B + A_N x_N = b.   (60)
Precisely at the vertex in question x_N = 0, of course. However, we may be interested in the vicinity of that vertex, and thus leave x_N arbitrary at this point.
We may consider (60) as a system of equations in the unknowns x_B with parameters x_N, with coefficients A_B, A_N and b drawn from the equivariant Gaussian ensembles (13), (14), (17) and (18). Thus, the components of x_B (e.g., the x_{j+n−m}(t)'s in (59) if we are considering the optimal vertex) are random variables. The joint probability density for the m random variables x_B is given by Theorem 4.2 of [15] (applied to the particular Gaussian ensembles (13), (14), (17) and (18)) as

P(x_B; x_N) = [Γ((m+1)/2) / π^{(m+1)/2}] λ / (λ² + x_B^T x_B)^{(m+1)/2},   (61)

where

λ = √(1 + x_N^T x_N).   (62)
(Strictly speaking, we should constrain x_B to lie in the positive orthant, and thus multiply (61) by a factor 2^m to keep it normalized. However, since these details do not affect our discussion below, we avoid introducing them.)
It follows from (61) that the components of x_B are identically distributed, with the probability density of any one component x_{Bj} = z given by

p(z; x_N) = (1/π) λ/(λ² + z²),   (63)

in accordance with a general theorem due to Girko [16].
The main object of the discussion in this subsection is to estimate the typical magnitude of the m components of x_B. One could argue that typically all m components obey |x_{Bj}| < λ, since the Cauchy distribution (63) has width λ. However, from (63) we have that Prob(|z| > λ) = 1/2; namely, |x_{Bj}| < λ and |x_{Bj}| > λ occur with equal probability. Thus, one has to be more careful, and the answer lies in the probability density function of R = √(x_B^T x_B).
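The Cauchy marginal (63) is easy to probe by simulation (our sketch, with unit-variance Gaussians, so that λ = 1, i.e., x_N = 0): each component of the solution of A_B x_B = b is then standard Cauchy, and |x_{Bj}| > 1 should occur with probability 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 10, 4000
samples = []
for _ in range(trials):
    A_B = rng.standard_normal((m, m))        # basic block of the constraint matrix
    b = rng.standard_normal(m)
    samples.extend(np.linalg.solve(A_B, b))  # components of x_B at the vertex
samples = np.abs(np.array(samples))

frac_above = np.mean(samples > 1.0)  # estimate of Prob(|z| > lambda), lambda = 1
```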
From (61), we find that the probability density function of R = √(x_B^T x_B) takes the form

P(|x_B| = R) = (2/√π) [Γ((m+1)/2)/Γ(m/2)] (1/λ) (R/λ)^{m−1} / [1 + (R/λ)²]^{(m+1)/2}.   (64)

For a finite fixed value of m, this expression vanishes as (R/λ)^{m−1} for R ≪ λ, attains its maximum at

(R/λ)² = (m − 1)/2,   (65)

and then decays like λ/R² for R ≫ λ. Thus, like the even Cauchy distribution (63), it does not have a second moment.
In order to make (64) more transparent, we introduce the angle θ defined by

tan θ(R) = R/λ,   (66)

where 0 ≤ θ ≤ π/2. In terms of θ we have

P(|x_B| = R) = (2/√π) [Γ((m+1)/2)/Γ(m/2)] (1/λ) cos²θ sin^{m−1}θ.   (67)
(In order to obtain the probability density for θ we have to multiply the latter expression by the factor dR/dθ = λ/cos²θ.)
Let us now concentrate on the asymptotic behavior of (67) (or (64)) in the limit m → ∞. Using Stirling's formula

Γ(x) ∼ √(2π/x) x^x e^{−x}   (68)

for the large-x asymptotic behavior of the Gamma functions, we obtain for m → ∞

P(|x_B| = R) ∼ √(2m/(πλ²)) cos²θ sin^{m−1}θ.   (69)
Clearly, (69) is exponentially small in m, unless sin θ ≃ 1, which implies

θ = π/2 − δ   (70)

with δ ∼ 1/√m. Thus, writing

δ = √(2u/m)   (71)

(with u ≪ m), we obtain, for m → ∞,

P(|x_B| = R) ∼ √(8/(πmλ²)) u e^{−u}.   (72)

In this regime

R/λ = tan θ ≃ √(m/(2u)) ≫ 1.   (73)
The function on the r.h.s. of (72) has its maximum at u = 1, i.e., at R/λ = √(m/2) (in accordance with (65)), and has width of O(1) around that maximum. However, this is not enough to deduce the typical behavior of R/λ, since, as we have already commented following (65), P(|x_B| = R) has long tails and decays like λ/R² past its maximum. Thus, we have to calculate the probability that R > R₀ = λ tan θ₀, given R₀. The calculation is straightforward: using (69) and (66) we obtain

Prob(R > R₀) = ∫_{R₀}^∞ P(R) dR = √(2m/π) ∫_{θ₀}^{π/2} sin^{m−1}θ dθ.   (74)
Due to the fact that in the limit m → ∞, sin^{m−1}θ may be approximated by a Gaussian centered around θ = π/2 with variance 1/m, it is clear that

Prob(R > R₀) = Prob(θ > θ₀) ≃ 1,

unless δ₀ = π/2 − θ₀ ∼ √(2u₀/m), with u₀ ≪ m. Thus, using (70) and (71) we obtain

Prob(R > R₀) = √(2m/π) ∫₀^{δ₀} cos^{m−1}δ dδ ∼ (1/√π) ∫₀^{u₀} e^{−u} du/√u = erf(√u₀).   (75)
Finally, using the definitions of u₀, θ₀ and R₀, we rewrite (75) as

Prob(R > R₀) = erf( √(m/2) arctan(λ/R₀) ).   (76)
From the asymptotic behavior erf(x) ∼ 1 − e^{−x²}/(x√π) at large x, we see that Prob(R > R₀) saturates at 1 exponentially fast as R₀ decreases. Consequently, 1 − Prob(R > R₀) ∼ O(m⁰) is non-negligible only if R₀/λ is large enough, namely √(m/2) arctan(λ/R₀) ≲ 1, i.e., R₀/λ ≳ √(m/2). If R₀/λ is very large, namely R₀/λ ≫ √(m/2), which corresponds to a small argument of the error function in (76), we clearly have Prob(R > R₀) ≃ √(2m/π) (λ/R₀) ≪ 1. From these properties of (76) it thus follows that typically

R/λ ∼ O(√m).   (77)
Up to this point, we have left the parameters x_N unspecified. At this point we select the prescribed vertex of the polyhedron. At the vertex itself, x_N = 0; therefore, from (62), we see that λ = 1. Thus, according to (77), at the vertex, typically

R_vertex ∼ O(√m).   (78)

This result obviously holds for any vertex of the polyhedron: any partition (60) of the system of equations Ax = b into basic and non-basic sets leads to the same distribution function (61), and at each vertex we have x_N = 0. Thus, clearly, this means that the whole polyhedron is typically bounded inside an n-dimensional sphere of radius R ∼ O(√m) centered at the origin.
Thus, from (78) and from the rotational symmetry of (61), we conclude that any component of x_B at the optimal vertex, or at any other vertex (with its appropriate basic set B), is typically of O(R_vertex/√m) = O(1) (and, of course, positive). Points on the polyhedron other than the vertices are weighted linear combinations of the vertices with positive weights smaller than unity, and as such also have their individual components typically of O(1).
6.2. Non-asymptotic complexity measures from b_i
Applying the results of the previous subsection to the optimal vertex, we expect the components of x_B(t) (i.e., the x_{j+n−m}(t)'s in (59)) to be typically of the same order of magnitude as their asymptotic values lim_{t→∞} x_B(t) at the optimal vertex, and as a result, we expect the barrier b_i(t) to be of the same order of magnitude as its asymptotic value lim_{t→∞} b_i(t).
Note that, for this reason, in order to determine how the x_i(t) in (8) tend to zero, to leading order, we can safely replace all the x_{j+n−m}(t) by their asymptotic values in x*_B. Thus, in the following we approximate the barrier (59) by its asymptotic value

b_i = −lim_{t→∞} Σ_{j=1}^m a_{ji} log x_{j+n−m}(t) = −Σ_{j=1}^m a_{ji} log x*_{j+n−m},   (79)
where we have also ignored the contribution of the initial condition.
We now consider the convergence time of the solution x(t) of (5) to the optimal vertex. In order for x(t) to be close to the maximum vertex we must have x_i(t) < ε for i = 1,…,n−m, for some small positive ε. The time parameter t must then satisfy

exp(−Δ_i t + b_i) < ε for i = 1,…,n−m.   (80)

Solving for t, we find an estimate of the time required to flow to the vicinity of the optimal vertex as

t > b_i/Δ_i + |log ε|/Δ_i for all i = 1,…,n−m.   (81)
We define

T = max_i ( b_i/Δ_i + |log ε|/Δ_i ),   (82)

which we consider as the computation time. We denote

b_max = max_i b_i.   (83)
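Given the decay rates Δ_i and barriers b_i of a particular instance, (82) and (83) are immediate to evaluate; the numbers below are made up purely for illustration.

```python
import math

def computation_time(deltas, barriers, eps):
    """T = max_i (b_i / Delta_i + |log eps| / Delta_i), Eq. (82)."""
    return max((b + abs(math.log(eps))) / d for d, b in zip(deltas, barriers))

def maximal_barrier(barriers):
    """b_max = max_i b_i, Eq. (83)."""
    return max(barriers)

# Hypothetical instance data: for very small eps the |log eps| term dominates T,
# while for moderate eps the barrier term can dominate instead.
deltas, barriers = [0.5, 1.0, 2.0], [3.0, 0.5, 4.0]
```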
In the limit of asymptotically small ε, the first term in (82) is irrelevant, and the distribution of computation times is determined by the distribution of the Δ_i's stated by Theorem 4.1.
If the asymptotic precision is not required, the first term in (82) may be dominant. To bound this term in the expression for the computation time we can use the quotient b_max/Δ_min, where Δ_min is defined in (11).
In the probabilistic ensemble used in this work, b_max and b_max/Δ_min are random variables, as is Δ_min. Unfortunately, we could not find the probability distributions of b_max and b_max/Δ_min analytically, as we did for Δ_min. In the following section, a conjecture concerning these distributions, based on numerical evidence, will be formulated.
7. Scaling functions
In Section 4 it was shown that in the limit of large (n,m) the probability P(Δ_min > Δ | Δ_min > 0) is given by (52). Consequently, P(Δ_min < Δ | Δ_min > 0) ≡ F^{(n,m)}(Δ) is of the scaling form

F^{(n,m)}(Δ) = 1 − e^{x_Δ²} erfc(x_Δ) ≡ F(x_Δ).   (84)

Such a scaling form is very useful and informative, as we will demonstrate in what follows. The scaling function F contains all asymptotic information on Δ. In particular, one can extract the problem size dependence of f^{(m)}_min(0 | Δ_min > 0), which is required for obtaining a high probability bound using Lemma 5.1. (This has already been shown in Corollary 5.1.) We use the scaling form, Eq. (84), leading to

f^{(m)}_min(0 | Δ_min > 0) = dF^{(n,m)}(Δ)/dΔ |_{Δ=0} = Z_Δ(n,m) dF(x_Δ)/dx_Δ |_{x_Δ=0}.   (85)
This is just f_{min−1}(0)/Q(0). With the help of Lemma 5.1, leading to (58), and our finding that Z_Δ(n,m) ∼ √m, we conclude that with high probability

1/Δ_min = O(√m).   (86)
The next observation is that the distribution F(x_Δ) is very wide. For large x_Δ it behaves as 1 − 1/(√π x_Δ), as is clear from the asymptotic behavior of the erfc function. Therefore it does not have a mean. Since at x_Δ = 0 the slope dF/dx_Δ|_{x_Δ=0} does not vanish, 1/x_Δ does not have a mean either (see Remark 5.2).
We would like to derive scaling functions like (84) also for the barrier b_max, that is, the maximum of the b_i defined by (79), and for the computation time T defined by (82). The analytic derivation of such scaling functions is difficult and therefore left for further studies. Their existence is verified numerically in the next section. In particular, for fixed r = m/n, we found that
P(1/b_max < 1/b) ≡ F^{(n,m)}_{1/b_max}(1/b) = F_{1/b}(x_b)   (87)

and

P(1/T < 1/t) ≡ F^{(n,m)}_{1/T}(1/t) = F_{1/T}(x_T),   (88)

where b_max and T are the maximal barrier and computation time. The scaling variables are

x_b = Z_b(n,m) (1/b)   (89)

and

x_T = Z_T(n,m) (1/t).   (90)
The asymptotic behavior of the scaling variables was determined numerically to be

Z_b(n,m) ∼ m   (91)

and

Z_T(n,m) ∼ m log m.   (92)

This was found for constant r; the precise r dependence could not be determined numerically. The resulting high probability behavior for the barrier and computation time is therefore

b_max = O(m), T = O(m log m).   (93)

Note that scaling functions such as these immediately provide the average behavior as well (if it exists).
Here, in the calculation of the distribution of computation times, it was assumed that these are dominated by the barriers rather than by |log ε| in (82). The results (87), (88) and (93) are conjectures supported by the numerical calculations of the next section.
8. Numerical simulations
In this section the results of numerical simulations for the distributions of LP problems are presented. For this purpose we generated full LP instances (A, b, c) with the distribution (12). For each instance the LP problem was solved using the linear programming solver of the IMSL C library. Only instances with a bounded optimal solution were kept; Δ_min was computed relative to the optimal partition, and optimality was verified by checking that Δ_min > 0. Using the sampled instances we obtain an estimate of F^{(n,m)}(Δ) = P(Δ_min < Δ | Δ_min > 0), and of the corresponding cumulative distribution functions of the barrier b_max and the computation time.
As a consistency verification of the calculations we compared P(Δ_min < Δ | Δ_min > 0) to P(Δ_{min−1} < Δ | Δ_{min−1} > 0), which was directly estimated from the distribution of matrices. For this purpose, we generated a sample of A and c according to the probability distributions (13), (15) with σ = 1 and computed for each instance the value of Δ_{min−1} (the minimum over the Δ_i) for a fixed partition of A into (N,B). We kept only the positive values (note that the definition of Δ_{min−1} does not require b). The two distributions are compared in Fig. 1, with excellent agreement.
Note that estimation of P(Δ_{min−1} < Δ | Δ_{min−1} > 0) by sampling from a fixed partition is infeasible for large m and n, since for any partition of A the probability that Δ_{min−1} is positive is 2^{−(n−m)} (Eq. (24)). Therefore, the equivalence between the probability distributions of Δ_min and Δ_{min−1} cannot be exploited for producing numerical estimates of the probability distribution of Δ_min. Thus we proceed by generating full LP instances, and solving the LP problem as described above.
The problem size dependence was explored while keeping the ratio n/m fixed, or
while keeping m fixed and varying n. In Fig. 2, we plot the numerical estimates of F^{(n,m)}(Δ) for varying problem sizes with n/m = 2 and compare them with the analytical result, Eq. (84). The agreement with the analytical result improves as m is increased, since it is an asymptotic result. The simulations show that the asymptotic result holds well even for m = 20. As in the analytical result, in the large-m limit we observe that F^{(n,m)}(Δ) is not a general function of n, m and Δ, but a scaling function of the form F^{(n,m)}(Δ) = F(x_Δ), as predicted theoretically in Section 7 (see (84) there). The scaling variable x_Δ(m) is given by (50). Indeed, Fig. 3 demonstrates that F^{(n,m)} has this form, as predicted by Eq. (84), with the scaling variable x_Δ.
For the cumulative distribution functions of the barrier b_max and of the computation time T we do not have analytical formulas. These distribution functions are denoted by F^{(n,m)}_{1/b_max} and F^{(n,m)}_{1/T}, respectively. Their behavior near zero enables one to obtain high probability bounds on b_max and T, since for this purpose we need to bound the tails of their distributions, or alternatively, estimate the density of 1/b_max and 1/T at 0. In the numerical estimate of the barrier we collected only positive values, since only these contribute to prolonging the computation time. From Fig. 4 we find that F^{(n,m)}_{1/b_max} is indeed a scaling function of the form (87) with the scaling variable x_b of (89). The behavior of the computation time is extracted from Fig. 5. The cumulative function F^{(n,m)}_{1/T} is found to be a scaling function of the form (88) with the scaling variable x_T of (90). The scaling variables x_b and x_T were found
Fig. 1. Comparison of P(Δ_{min−1} < Δ | Δ_{min−1} > 0) and P(Δ_min < Δ | Δ_min > 0) for m = 2, n = 4.
numerically by the requirement that in the asymptotic limit the cumulative distribution F approaches a scaling form. Such a fitting is possible only if a scaling form exists. We were unable to determine the dependence of the scaling variables x_b and x_T on n/m.
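The ensemble sampling itself is easy to reproduce with generic tools (our sketch: the paper used the IMSL C library solver, while here we substitute scipy's linprog, unit variances, and small illustrative sizes):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

def sample_instance(n, m):
    """Draw (A, b, c) with i.i.d. Gaussian entries, mimicking the ensemble of Sec. 8."""
    return rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal(n)

def has_bounded_optimum(A, b, c):
    """Solve max c^T x subject to Ax = b, x >= 0, and report whether an optimum exists."""
    res = linprog(-c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
    return res.status == 0  # status 2 (infeasible) and 3 (unbounded) are discarded

kept = sum(has_bounded_optimum(*sample_instance(8, 4)) for _ in range(200))
# 'kept' counts the instances that would enter the empirical distribution estimates
```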
Fig. 2. F^{(n,m)}(Δ) for m = 4, 20, 40, 80, 120 and n = 2m. The number of instances was 10⁵, 10⁵, 40 000, 15 000, 5800, respectively. There is very good agreement with the analytical results, improving as m increases.

Fig. 3. F(x_Δ) plotted against the variable x_Δ, for the same data as Fig. 2.
Fig. 4. F_{1/b}(x_b) as a function of the variable x_b = m/b_max, for the same instances as Fig. 2.

Fig. 5. F_{1/T}(x_T) as a function of the variable x_T = m log m / T, for the same instances as Fig. 2.

9. Summary and discussion

In this paper we computed the problem size dependence of the distributions of parameters that govern the convergence of a differential equation (Eq. (5)) that solves the linear programming problem [14]. To the best of our knowledge, this is the first time such distributions have been computed. In particular, knowledge of the distribution functions enables one to obtain the high probability behavior (for example, (86) and (93)) and the moments (if these exist).
The main result of the present work is that the distribution functions of the convergence rate Δ_min, the barrier b_max and the computation time T are scaling functions; i.e., in the asymptotic limit of large (n,m), each depends on the problem size only through a scaling variable. These functions are presented in Section 7.
The scaling functions obtained here provide all the relevant information about the distribution in the large-(n,m) limit. Such functions, even if known only numerically, can be useful for understanding the behavior for large values of (n,m) that are beyond the limits of numerical simulations. In particular, the distribution function of Δ_min was calculated analytically and stated as Theorem 4.1. The relevance of the asymptotic theorem for finite and relatively small problem sizes (n,m) was demonstrated numerically. It turns out to be a very simple function (see (84)). The scaling form of the distributions of b_max and of T was conjectured on the basis of numerical simulations.
The Faybusovich flow [6] that is studied in the present work is defined by a
system of differential equations, and it can be considered as an example of the analysis of convergence to fixed points for differential equations. One should note, however, that the present system has a formal solution (8), and therefore it is not typical.
If we require knowledge of the attracting fixed points with arbitrarily high precision (i.e., ε of (80) and (82) can be made arbitrarily small), the convergence time to an ε-vicinity of the fixed point is dominated by the convergence rate Δ_min. The barrier, which describes the state space "landscape" on the way to the fixed points, is irrelevant in this case. Thus, in this limit, the complexity is determined by (86). This point of view is taken in [28].
However, for the solution of some problems (like the one studied in the present work), such high precision is usually not required, and the non-asymptotic behavior (in ε) of the vector field, as represented by the barrier, also makes an important contribution to the complexity of computing the fixed point.
For computational models defined on the real numbers, worst case behavior can be ill defined and lead to infinite computation times, in particular for interior point methods for linear programming [7]. Therefore, we compute the distribution of computation times for a probabilistic model of linear programming instances rather than an upper bound. Such probabilistic models can be useful in giving a general picture also for traditional discrete problem solving, where the continuum theory can be viewed as an approximation.
A question of fundamental importance is how general the existence of scaling distributions is. Their existence would be analogous to the central limit theorem [17] and to scaling in critical phenomena [32] and in Anderson localization [1,2]. Typically, such functions are universal. In the case of the central limit theorem, for example, under some very general conditions one obtains a Gaussian distribution, irrespective of the original probability distributions. Moreover, it depends on the random variable and the number of the original variables via a specific combination. The Gaussian distribution is a well known example of the so-called stable probability distributions. In the physical problems mentioned above, scaling and universality reflect the fact that the system becomes scale invariant.
A specific challenging problem still left unsolved in the present work is the rigorous calculation of the distributions of 1/b_max and of 1/T, that is, proving the conjectures concerning these distributions. This will be attempted in the near future.
Acknowledgments
It is our great pleasure to thank Arkadi Nemirovski, Eduardo Sontag and Ofer Zeitouni for stimulating and informative discussions. This research was supported in part by the US–Israel Binational Science Foundation (BSF), by the Israeli Science Foundation Grant Number 307/98 (090-903), by the US National Science Foundation under Grant No. PHY99-07949, by the Minerva Center of Nonlinear Physics of Complex Systems, and by the Fund for Promotion of Research at the Technion.
Appendix A. The Faybusovich vector field
In the following we consider the inner product ⟨ξ, η⟩_{X^{−1}} = ξ^T X^{−1} η. This inner product is defined on the positive orthant R^n_+ = {x ∈ R^n : x_i > 0, i = 1,…,n}, where it defines a Riemannian metric. In the following we denote by a^i, i = 1,…,m, the rows of A. The Faybusovich vector field is the gradient of h relative to this metric, projected onto the constraint set [14]. It can be expressed as

grad h = Xc − Σ_{i=1}^m z_i(x) X a^i,   (A.1)

where z_1(x),…,z_m(x) make the gradient perpendicular to the constraint vectors, i.e., A grad h = 0, so that Ax = b is maintained by the dynamics. The resulting flow is

dx/dt = F(x) = grad h.   (A.2)
Consider the functions

Ψ_i(x) = log(x_i) + Σ_{j=1}^m a_{ji} log(x_{j+n−m}), i = 1,…,n−m.   (A.3)

The Ψ_i are defined such that their equations of motion are easily integrated. This gives n−m equations, which correspond to the n−m independent variables of the LP problem. To compute the time derivative of Ψ_i we first find

∇Ψ_i = (1/x_i) e_i + Σ_{j=1}^m (a_{ji}/x_{j+n−m}) e_{j+n−m},   (A.4)
and note that the vectors μ_i defined in Eq. (6) have the following property:

⟨μ_i, a^j⟩ = 0, i = 1,…,n−m; j = 1,…,m.

Therefore,

Ψ̇_i(x) = ⟨∇Ψ_i(x), ẋ⟩ = ⟨∇Ψ_i(x), grad h⟩ = ⟨μ_i, c − Σ_{j=1}^m z_j(x) a^j⟩ = ⟨μ_i, c⟩ ≡ −Δ_i.   (A.5)

This equation is integrated to yield

x_i(t) = x_i(0) exp( −Δ_i t − Σ_{j=1}^m a_{ji} log( x_{j+n−m}(t)/x_{j+n−m}(0) ) ).   (A.6)
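The flow (A.1)-(A.2) can also be integrated numerically. In the sketch below (a minimal construction of ours, with illustrative sizes and data), z(x) is obtained by solving A X A^T z = A X c, which is exactly the condition A grad h = 0; a small explicit Euler step then keeps Ax = b constant while c^T x never decreases.

```python
import numpy as np

def faybusovich_field(A, c, x):
    """grad h = X c - sum_i z_i(x) X a^i, with z(x) solving (A X A^T) z = A X c."""
    X = np.diag(x)
    z = np.linalg.solve(A @ X @ A.T, A @ X @ c)
    return X @ (c - A.T @ z)

rng = np.random.default_rng(3)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = np.abs(rng.standard_normal(n)) + 0.1  # strictly positive starting point
b = A @ x                                 # x is feasible by construction
c = rng.standard_normal(n)

objective_start = c @ x
for _ in range(200):                      # crude explicit Euler integration
    x = x + 1e-3 * faybusovich_field(A, c, x)
# along the flow, Ax - b stays (numerically) zero and c^T x is non-decreasing
```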
Appendix B. The probability distribution P(u)

In this appendix, we study the probability distribution function

P(u) = (λ/π)^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−λ(tr B^T B + z^T z)} δ( u − 1/(z^T (B^T B)^{−1} z + 1) ),

defined in (30) and (31), and calculate it in detail explicitly in the large-(n,m) limit. We will reconstruct P(u) from its moments. The Nth moment

k_N = ∫₀^∞ du P(u) u^N = (λ/π)^{(m²+m)/2} ∫ d^{m²}B d^m z e^{−λ(tr B^T B + z^T z)} [ 1/(z^T (B^T B)^{−1} z + 1) ]^N   (B.1)
of P(u) may be conveniently represented as

k_N = (λ/π)^{(m²+m)/2} (1/Γ(N)) ∫₀^∞ t^{N−1} e^{−t} dt ∫ d^{m²}B e^{−λ tr B^T B} ∫ d^m z e^{−z^T (λ + t (B^T B)^{−1}) z}

    = (λ/π)^{m²/2} (1/Γ(N)) ∫₀^∞ t^{N−1} e^{−t} dt ∫ d^{m²}B e^{−λ tr B^T B} / √det(1 + (t/λ)(B^T B)^{−1}),   (B.2)
where in the last step we have performed the Gaussian integration over z.
Recall that P(u) is independent of the arbitrary parameter λ (see the remark preceding (31)). Thus, its Nth moment k_N must also be independent of λ, which is manifest in (B.2). Therefore, with no loss of generality, and for later convenience, we will henceforth set λ = m (since we have in mind taking the large-m limit). Thus,

k_N = (m/π)^{m²/2} (1/Γ(N)) ∫₀^∞ t^{N−1} e^{−t} dt ∫ d^{m²}B e^{−m tr B^T B} / √det(1 + (t/m)(B^T B)^{−1}) = (1/Γ(N)) ∫₀^∞ t^{N−1} e^{−t} ψ(t/m) dt,   (B.3)
where we have introduced the function

ψ(y) = (m/π)^{m²/2} ∫ d^{m²}B e^{−m tr B^T B} / √det(1 + y (B^T B)^{−1}).   (B.4)

Note that

ψ(0) = 1.   (B.5)

The function ψ(y) is well-defined for y ≥ 0, where it clearly decreases monotonically,

ψ′(y) < 0.   (B.6)
We would now like to integrate over the rotational degrees of freedom in dB. Any real m × m matrix B may be decomposed as [3,22]

B = O_1^T Ω O_2,   (B.7)

where O_{1,2} ∈ O(m), the group of m × m orthogonal matrices, and Ω = Diag(ω_1,…,ω_m), where ω_1,…,ω_m are the singular values of B. Under this decomposition we may write the measure dB as [3,22]

dB = dμ(O_1) dμ(O_2) Π_{i<j} |ω_i² − ω_j²| d^m ω,   (B.8)

where dμ(O_{1,2}) are Haar measures over the appropriate group manifolds. The measure dB is manifestly invariant under actions of the orthogonal group O(m),

dB = d(BO) = d(O′B), O, O′ ∈ O(m),   (B.9)

as should have been expected to begin with.
Remark B.1. Note that the decomposition (B.7) is not unique, since O_1 D and D O_2, with D being any of the 2^m diagonal matrices Diag(±1,…,±1), form an equally good pair of orthogonal matrices to be used in (B.7). Thus, as O_1 and O_2 sweep independently over the group O(m), the measure (B.8) overcounts B matrices. This problem can be easily rectified by appropriately normalizing the volume V_m = ∫ dμ(O_1) dμ(O_2). One can show that the correct normalization of the volume is

V_m = π^{m(m+1)/2} / ( 2^m Π_{j=1}^m Γ(1 + j/2) Γ(j/2) ).   (B.10)
One simple way to establish (B.10) is to calculate

∫ dB exp(−½ tr B^T B) = (2π)^{m²/2} = V_m ∫_{−∞}^∞ d^m ω Π_{i<j} |ω_i² − ω_j²| exp(−½ Σ_i ω_i²).

The last integral is a known Selberg-type integral [22].
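For m = 2, the normalization (B.10) can be verified directly (our numerical sketch): (B.10) gives V_2 = π²/2, the ω-integral above evaluates to 8 (it equals 2π E|ω_1² − ω_2²| for standard normal ω_i), and the product indeed reproduces (2π)^{m²/2} = 4π².

```python
import math
import random

# V_2 from Eq. (B.10): pi^3 / (2^2 * Gamma(3/2)Gamma(1/2) * Gamma(2)Gamma(1))
V2 = math.pi ** 3 / (4 * math.gamma(1.5) * math.gamma(0.5)
                     * math.gamma(2.0) * math.gamma(1.0))

# Monte Carlo for the singular-value integral, importance-sampled with N(0,1)'s:
# int d^2w |w1^2 - w2^2| exp(-(w1^2 + w2^2)/2) = 2*pi * E|w1^2 - w2^2|.
random.seed(0)
N = 200_000
acc = sum(abs(random.gauss(0, 1) ** 2 - random.gauss(0, 1) ** 2) for _ in range(N))
omega_integral = 2 * math.pi * acc / N

product = V2 * omega_integral  # should approximate (2*pi)**2
```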
The integrand in (B.4) depends on B only through the combination B^T B = O_2^T Ω² O_2. Thus, the integrations over O_1 and O_2 in (B.4) factor out trivially, and we end up with

ψ(y) = V_m (m/π)^{m²/2} ∫_{−∞}^∞ d^m ω [ Π_{i<j} |ω_i² − ω_j²| / √det(1 + y Ω^{−2}) ] e^{−m tr Ω²}.   (B.11)
It is a straightforward exercise to check that (B.10) is consistent with ψ(0) = 1.
Note that in deriving (B.11) we have made no approximations; up to this point, all our considerations in this appendix were exact. We are interested in the large-(n,m) asymptotic behavior¹ of P(u) and of its moments. Thus, we will now evaluate the large-m behavior of ψ(y) (which is why we have chosen λ = m in (B.3)). This asymptotic behavior is determined by the saddle point dominating the integral over the m singular values ω_i in (B.11) as m → ∞.
To obtain this asymptotic behavior we rewrite the integrand in (B.11) as

e^{−S} / √det(1 + y Ω^{−2}),

where

S = m Σ_{i=1}^m ω_i² − ½ Σ_{i<j} log(ω_i² − ω_j²)².   (B.12)
In physical terms, S is the energy (or the action) of the so-called "Dyson gas" of eigenvalues, familiar from the theory of random matrices.
We look for a saddle point of the integral in (B.11) in which all the ω_i are of O(1). In such a case, S in (B.12) is of O(m²), and thus e^{−S} overwhelms the factor

1/√det(1 + y Ω^{−2}) = e^{−(m/2) I(y)},

where

I(y) = (1/m) Σ_{i=1}^m log(1 + y/ω_i²)   (B.13)

is a quantity of O(m⁰). For later use, note that

I(0) = 0.   (B.14)
¹ Recall that m and n tend to infinity with the ratio (41), r = m/n, kept finite.
Thus, to leading order in 1/m, ψ(y) is dominated by the saddle point of S, provided the latter is well defined and stable, which is indeed the case.
Simple arguments pertaining to the physics of the Dyson gas make it clear that the saddle point is stable: the "confining potential" term Σ_i ω_i² in (B.12) tends to condense all the ω_i at zero, while the "Coulomb repulsion" term −Σ_{i<j} log(ω_i² − ω_j²)² acts to keep the |ω_i| apart. Equilibrium must be reached as a compromise, and it must be stable, since the quadratic confining potential would eventually dominate the logarithmic repulsive interaction for ω_i large enough. The saddle point equations

∂S/∂ω_i = 2ω_i [ m − Σ_{j≠i} 1/(ω_i² − ω_j²) ] = 0,   (B.15)
are simply the equilibrium conditions between repulsive and attractive interactions,and thus determine the distribution of the joij:We will solve (B.15) (using standard techniques of random matrix theory), and
thus will determine the equilibrium configuration of the molecules of the Dyson gasin the next appendix, where we show that the m singular values oi condense (nonuniformly) into the finite segment (see Eq. (C.11))
0po2i p2
(and thus with mean spacing of order $1/m$). To summarize, in the large-$m$ limit, $\psi(y)$ is determined by the saddle point of the energy $S$ (B.12) of the Dyson gas. Thus, for large $m$, according to (B.11)--(B.13),
$$\psi(y)\simeq V_{2m}\left(\frac{m}{\pi}\right)^{m^2/2}\exp\left[-\left(S_*+\frac{m}{2}\,I_*(y)\right)\right],$$
where $S_*$ is the extremal value of (B.12), and $I_*(y)$ is (B.13) evaluated at that equilibrium configuration of the Dyson gas, namely,
$$I_*(y)=\frac{1}{m}\sum_{i=1}^{m}\log\left(1+\frac{y}{\omega_{i*}^2}\right). \qquad (B.16)$$
The actual value of $S_*$ (a number of $O(m^2)$) is of no special interest to us here, since from (B.5) and (B.14) we immediately deduce that in the large-$m$ limit
$$\psi(y)\simeq e^{-\frac{m}{2}I_*(y)}. \qquad (B.17)$$
Substituting (B.17) back into (B.3), we thus obtain the large-$(n,m)$ behavior of $k_N$ as
$$k_N\sim\frac{1}{\Gamma(N)}\int_0^\infty t^{N-1}\,e^{-t-\frac{m}{2}I_*(t/m)}\,dt. \qquad (B.18)$$
The function $I_*(y)$ is evaluated in the next appendix, and is given in Eq. (C.15),
$$I_*(y)=-y+\sqrt{y^2+2y}+\log\left(y+1+\sqrt{y^2+2y}\right),$$
which we quote here for convenience.
The dominant contribution to the integral in (B.18) comes from values of $t\ll m$, since the function
$$f(t)=t+\frac{m}{2}\,I_*\!\left(\frac{t}{m}\right), \qquad (B.19)$$
which appears in the exponent in (B.18), is monotonically increasing, as can be seen from (C.14). Thus, in this range of the variable $t$, using (C.16), we have
$$f(t)=t+\frac{m}{2}\,I_*\!\left(\frac{t}{m}\right)=\sqrt{2mt}+\frac{t}{2}+O\!\left(\frac{1}{\sqrt{m}}\right). \qquad (B.20)$$
Note that the term $t/2$ in (B.20) is beyond the accuracy of our approximation for $I_*$. The reason is that in (C.12) we used the continuum approximation to the density of singular values, which introduced errors of order $1/m$. Fortunately, this term is not required. The leading-order term in the exponent (B.20) of (B.18) is just $\sqrt{2mt}$. Consequently, to leading order (B.18) reduces to
$$k_N\equiv\int_0^\infty P(u)\,u^N\,du\simeq\frac{1}{\Gamma(N)}\int_0^\infty t^{N-1}e^{-\sqrt{2mt}}\,dt=\frac{2\,\Gamma(2N)}{(2m)^N\,\Gamma(N)}=\frac{(2N-1)!!}{m^N}. \qquad (B.21)$$
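The chain of equalities in (B.21) can be checked numerically; the sketch below (illustrative, with $m=10$) compares a direct quadrature of the integral with the $\Gamma$-function and double-factorial forms.

```python
import math

def k_closed(N, m):
    # (2N-1)!! / m^N, the final form in (B.21).
    dfact = 1
    for j in range(1, 2 * N, 2):
        dfact *= j
    return dfact / m ** N

def k_gamma(N, m):
    # 2 Gamma(2N) / ((2m)^N Gamma(N)), the intermediate form in (B.21).
    return 2.0 * math.gamma(2 * N) / ((2.0 * m) ** N * math.gamma(N))

def k_quad(N, m, tmax=200.0, steps=200_000):
    # Midpoint quadrature of (1/Gamma(N)) int_0^inf t^(N-1) exp(-sqrt(2mt)) dt.
    h = tmax / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (N - 1) * math.exp(-math.sqrt(2.0 * m * t))
    return total * h / math.gamma(N)

m = 10.0
closed = [k_closed(N, m) for N in range(1, 7)]
gammaf = [k_gamma(N, m) for N in range(1, 7)]
quad3 = k_quad(3, m)  # compare against (2*3-1)!!/m^3 = 15/1000
```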
The moments (B.21) satisfy Carleman's criterion [6,13]
$$\sum_{N=1}^{\infty}k_N^{-1/2N}=\infty, \qquad (B.22)$$
which is sufficient to guarantee that these moments define a unique distribution $P(u)$.

Had we kept in (B.21) the $O(m^0)$ piece of (B.20), i.e., the term $t/2$, it would have produced a correction factor to (B.21) of the form $1+O(N^2/m)$. To see this, consider the integral
$$\frac{1}{\Gamma(N)}\int_0^\infty t^{N-1}e^{-\sqrt{2mt}-t/2}\,dt=\frac{2}{(2m)^N\,\Gamma(N)}\int_0^\infty y^{2N-1}e^{-y-y^2/4m}\,dy\simeq\frac{2}{(2m)^N\,\Gamma(N)}\int_0^\infty y^{2N-1}e^{-y}\left(1-\frac{y^2}{4m}+\cdots\right)dy.$$
Thus, we can safely trust (B.21) for moments of order $N\ll\sqrt{m}$.
The expression in (B.21) is readily recognized as the $2N$th moment of a Gaussian distribution defined on the positive half-line. Indeed, the moments of the Gaussian distribution
$$g(x;\mu)=\frac{2\mu}{\sqrt{\pi}}\,e^{-\mu^2x^2},\qquad x\ge 0, \qquad (B.23)$$
are
$$\langle x^k\rangle=\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{\pi}\,\mu^k}. \qquad (B.24)$$
In particular, the even moments of (B.23) are
$$\langle x^{2N}\rangle=\frac{\Gamma\!\left(N+\frac{1}{2}\right)}{\sqrt{\pi}\,\mu^{2N}}=\frac{(2N-1)!!}{(2\mu^2)^N}, \qquad (B.25)$$
which coincide with (B.21) for $2\mu^2=m$. These are the moments of $u=x^2$ for the distribution $P(u)$ satisfying $P(u)\,du=g\big(x;\sqrt{m/2}\big)\,dx$, as can be seen by comparing (B.21) and (B.25). Thus, we conclude that the leading asymptotic behavior of $P(u)$ as $m$ tends to infinity is
$$P(u)=\sqrt{\frac{m}{2\pi u}}\;e^{-\frac{mu}{2}}, \qquad (B.26)$$
the result quoted in (42).

As an additional check of this simple determination of $P(u)$ from (B.21), we now sketch how to derive it more formally from the function
$$G(z)=\int_0^\infty\frac{P(u)\,du}{z-u}, \qquad (B.27)$$
sometimes known as the Stieltjes transform of $P(u)$ [6]. $G(z)$ is analytic in the complex $z$-plane, cut along the support of $P(u)$ on the real axis. We can then determine $P(u)$ from (B.27), once we have an explicit expression for $G(z)$, using the identity
$$P(u)=\frac{1}{\pi}\,\mathrm{Im}\,G(u-i\epsilon). \qquad (B.28)$$
For $z$ large and off the real axis, and if all the moments of $P(u)$ exist, we can formally expand $G(z)$ in inverse powers of $z$. Thus,
$$G(z)=\sum_{N=0}^{\infty}\int_0^\infty\frac{P(u)\,u^N\,du}{z^{N+1}}=\sum_{N=0}^{\infty}\frac{k_N}{z^{N+1}}. \qquad (B.29)$$
For the $k_N$'s given by (B.21), the series (B.29) diverges. However, it is Borel summable [6]. Borel resummation of (B.21), making use of
$$\frac{1}{\sqrt{1-x}}=1+\sum_{N=1}^{\infty}\frac{(2N-1)!!}{N!}\left(\frac{x}{2}\right)^N,$$
yields
$$G(z)=\frac{1}{z}\int_0^\infty\frac{e^{-t}\,dt}{\sqrt{1-\frac{2t}{mz}}}. \qquad (B.30)$$
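At any point $z$ off the cut, the Borel-resummed expression (B.30) should agree with the Stieltjes transform (B.27) of the density (B.26); the following sketch (illustrative, not part of the paper's derivation) compares the two at $z=-1$ with $m=8$, using the substitution $u=x^2$ to remove the $1/\sqrt{u}$ singularity of $P(u)$.

```python
import math

m = 8.0
z = -1.0  # a point on the negative real axis, away from the support of P(u)
steps = 400_000

# G(z) from the Borel-resummed form (B.30):
# (1/z) int_0^inf e^(-t) / sqrt(1 - 2t/(mz)) dt.
tmax = 80.0
ht = tmax / steps
g_borel = 0.0
for i in range(steps):
    t = (i + 0.5) * ht
    g_borel += math.exp(-t) / math.sqrt(1.0 - 2.0 * t / (m * z))
g_borel *= ht / z

# G(z) from the definition (B.27) with P(u) of (B.26); with u = x^2 the
# integrand becomes 2*sqrt(m/(2*pi)) * exp(-m x^2/2) / (z - x^2).
xmax = 10.0
hx = xmax / steps
g_stieltjes = 0.0
for i in range(steps):
    x = (i + 0.5) * hx
    g_stieltjes += math.exp(-0.5 * m * x * x) / (z - x * x)
g_stieltjes *= hx * 2.0 * math.sqrt(m / (2.0 * math.pi))
```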
Thus,
$$\frac{1}{\pi}\,\mathrm{Im}\,G(u-i\epsilon)=\frac{1}{\pi u}\int_{mu/2}^{\infty}\frac{e^{-t}\,dt}{\sqrt{\frac{2t}{mu}-1}}=\sqrt{\frac{m}{2\pi u}}\;e^{-\frac{mu}{2}}, \qquad (B.31)$$
which coincides with (B.26).
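Both routes to (B.26) can also be cross-checked against the moments (B.21) directly; the sketch below (illustrative, with $m=4$) integrates $u^N P(u)$ numerically after the substitution $u=x^2$.

```python
import math

m = 4.0

def moment_quad(N, steps=200_000, xmax=6.0):
    # int_0^inf u^N P(u) du with P(u) of (B.26), written via u = x^2 as
    # 2*sqrt(m/(2*pi)) * int_0^inf x^(2N) exp(-m x^2 / 2) dx.
    h = xmax / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (2 * N) * math.exp(-0.5 * m * x * x)
    return total * h * 2.0 * math.sqrt(m / (2.0 * math.pi))

def moment_closed(N):
    # (2N-1)!! / m^N, Eq. (B.21).
    dfact = 1
    for j in range(1, 2 * N, 2):
        dfact *= j
    return dfact / m ** N

moments = [(moment_quad(N), moment_closed(N)) for N in (1, 2, 3)]
```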
Appendix C. The saddle point distribution of the $\omega_i$
We present in this appendix the solution of the equilibrium condition (B.15) of the Dyson gas of singular values,
$$\frac{\partial S}{\partial\omega_i^2}=m-\sum_{j\neq i}\frac{1}{\omega_i^2-\omega_j^2}=0 \qquad (C.1)$$
(which we repeat here for convenience), and then use it to calculate $I_*(y)$, defined in (B.16). We follow standard methods [3,9] of random matrix theory [22]. Let
$$s_i=\omega_i^2, \qquad (C.2)$$
and also define
$$F(w)=\frac{1}{m}\left\langle\sum_{i=1}^{m}\frac{1}{w-s_i}\right\rangle=\frac{1}{m}\left\langle\mathrm{tr}\,\frac{1}{w-B^TB}\right\rangle, \qquad (C.3)$$
where $w$ is a complex variable. Here the angular brackets denote averaging with respect to the $B$ sector of (13). By definition, $F(w)$ behaves asymptotically as
$$F(w)\xrightarrow[w\to\infty]{}\frac{1}{w}. \qquad (C.4)$$
It is clear from (C.3) that for $s>0$, $\epsilon\to 0^+$ we have
$$F(s-i\epsilon)=\frac{1}{m}\,\mathrm{P.P.}\left\langle\sum_{i=1}^{m}\frac{1}{s-s_i}\right\rangle+\frac{i\pi}{m}\sum_{i=1}^{m}\langle\delta(s-s_i)\rangle, \qquad (C.5)$$
where $\mathrm{P.P.}$ stands for the principal part. Therefore (from (C.3)), the average eigenvalue density of $B^TB$ is given by
$$\rho(s)\equiv\frac{1}{m}\sum_{i=1}^{m}\langle\delta(s-s_i)\rangle=\frac{1}{\pi}\,\mathrm{Im}\,F(s-i\epsilon). \qquad (C.6)$$
In the large-$m$ limit, the real part of (C.5) is fixed by (C.1), namely, setting $s=s_i$,
$$\mathrm{Re}\,F(s-i\epsilon)\equiv\frac{1}{m}\left\langle\sum_{j}\frac{1}{s-s_j}\right\rangle=1. \qquad (C.7)$$
From the discussion of the physical equilibrium of the Dyson gas (see the paragraph preceding (B.15)), we expect the $\{s_i\}$ to be contained in a single finite segment $0\le s\le a$, with $a$ yet to be determined. This means that $F(w)$ should have a cut (along the real axis, where the eigenvalues of $B^TB$ are found) connecting $w=0$ and $w=a$. Furthermore, $\rho(s)$ must be integrable as $s\to 0^+$, since a macroscopic number (i.e., a finite fraction of $m$) of eigenvalues cannot condense at $s=0$, due to repulsion. These considerations, together with (C.7), lead [3,9] to the reasonable ansatz
$$F(w)=1+\left(\frac{p}{w}+q\right)\sqrt{w(w-a)}, \qquad (C.8)$$
with parameters $p$ and $q$. The asymptotic behavior (C.4) then immediately fixes
$$q=0,\qquad p=-1,\qquad a=2. \qquad (C.9)$$
ARTICLE IN PRESSA. Ben-Hur et al. / Journal of Complexity 19 (2003) 474–510 507
Thus,
$$F(w)=1-\sqrt{\frac{w-2}{w}}. \qquad (C.10)$$
The eigenvalue distribution of $B^TB$ is therefore
$$\rho(s)=\frac{1}{\pi}\,\mathrm{Im}\,F(s-i\epsilon)=\frac{1}{\pi}\sqrt{\frac{2-s}{s}} \qquad (C.11)$$
for $0<s<2$, and zero elsewhere. As a simple check, note that
$$\int_0^2\rho(s)\,ds=1,$$
as guaranteed by the unit numerator in (C.4).
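The density (C.11) can also be seen directly in a random-matrix sample. The Monte Carlo sketch below is illustrative only; the choice of a real $m\times m$ Gaussian matrix with entry variance $1/(2m)$ (i.e., weight $\propto e^{-m\,\mathrm{tr}\,B^TB}$) is an assumption made here because that square-matrix ensemble has (C.11) as its equilibrium density. It checks the support edge near $s=2$, the exact mean $\int_0^2 s\,\rho(s)\,ds=1/2$, and the weight $\int_0^1\rho(s)\,ds=1/2+1/\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500

# Real Gaussian matrix with entry variance 1/(2m), weight ~ exp(-m tr B^T B);
# the eigenvalues of B^T B then follow the density (C.11) as m grows.
B = rng.normal(0.0, 1.0 / np.sqrt(2.0 * m), size=(m, m))
eigs = np.linalg.eigvalsh(B.T @ B)

mean_eig = eigs.mean()               # int_0^2 s rho(s) ds = 1/2 exactly
top_eig = eigs.max()                 # upper edge of the support, s = 2
frac_below_1 = np.mean(eigs < 1.0)   # int_0^1 rho(s) ds = 1/2 + 1/pi
```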
Thus, as mentioned in the previous appendix, the $\omega_i^2$, the eigenvalues of $B^TB$, are confined to the finite segment $0<s<2$. In the limit $m\to\infty$, they form a continuous condensate in this segment, with the non-uniform distribution (C.11).

In an obvious manner, we can calculate $S_*$, the extremal value of $S$ in (B.12), by replacing the discrete sums over the $s_i$ by continuous integrals with weight $\rho(s)$ given by (C.11). We do not calculate $S_*$ explicitly, but merely mention the obvious result that it is a number of $O(m^2)$. Similarly, from (B.16) and (C.11) we obtain
$$I_*(y)=\int_0^2\rho(s)\log\left(1+\frac{y}{s}\right)ds=\frac{1}{\pi}\int_0^2\sqrt{\frac{2-s}{s}}\,\log\left(1+\frac{y}{s}\right)ds. \qquad (C.12)$$
Since the continuum approximation for $\rho(s)$ introduces an error of order $1/m$, an error of similar order is introduced in $I_*$. It is easier to evaluate $dI_*(y)/dy$, and then integrate back, to obtain $I_*(y)$. We find from (C.12)
$$\frac{dI_*(y)}{dy}=-F(-y)=-1+\frac{y+2}{\sqrt{y^2+2y}}=-1+\sqrt{1+\frac{2}{y}}. \qquad (C.13)$$
It is clear from the last equality in (C.13) that
$$\frac{dI_*(y)}{dy}>0 \qquad (C.14)$$
for $y>0$. Integrating (C.13), and using (B.14), $I_*(0)=0$, to determine the integration constant, we finally obtain
$$I_*(y)=-y+\sqrt{y^2+2y}+\log\left(y+1+\sqrt{y^2+2y}\right). \qquad (C.15)$$
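The closed form (C.15) can be checked against a direct quadrature of (C.12); substituting $s=2\sin^2\theta$ turns $\rho(s)\,ds$ into $(4/\pi)\cos^2\theta\,d\theta$ and removes the endpoint singularity. An illustrative sketch:

```python
import math

def I_star_closed(y):
    # Eq. (C.15).
    r = math.sqrt(y * y + 2.0 * y)
    return -y + r + math.log(y + 1.0 + r)

def I_star_quad(y, steps=200_000):
    # Eq. (C.12) with s = 2 sin^2(theta):
    # I_*(y) = (4/pi) int_0^(pi/2) cos^2(theta) log(1 + y/(2 sin^2 theta)) dtheta,
    # evaluated by the midpoint rule.
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        th = (i + 0.5) * h
        c = math.cos(th)
        s2 = 2.0 * math.sin(th) ** 2
        total += c * c * math.log(1.0 + y / s2)
    return total * h * 4.0 / math.pi

pairs = [(y, I_star_closed(y), I_star_quad(y)) for y in (0.3, 1.0, 5.0)]
```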
From (C.15) we obtain the limiting behaviors
$$I_*(y)=2\sqrt{2y}-y+O(y^{3/2}),\qquad 0\le y\ll 1, \qquad (C.16)$$
and
$$I_*(y)=\log(2ey)+O\!\left(\frac{1}{y}\right),\qquad y\gg 1. \qquad (C.17)$$
Due to (C.13), $I_*(y)$ increases monotonically from $I_*(0)=0$ to its asymptotic form (C.17). Note that for $y=t/m$ (as required in (B.18)), the second term in (C.16) is $O(1/m)$ and therefore beyond the accuracy of the approximation of this section.
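The derivative formula (C.13) and the limits (C.16)--(C.17) can likewise be checked against (C.15) numerically; an illustrative sketch:

```python
import math

def I_star(y):
    # Eq. (C.15).
    r = math.sqrt(y * y + 2.0 * y)
    return -y + r + math.log(y + 1.0 + r)

def dI_star(y):
    # Eq. (C.13): dI_*/dy = -1 + sqrt(1 + 2/y).
    return -1.0 + math.sqrt(1.0 + 2.0 / y)

# Central finite difference of (C.15) versus the closed-form derivative (C.13).
y0, h = 3.0, 1e-6
fd = (I_star(y0 + h) - I_star(y0 - h)) / (2.0 * h)

# Small-y behavior (C.16): I_*(y) = 2 sqrt(2y) - y + O(y^(3/2)).
ys = 1e-4
small_err = I_star(ys) - (2.0 * math.sqrt(2.0 * ys) - ys)

# Large-y behavior (C.17): I_*(y) = log(2ey) + O(1/y).
yl = 1e4
large_err = I_star(yl) - math.log(2.0 * math.e * yl)
```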
References
[1] E. Abrahams, P.W. Anderson, D.C. Licciardello, T.V. Ramakrishnan, Scaling theory of localization:
absence of quantum diffusion in two dimensions, Phys. Rev. Lett. 42 (1979) 673;
E. Abrahams, P.W. Anderson, D.S. Fisher, D.J. Thouless, New method for a scaling theory of
localization, Phys. Rev. B 22 (1980) 3519;
P.W. Anderson, New method for a scaling theory of localization. II. Multichannel theory of a ‘‘wire’’
and possible extension to higher dimensionality, Phys. Rev. B 23 (1981) 4828.
[2] B.L. Altshuler, V.E. Kravtsov, I.V. Lerner, in: B.L. Altshuler, P.A. Lee, R.A. Webb (Eds.),
Mesoscopic Phenomena in Solids, North-Holland, Amsterdam, 1991.
[3] A. Anderson, R.C. Myers, V. Periwal, Complex random surfaces, Phys. Lett. B 254 (1991) 89;
A. Anderson, R.C. Myers, V. Periwal, Branched polymers from a double scaling limit of matrix
models, Nucl. Phys. B 360 (1991) 463 (Section 3);
J. Feinberg, A. Zee, Renormalizing rectangles and other topics in random matrix theory, J. Statist.
Phys. 87 (1997) 473–504;
G.M. Cicuta, L. Molinari, E. Montaldi, F. Riva, Large rectangular random matrices, J. Math. Phys.
28 (1987) 1716.
[4] K.M. Anstreicher, J. Ji, F.A. Potra, Y. Ye, Probabilistic analysis of an infeasible interior-point
algorithm for linear programming, Math. Oper. Res. 24 (1999) 176–192.
[5] A. Ben-Hur, H.T. Siegelmann, S. Fishman, A theory of complexity for continuous time dynamics, J.
Complexity 18 (2002) 51–86.
[6] C.M. Bender, S.A. Orszag, Advanced Mathematical Methods for Scientists and Engineers, 2nd
Edition, Springer-Verlag, New York, 1999 (Chapter 8).
[7] L. Blum, F. Cucker, M. Shub, S. Smale, Complexity and Real Computation, Springer-Verlag,
London, 1999.
[8] M.S. Branicky, Analog computation with continuous ODEs, in: Proceedings of the IEEE Workshop
on Physics and Computation, Dallas, TX, 1994, pp. 265–274.
[9] E. Brezin, C. Itzykson, G. Parisi, J.-B. Zuber, Planar diagrams, Comm. Math. Phys. 59 (1978) 35.
[10] R.W. Brockett, Dynamical systems that sort lists, diagonalize matrices and solve linear programming
problems, Linear Algebra Appl. 146 (1991) 79–91.
[11] L.O. Chua, G.N. Lin, Nonlinear programming without computation, IEEE Trans. Circuits Systems
31 (2) (1984) 182–188.
[12] A. Cichocki, R. Unbehauen, Neural Networks for Optimization and Signal Processing, John Wiley,
New York, 1993.
[13] R. Durrett, Probability: Theory and Examples, 2nd Edition, Wadsworth Publishing Co., Belmont,
1996 (Chapter 2).
[14] L. Faybusovich, Dynamical systems which solve optimization problems with linear constraints, IMA
J. Math. Control Inform. 8 (1991) 135–149.
[15] J. Feinberg, On the universality of the probability distribution of the product B1X of random
matrices, arXiv:math.PR/0204312, 2002.
[16] V.L. Girko, On the distribution of solutions of systems of linear equations with random coefficients,
Theory Probab. Math. Statist. 2 (1974) 41–44.
[17] B.V. Gnedenko, A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables,
Addison-Wesley, Reading, MA, 1954.
[18] U. Helmke, J.B. Moore, Optimization and Dynamical Systems, Springer-Verlag, London, 1994.
[19] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley,
Redwood City, 1991.
[20] X.B. Liang, J. Wang, A recurrent neural network for nonlinear optimization with a continuously
differentiable objective function and bound constraints, IEEE Trans. Neural Networks 11 (2000)
1251–1262.
[21] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, Reading, MA, 1989.
[22] M.L. Mehta, Random Matrices, 2nd Edition, Academic Press, Boston, 1991.
[23] S. Mizuno, M.J. Todd, Y. Ye, On adaptive-step primal-dual interior-point algorithms for linear
programming, Math. Oper. Res. 18 (1993) 964–981.
[24] C. Papadimitriou, Computational Complexity, Addison-Wesley, Reading, MA, 1995.
[25] J. Renegar, Incorporating condition measures into the complexity theory of linear programming,
SIAM J. Optim. 5 (3) (1995) 506–524.
[26] R. Saigal, Linear Programming, Kluwer Academic, Dordrecht, 1995.
[27] R. Shamir, The efficiency of the simplex method: a survey, Manage. Sci. 33 (3) (1987) 301–334.
[28] H.T. Siegelmann, S. Fishman, Computation by dynamical systems, Physica D 120 (1998) 214–235.
[29] S. Smale, On the average number of steps in the simplex method of linear programming, Math.
Programming 27 (1983) 241–262.
[30] M.J. Todd, Probabilistic models for linear programming, Math. Oper. Res. 16 (1991) 671–693.
[31] J.F. Traub, H. Wozniakowski, Complexity of linear programming, Oper. Res. Lett. 1 (1982) 59–62.
[32] K.G. Wilson, J. Kogut, The renormalization group and the epsilon expansion, Phys. Rep. 12
(1974) 75;
J. Cardy, Scaling and Renormalization in Statistical Physics, Cambridge University Press,
Cambridge, 1996.
[33] Y. Ye, Interior Point Algorithms: Theory and Analysis, John Wiley and Sons Inc., New York, 1997.
[34] Y. Ye, Toward probabilistic analysis of interior-point algorithms for linear programming, Math.
Oper. Res. 19 (1994) 38–52.