Submitted to Operations Research; manuscript no. 2

An O(log n / log log n)-approximation Algorithm for the Asymmetric Traveling Salesman Problem
Arash Asadpour, NYU Stern School of Business, Department of Information, Operations, and Management Sciences, [email protected]
Michel X. Goemans, Massachusetts Institute of Technology, Department of Mathematics, [email protected]
Aleksander Mądry, Massachusetts Institute of Technology, Department of Electrical Engineering & Computer Science, [email protected]
Shayan Oveis Gharan, University of Washington, Department of Computer Science and Engineering, [email protected]
Amin Saberi, Stanford University, Department of Management Science and Engineering, [email protected]
We present a randomized O(log n / log log n)-approximation algorithm for the asymmetric traveling salesman problem (ATSP). This provides the first asymptotic improvement over the long-standing Θ(log n)-approximation bound stemming from the work of Frieze et al. [FGM82].
The key ingredient of our approach is a new connection between the approximability of the ATSP and the
notion of so-called thin trees. To exploit this connection, we employ maximum entropy rounding – a novel
method of randomized rounding of LP relaxations of optimization problems. We believe that this method
might be of independent interest.
Key words : maximum entropy; thin tree; Held-Karp relaxation; circulation network.
Subject classifications : Analysis of algorithms: performance guarantee; Linear programming: randomized
rounding; Traveling salesman: approximation algorithm.
Area of review : Optimization.
1. Introduction
The traveling salesman problem is one of the most celebrated and intensively studied problems in
combinatorial optimization [LSKL85, ABCC07, Coo12]. It has found applications in logistics, plan-
ning, manufacturing and testing microchips [LK74], as well as DNA sequencing [PTW01]. The roots
of this problem go back as far as the first half of the 19th century, to the works of Hamilton [Coo12] on
chess knight movement. However, its most popular formulation is in the context of traveling through
a collection of cities: given a list of cities and their pairwise distances, the task is to find a shortest
possible tour that visits each city exactly once.
The asymmetric (or general) traveling salesman problem (ATSP) concerns a situation when the
distances between the cities are asymmetric. Formally, we are given a set V of n points and an
(asymmetric) cost function c : V ×V →R+. For the rest of this paper, we represent the problem by
a directed graph G, where V (resp. V ×V ) refers to the set of vertices (resp. arcs) of G. The goal is
to find a minimum cost tour that visits every vertex exactly once. As is standard in the literature,
throughout the paper, we assume that the cost function is a metric, i.e., it satisfies the
triangle inequality c_{ik} ≤ c_{ij} + c_{jk} for all vertices i, j, and k. Note that if we are allowed to visit a
vertex more than once, then by substituting the cost of each arc (u, v) with the cost of the shortest
path from u to v, we automatically ensure that the triangle inequality holds for all triplets.
In the very important special case of the problem where the cost function is symmetric, i.e., when
for every u, v ∈ V we have c(u, v) = c(v,u), there is a celebrated factor 3/2-approximation algorithm
due to Christofides [Chr76]. This algorithm first computes a minimum cost spanning tree T on V ; then
finds the minimum cost Eulerian augmentation of that tree; and finally shortcuts the corresponding
Eulerian walk into a tour.
In this paper, we are concerned with the general, asymmetric version and give an O(log n / log log n)-
approximation algorithm for it. This finally improves the thirty-year-old Θ(log n)-approximation
ratio stemming from the work of Frieze et al. [FGM82] and its subsequent improvements to
0.999 log_2 n, 0.842 log_2 n, and 0.666 log_2 n [Bla02, KLSS05, FS07]. This paper is a revised and extended
version of the conference version by Asadpour et al. [AGM+10].
1.1. The Roadmap of the Algorithm
Our algorithm begins by formulating the asymmetric TSP via the Held-Karp linear programming
relaxation. By solving the LP relaxation, the algorithm gets an optimal fractional solution x ∗. The
algorithm then symmetrizes the solution x* and scales it down by a factor of (n−1)/n, in order to get
a solution z*. We show that z* is in the spanning tree polytope of the undirected version of G.
See Section 3 for the details about the Held-Karp relaxation and the transformation of x ∗ to z ∗.
The main step of our approach is to round the fractional solution z ∗ to a spanning tree (i.e., a
corner point in the spanning tree polytope) with some specific properties: we want to find a spanning
tree T that, on one hand, has relatively small cost compared to OPTHK (the value of the optimum
solution of the Held-Karp relaxation of the asymmetric TSP), and on the other hand, has the crucial
property of having (almost) the same number of edges in any cut as z*. In order to do so, we
introduce the notion of thinness. More specifically, as we define in Definition 4.1, a spanning tree
T is called "(α, s)-thin" with respect to a given point z in the spanning tree polytope if, for each
subset of vertices ∅ ≠ U ⊂ V, the number of edges of T with exactly one endpoint in U is at most
α · Σ_{u∈U, v∈V\U} z_{uv}, and moreover, the cost of T is at most s times the cost of z. The central fact motivating
our interest in such a tree is that, as we show in Theorem 4.1, if a spanning tree T is (α,s)-thin
with respect to z ∗, then one can find an Eulerian augmentation of T with a total cost of at most
(2α+ s)OPTHK (and thus, within the same factor of the optimum solution of the asymmetric TSP).
The idea of the augmentation is based on Hoffman's circulation theorem. We emphasize that the
importance of the thin tree definition is that it disregards the orientation of the arcs. This helps us
eliminate the inherent asymmetry of the asymmetric TSP. See Section 4 for the details.
We remark that two similar notions of thinness have been studied before in the literature. In
Section 4 of [God], a notion of thinness is defined to study Jaeger's conjecture. This definition can
be thought of as a special case of ours, when z is integral. Also, in [GHJS09] the notion of "nearly-balanced"
graphs was introduced, where instead of bounding the number of edges in every cut by
some factor of the size of the cut, the authors require that the number of out-edges and in-edges in every
cut be close to each other.
Note that we are building an Eulerian graph by augmenting a spanning tree. In this regard, one
can think of our approach as the generalization of Christofides’ work to the asymmetric TSP.
In the light of the above, the technical core of our paper comprises a way to find an (α, s)-thin tree
with respect to z*, for α = O(log n / log log n) and s = O(1). We achieve this by sampling a spanning tree
from a certain distribution over the spanning trees of the undirected version of G. This distribution
is the one that maximizes the entropy among all the distributions that (approximately) preserve the
edge marginals imposed by z*. The crucial property of such an entropy-maximizing distribution is that
the events corresponding to edges being present in the sampled tree are negatively correlated. This
means that the well-known Chernoff bounds for the upper tail of independent settings still hold (see
Panconesi and Srinivasan [PS97]). See Section 5 for the details of maximum entropy sampling in the
asymmetric TSP.
By using this tail bound together with the union-bounding technique of Karger [Kar93], we are
able to establish the desired (O(log n / log log n), O(1))-thinness of the sampled tree (with high probability)
in Theorem 6.1. Note that even without a polynomial-time algorithm to sample from the maximum
entropy distribution, this theorem establishes that the Held-Karp LP relaxation has an integrality
gap no worse than O(log n / log log n). See Section 6 for more details.
All that remains, in order to obtain a polynomial-time algorithm with an O(log n / log log n)-
approximation factor for the asymmetric TSP, is to give a polynomial-time algorithm to sample from
the maximum entropy distribution. There are two main ingredients for this part. The first ingredient is
a succinct representation for the maximum entropy distribution as a product-form distribution, which
emerges as the solution of a convex program that can be efficiently solved. The other ingredient is the
fact that sampling from product-form distributions over spanning trees can be reduced to counting
the number of spanning trees, which is well-known to admit efficient algorithms. See Sections 5.2
and 5.3 for how to sample a spanning tree from product-form distributions, and Sections 7 and 8 for
how to solve the corresponding convex program.
We believe that this general, maximum entropy–based approach to rounding is also of indepen-
dent interest. In particular, after the publication of [AGM+10], this approach was used to obtain
an improved approximation algorithm for the graphic variant of the symmetric TSP [OSS11]. Maxi-
mum entropy distributions have a long history in information theory and statistical mechanics (refer
to [CT06] for some references and examples). In the context of randomized rounding, it has been
observed in [AS10] that the specific martingale–based dependent randomized rounding method used
in their algorithm for the fair allocation of indivisible goods coincides with a distribution that maxi-
mizes entropy on the bipartite matching polytope. In this paper, we introduce rounding by sampling
from the maximum entropy distribution as a potential technique for randomized rounding in the
presence of a combinatorial structure.
Algorithm 1 provides a high-level description of our approach. Also, the proof of our main
theorem (Theorem 6.2) gives a formal overview of the algorithm. In the rest of the introduction
we provide the high-level intuition behind our maximum entropy–based approach for finding a thin
spanning tree.
1.2. Rounding By Sampling From Maximum Entropy Distribution
At a high level, one can view our sampling procedure as a randomized rounding approach. It operates
on z ∗, a symmetrized and slightly scaled down version of our solution x ∗ to the Held-Karp relaxation.
As we will show in Section 3 (see Lemma 3.1), z* is a point in the (relative interior of the) spanning tree
polytope of an undirected graph whose edge set (defined by {{u, v} : x*_{uv} + x*_{vu} > 0}) is the support
of x*. In other words, z* can be seen as a "fractional spanning tree".
In the light of this observation, our goal is to round a given fractional spanning tree z ∗ to an
integral spanning tree (i.e., a corner point of the spanning tree polytope) while roughly preserving
some of its quantitative properties, namely, making it (α,s)-thin for some values of α and s that are
sufficiently small for our purpose.
The independent randomized rounding technique of Raghavan and Thompson [RT87] is one of
the most well-known and widely used techniques for rounding a fractional vector z = (z_1, z_2, …, z_m)
with 0 ≤ z_i ≤ 1 for each i. In this rounding technique, the fractional vector z is rounded to a 0/1
vector Z = (Z_1, Z_2, …, Z_m), where Pr[Z_i = 1] = z_i, independently for each i. This method has two
very convenient properties:
very convenient properties:
(Property 1) The resulting distribution preserves marginals, i.e., E[Z_i] = z_i for every 1 ≤ i ≤ m.
Hence, due to the linearity of expectation, every quantity that is linear in z (such as the
Algorithm 1: An O(log n / log log n)-approximation Algorithm for the ATSP.
Input: A set V consisting of n points and a cost function c : V × V → R+ satisfying the triangle
inequality.
Output: An O(log n / log log n)-approximation to the asymmetric TSP instance described by V and c.
1. Solve the Held-Karp LP relaxation of the ATSP instance to get an optimum extreme point
solution x*. Define z* by z*_{u,v} := (n−1)/n · (x*_{uv} + x*_{vu}), for all u, v ∈ V, making it a symmetrized
and scaled down version of x ∗. Vector z ∗ can be viewed as a point in the spanning tree
polytope of the undirected graph defined by the edges corresponding to the support of x ∗
after disregarding the directions of arcs. (See Section 3.)
2. Let E be the support graph of z* when the directions of the arcs are disregarded. Find weights
{γ_e}_{e∈E} such that the product-form distribution over spanning trees defined by p(T) ∝
exp(Σ_{e∈T} γ_e) approximately preserves the marginals imposed by z*, i.e., for any edge e ∈ E,

   Σ_{T : T ∋ e} p(T) ≤ (1 + ε) z*_e,

for a small enough value of ε. (In this paper, we show that ε = 0.2 suffices for our purpose. See
Sections 7 and 8 for a description of how to compute such a distribution.)
3. Sample 2⌈log n⌉ spanning trees T_1, …, T_{2⌈log n⌉} from p(·). (See Sections 5.2 and 5.3 for how
to sample a spanning tree from p(·) efficiently.) For each of these trees, orient all its edges so
as to minimize its cost with respect to our (asymmetric) cost function c. Let T* be the tree
whose resulting cost is the lowest among all the sampled trees.
4. Find a minimum cost integral circulation that contains the oriented tree ~T ∗. Shortcut this
circulation to a tour and output it. (See Section 4.)
total cost of the tree Σ_e c_e z_e, or the number of edges crossing a given cut) remains the
same in expectation. In other words, for any vector b ∈ R^m, we have E[bᵀZ] = bᵀz.
(Property 2) All the variables are rounded independently, which allows us to use strong concentration
bounds such as Chernoff bounds.
However, the problem with this approach is that the independent randomized rounding ignores
any underlying combinatorial structures of the solution. For example, in our case, it is not hard to
see that independent rounding of variables associated with each edge of our underlying graph will not
only most likely fail to produce a tree, but – even more critically – the resulting graph will be almost
surely disconnected. (We should note that one can show that oversampling the edges by a factor of
Ω(log n) would ensure connectivity here. Nevertheless, the expected cost of the sampled graph would
increase by the same factor, which is too big for our purpose of producing (O(log n / log log n), O(1))-thin
spanning trees.)
Therefore, our rounding procedure needs to be more careful. In particular, to ensure connectivity,
we will restrict ourselves only to sampling from distributions over spanning trees.
Note that since z is in the spanning tree polytope, it can be expressed as a convex combination of
spanning trees. Namely, we can write z in the following form:

   z = α_1 T^(1) + α_2 T^(2) + ⋯ + α_k T^(k),

for some α_j ≥ 0 with Σ_j α_j = 1, where each T^(j) = (T^(j)_1, T^(j)_2, …, T^(j)_m) is a 0/1 vector representing
a spanning tree with edge set {e : T^(j)_e = 1}. One can see that there can be many ways to express a
fractional spanning tree z in such a manner, and each one of these convex combinations defines a
distribution over spanning trees where each T^(j) is sampled with probability α_j. However, all such
distributions preserve the edge marginals imposed by z, i.e., if T is the sampled tree, then E[T_i] = z_i
for 1 ≤ i ≤ m. Hence, the choice of any such convex combination automatically ensures that Property 1
of the independent
randomized rounding technique still holds.
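The exactness of the preserved marginals can be checked on a tiny example. The sketch below (a hypothetical triangle instance, not from the paper) writes z = (2/3, 2/3, 2/3) as the uniform convex combination of the three spanning trees of a triangle and verifies E[T_e] = z_e exactly:

```python
from fractions import Fraction

# Spanning trees of the triangle on {0, 1, 2}: each tree omits one edge.
edges = [(0, 1), (1, 2), (0, 2)]
trees = [
    {(0, 1), (1, 2)},
    {(1, 2), (0, 2)},
    {(0, 1), (0, 2)},
]
alphas = [Fraction(1, 3)] * 3           # convex combination weights
z = {e: Fraction(2, 3) for e in edges}  # the fractional spanning tree

# Property 1 holds exactly: E[T_e] = sum_j alpha_j * [e in T^(j)] = z_e.
for e in edges:
    marginal = sum(a for a, t in zip(alphas, trees) if e in t)
    assert marginal == z[e]
```

Exact rational arithmetic (`Fraction`) makes the equality literal rather than approximate.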
Unfortunately, Property 2 (namely, independence) is impossible to satisfy, as the underlying struc-
ture of spanning trees imposes inevitable dependencies between the random variables. This is prob-
lematic, as without the ability to resort to some strong concentration phenomena we cannot hope
that all the (exponentially many) quantities that we are interested in preserving (specifically, the
number of edges crossing each cut) are indeed simultaneously close to their desired expected value.
However, there is a way to get around this difficulty. The crucial observation is that we do not really
need these events (corresponding to edges being a part of the sampled tree) to be fully independent. It
actually suffices if they are negatively correlated. As was observed first by Panconesi and Srinivasan
[PS97], negative correlation is sufficient for the upper tail of the Chernoff bound to hold. As we then
show, this concentration combined with the union-bounding technique of Karger [Kar93] will be sufficient
to obtain the desired (O(log n / log log n), O(1)) bound for the thinness of the sampled tree. See Sections 5.4
and 6 for details.
Now, in the light of the above discussion, it remains to devise a way of obtaining (and efficiently
sampling from) a marginal-preserving distribution over spanning trees that has such a negative
correlation property. Our idea is to employ maximum entropy rounding: among all the marginal-preserving
distributions over spanning trees, we take the one that maximizes the entropy.
Intuitively, this distribution maximizes the “randomness” while preserving the structural properties
of the fractional solution z . In other words, it avoids inflicting any correlation upon the edges other
than what is imposed by the spanning tree structure.
To complete our approach, first we need to devise an algorithm to find and sample from the
marginal-preserving maximum entropy distribution, and second we need to prove the negative corre-
lation property. We note that finding the maximum entropy distribution and sampling from it may
seem intractable at first, because this distribution is supported over all (possibly exponentially many)
spanning trees of our graph. The key to answering both of these questions is the close connection between
maximum entropy distributions and product-form distributions. Product-form distributions can be
viewed as a generalization of uniform distributions. In general, any set of weights γ_e assigned to the edges
of a graph defines a product-form distribution over spanning trees where the probability of each tree
T is proportional to exp(Σ_{e∈T} γ_e). Note that having equal values of γ_e results in the uniform
distribution over spanning trees. It was known before the publication of [AGM+10] that maximum
entropy distributions are product-form distributions and any product-form distribution is a maximum
entropy distribution for its own marginals (see, e.g., [BV06, Section 5.2.4]). This has several implications.
First, the maximum entropy distribution can be described concisely just by representing the
values γ_e for all edges. Second, sampling from the maximum entropy distribution reduces to the problem
of sampling uniform spanning trees, which has been studied for many years [Gue83, Kul90, CMN96].
Third, the maximum entropy distribution satisfies negative correlation because any product-form
distribution over spanning trees satisfies negative correlation [LP14, Chapter 4].
It remains to find the maximum entropy distribution over spanning trees that preserves the marginal
probabilities imposed by z. This problem can be cast as a simple convex optimization
problem with exponentially many variables, corresponding to the probability of each spanning tree
of our graph. We then give two polynomial-time algorithms that find product-form distributions
that approximately (with sufficiently high precision) preserve the marginals of a given vector z in
the spanning tree polytope. The first one is a simple combinatorial algorithm based on the idea
of multiplicative weight updates. The second one uses the Ellipsoid method to find a near optimal
solution to the dual of the maximum entropy convex program. For more information on the Ellipsoid
method, we refer the reader to [GLS93]. See Sections 7 and 8 for more details.
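The interplay between product-form weights and marginals can be sketched on a toy instance. The code below is not the paper's algorithm from Sections 7 and 8; it is a minimal brute-force sketch that enumerates all 16 spanning trees of K4 explicitly (feasible only at toy scale) and fits weights λ_e = exp(γ_e) to hypothetical target marginals z by iterative proportional fitting, rescaling one edge weight at a time so its marginal matches exactly:

```python
import itertools
import math

# All 16 spanning trees of K4, found by brute force (Cayley: 4^{4-2} = 16).
V = range(4)
edges = [(u, v) for u in V for v in V if u < v]

def is_spanning_tree(t):
    parent = list(range(4))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in t:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

trees = [t for t in itertools.combinations(edges, 3) if is_spanning_tree(t)]

def marginals(lam):
    """Edge marginals of the product-form distribution p(T) ~ prod_{e in T} lam[e]."""
    weight = {t: math.prod(lam[e] for e in t) for t in trees}
    total = sum(weight.values())
    return {e: sum(weight[t] for t in trees if e in t) / total for e in edges}

# Hypothetical target marginals in the relative interior of the spanning tree
# polytope (each tree has 3 edges, so the marginals sum to 3).
z = {e: 0.5 for e in edges}
z[(0, 1)], z[(2, 3)] = 0.7, 0.3

# Iterative proportional fitting: rescale lam[e] so that the marginal of e
# matches z[e] exactly, given the other weights; then move to the next edge.
lam = {e: 1.0 for e in edges}
for _ in range(300):
    for e in edges:
        p_e = marginals(lam)[e]
        lam[e] *= z[e] * (1 - p_e) / ((1 - z[e]) * p_e)

p = marginals(lam)
assert max(abs(p[e] - z[e]) for e in edges) < 1e-4
```

The fitted λ describe the (approximate) maximum entropy distribution concisely, exactly as the first implication above states; a polynomial-time version must of course avoid enumerating trees.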
1.3. Structure of the Paper
The rest of the paper is organized as follows. In Section 2, we define the required notation. In
Section 3, we discuss the Held-Karp linear programming relaxation for ATSP and how to derive a
solution z ∗ in the spanning tree polytope of the undirected version of G from the optimal fractional
solution x ∗ of the Held-Karp LP relaxation. Our main proof starts afterwards. In Section 4, we
formally define thin trees and we reduce the approximation of the asymmetric TSP to the problem
of finding a thin tree. In Section 5, we formally define the maximum entropy sampling method and
the maximum entropy convex program that preserves the marginals of z*, derived from the optimal
solution of the Held-Karp relaxation. We also prove that the optimum solution of this program is a
product-form distribution. Finally, we prove our main result in Section 6 and show that we can
efficiently find an O(log n / log log n)-approximate solution for the asymmetric TSP. This is done by proving
that a spanning tree sampled from the maximum entropy distribution with respect to the marginal
probabilities z* is sufficiently thin. The last two sections are dedicated to providing two different
algorithms for finding a product-form distribution that (approximately) preserves the marginals of
a given point z in the spanning tree polytope; namely, in Section 7 we provide a combinatorial
algorithm and in Section 8, we use the Ellipsoid method to solve the dual of the maximum entropy
convex program.
2. Notation
Throughout this paper, we use a = (u, v) to denote the arc (directed edge) from u to v and e = {u, v}
to denote an undirected edge. We will use A (resp. E) to denote the set of arcs (resp. edges) of a
directed (resp. undirected) graph we are working with. (This graph will always be clear from the
context. See the comments after (3.5) as an example.)
Given a function f : A → R on the arcs of a graph, we define the cost of f to be c(f) :=
Σ_{a∈A} c(a) f(a), where c : V × V → R+ is the cost function of the asymmetric traveling salesman
problem.
Also, for a subset S ⊆ A of arcs, we denote by f(S) := Σ_{a∈S} f(a) the sum of the values of f on
this subset. We use an analogous notation for a function defined on the edge set E of an undirected
graph, or for a vector whose entries correspond to the elements of A or E.
For a given subset of vertices U ⊆ V, we also define

   δ+(U) := {a = (u, v) ∈ A : u ∈ U, v ∉ U},
   δ−(U) := δ+(V \ U),
   A(U) := {(u, v) ∈ A : u ∈ U, v ∈ U},

to be the set of arcs that are, respectively, leaving, entering, and contained in U. Also, with a slight
abuse of notation, for a vertex v ∈ V we sometimes use δ+(v) and δ−(v) instead of δ+({v}) and
δ−({v}), respectively. Similarly, for an undirected graph, δ(U) denotes the set of edges with exactly
one endpoint in U, and E(U) denotes the set of edges with both endpoints in U.
Finally, for the rest of this paper, we use log to denote the natural logarithm.
3. The Held-Karp Relaxation
We start by presenting the Held-Karp relaxation [HK70] for the asymmetric traveling salesman
problem. In this relaxation, given an instance of ATSP with cost function c : V ×V →R+, we consider
the following linear program defined on the complete digraph over the vertex set V :
   min  Σ_a c(a) x_a                                      (3.1)
   s.t. x(δ+(U)) ≥ 1          ∀U ⊂ V with U ≠ ∅,          (3.2)
        x(δ+(v)) = x(δ−(v)) = 1    ∀v ∈ V,                (3.3)
        x_a ≥ 0               ∀a.
As shown in [HK70], the cost OPTHK := c(x ∗) of the optimal solution x ∗ for the preceding linear
program is a lower bound on the cost OPT of the optimal solution to the input instance of ATSP. To
observe this, note that every subset of arcs (e.g., every tour) can be represented by assigning binary
values to xa-s. More specifically, for every arc a, the value of xa can be set to 1 if a is included in the
tour and 0 otherwise. Note that the cost of the tour will be the corresponding value of the objective
function for this solution. Also, for every tour that we are interested in (i.e., the tours that visit
every city exactly once and return to the original vertex), the corresponding solution satisfies (3.2)
(due to being connected) as well as (3.3) (due to entering and exiting every city exactly once).
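The feasibility argument above can be verified mechanically on a toy instance. The sketch below (a hypothetical 4-city tour, not from the paper) builds the 0/1 incidence vector of a tour and checks constraints (3.2) and (3.3) by enumerating all cuts:

```python
from itertools import chain, combinations

# Hypothetical toy instance: 4 cities, tour 0 -> 1 -> 2 -> 3 -> 0.
n = 4
V = range(n)
tour_arcs = {(0, 1), (1, 2), (2, 3), (3, 0)}
x = {(u, v): 1 if (u, v) in tour_arcs else 0
     for u in V for v in V if u != v}

def delta_plus(U):
    """Arcs leaving U."""
    return [(u, v) for (u, v) in x if u in U and v not in U]

def delta_minus(U):
    """Arcs entering U."""
    return [(u, v) for (u, v) in x if u not in U and v in U]

# Constraint (3.3): every city is entered and left exactly once.
for v in V:
    assert sum(x[a] for a in delta_plus({v})) == 1
    assert sum(x[a] for a in delta_minus({v})) == 1

# Constraint (3.2): every nonempty proper subset of cities is left at
# least once (the tour is connected).
subsets = chain.from_iterable(combinations(V, k) for k in range(1, n))
for U in subsets:
    assert sum(x[a] for a in delta_plus(set(U))) >= 1
```

Since every tour yields a feasible point, the optimum LP value OPT_HK is indeed a lower bound on OPT.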
Although the Held-Karp relaxation has an exponential number of constraints (in terms of |V|), it is
well-known that an optimal solution x* to it can be computed in polynomial time (either by employing
the Ellipsoid method or by reformulating it as an LP with a polynomial number of constraints).
Furthermore, we can assume that x ∗ is an extreme point of the corresponding polytope.
Now, observe that (3.3) implies that any feasible solution x to the Held-Karp relaxation satisfies
x (δ+(U)) = x (δ−(U)), (3.4)
for any U ⊆ V . In other words, for any subset U ⊆ V of vertices, the (fractional) number of arcs
leaving U in x is equal to the (fractional) number of arcs entering it.
Our particular interest will be in a symmetrized and slightly scaled down version of x*. Namely,
let us define

   z*_{u,v} := (n−1)/n · (x*_{uv} + x*_{vu}).             (3.5)
Let us also denote by A the support of x*, i.e., A = {(u, v) : x*_{uv} > 0}, and by E the support of
z*. For every edge e = {u, v} of E, we define its cost as min{c(a) : a ∈ {(u, v), (v, u)} ∩ A}, which
corresponds to choosing the cheaper of the possible orientations of that edge. We denote this new cost
of edge e by c(e). This implies, in particular, that c(z*) ≤ c(x*).
The main purpose of the scaling factor in (3.5) is to make z ∗ belong to the spanning tree polytope P
of the graph (V,E). This ensures that z ∗ can be viewed as a convex combination of incidence vectors
of spanning trees. In fact, as stated in Lemma 3.1, this makes z ∗ belong to the relative interior of P .
We will later use this fact in proving Theorem 5.1. We emphasize that E is defined as the support
of z*, i.e., E = {{u, v} : z*_{u,v} > 0}. The proof of Lemma 3.1 can be found in the appendix.
Lemma 3.1. The vector z ∗ defined by (3.5) belongs to relint(P ), the relative interior of the spanning
tree polytope P of the graph (V,E).
Finally, we note that as x* is an extreme point (i.e., basic feasible) solution, it is known that its
support A has at most 3n − 2 arcs [VY99]. (Also, see [Goe06] for a bound of 3n − 4.) In addition,
since x* can be expressed as the unique solution of an invertible system with only 0–1 coefficients,
by using the result of [Had93], we have that every entry x*_a is rational with integral numerator and
denominator bounded by 2^{O(n log n)}. In particular, z*_min := min_{e∈E} z*_e > 2^{−O(n log n)}.
Remark 3.1. All our results still hold if we take any optimal solution x* (not necessarily an extreme
point) where the denominators of all the entries x*_a are bounded by 2^{n^{O(1)}}. However, for the
rest of the paper we will focus on the aforementioned solution x* and its corresponding z*, where
min_{e∈E} z*_e > 2^{−O(n log n)}.
4. Thin Trees and the Asymmetric Traveling Salesman Problem
The key element of our approach is a connection between the approximability of the asymmetric
traveling salesman problem and the notion of thin trees. To describe this connection, let us first
formally define thin trees.
Definition 4.1. Given a point z in the spanning tree polytope of the graph (V, E) and an α ≥ 1,
we say that a spanning tree T is α-thin with respect to z iff, for each subset U ⊂ V of vertices,
|E(T) ∩ δ(U)| ≤ α · z(δ(U)). Also, we say that T is (α, s)-thin with respect to z iff it is α-thin and
moreover c(T) ≤ s · c(z), where c(T) := Σ_{{u,v}∈T} min{c((u, v)), c((v, u))} denotes the cost of T after
directing each of its edges according to the orientation that yields the smaller cost.
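The cut condition in this definition can be checked by brute force on small instances. The sketch below (a hypothetical toy example, not from the paper) takes K4 with z_e = 1/2 on every edge (a point in the spanning tree polytope, since each tree uses 3 of the 6 edges) and computes the smallest α for which the star tree rooted at vertex 0 is α-thin:

```python
from itertools import chain, combinations

# Hypothetical toy example: K4, z_e = 1/2, and the star tree rooted at 0.
V = range(4)
edges = [(u, v) for u in V for v in V if u < v]
z = {e: 0.5 for e in edges}
T = {(0, 1), (0, 2), (0, 3)}

def crossing(edge_set, U):
    """Edges of edge_set with exactly one endpoint in U."""
    return [(u, v) for (u, v) in edge_set if (u in U) != (v in U)]

def thinness(tree, z):
    """Smallest alpha such that the tree is alpha-thin w.r.t. z (Definition 4.1)."""
    alpha = 0.0
    cuts = chain.from_iterable(combinations(V, k) for k in range(1, len(V)))
    for U in map(set, cuts):
        tree_cut = len(crossing(tree, U))
        z_cut = sum(z[e] for e in crossing(edges, U))
        alpha = max(alpha, tree_cut / z_cut)
    return alpha

# The cut U = {0} is the bottleneck: 3 tree edges against z(delta(U)) = 1.5.
assert thinness(T, z) == 2.0
```

Enumerating all 2^n cuts is of course only viable at toy scale; the point of Theorem 6.1 is to certify thinness without such enumeration.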
Note that the requirement of α-thinness only concerns the graph structure and not the weight
of the edges. Now, let us consider x ∗ to be an optimal solution to the Held-Karp relaxation, as
described in Section 3, and z ∗ to be the symmetrized and scaled down version of x ∗ defined in (3.5).
By Lemma 3.1, we know that z ∗ belongs to the (relative interior of) the corresponding spanning tree
polytope. The crucial observation that we make is that finding an (α,s)-thin tree with respect to
z ∗, for some α and s, translates directly to obtaining a (2α+ s)-approximation for the asymmetric
traveling salesman problem.
Theorem 4.1. Let x ∗ be an optimal solution to the Held-Karp relaxation and z ∗ be the correspond-
ing point in the spanning tree polytope as defined in (3.5). Given T ∗, an (α,s)-thin spanning tree
with respect to z* for some α and s, we can find in polynomial time an Eulerian tour whose cost is
at most (2α + s) c(x*) = (2α + s) OPT_HK ≤ (2α + s) OPT.
Our proof for Theorem 4.1 relies on a certain classical result on flow circulations. To state this
result, let us recall that a circulation is any function f : A→R such that f(δ+(v)) = f(δ−(v)) for each
vertex v ∈ V . The following theorem (see, e.g., [Sch03, Theorem 11.2] for a proof) gives a necessary
and sufficient condition for the existence of a circulation subject to the lower and upper bounds (or
capacities) on arcs.
Theorem 4.2 (Hoffman’s circulation theorem). Given lower and upper capacities l, u : A→R,
there exists a circulation f satisfying l(a)≤ f(a)≤ u(a) for all a∈A iff
1. l(a)≤ u(a) for all a∈A and
2. l(δ−(U))≤ u(δ+(U)), for all subsets U ⊆ V .
Furthermore, if l and u are integer-valued, f can be chosen to be integer-valued too.
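On instances small enough to enumerate, the equivalence stated by the theorem can be verified directly. The sketch below (hypothetical toy digraphs, not from the paper) brute-forces the existence of an integral circulation within the bounds and compares it against Hoffman's condition:

```python
from itertools import chain, combinations, product

def circulation_exists(V, arcs, l, u):
    """Brute-force search for an integral circulation with l <= f <= u."""
    ranges = [range(l[a], u[a] + 1) for a in arcs]
    for values in product(*ranges):
        f = dict(zip(arcs, values))
        # Flow conservation at every vertex: out-flow equals in-flow.
        if all(sum(f[a] for a in arcs if a[0] == v) ==
               sum(f[a] for a in arcs if a[1] == v) for v in V):
            return True
    return False

def hoffman_condition(V, arcs, l, u):
    """Check l(a) <= u(a) and l(delta^-(U)) <= u(delta^+(U)) for all U."""
    if any(l[a] > u[a] for a in arcs):
        return False
    subsets = chain.from_iterable(
        combinations(V, k) for k in range(1, len(V) + 1))
    for U in map(set, subsets):
        l_in = sum(l[a] for a in arcs if a[0] not in U and a[1] in U)
        u_out = sum(u[a] for a in arcs if a[0] in U and a[1] not in U)
        if l_in > u_out:
            return False
    return True

# Feasible toy instance: a directed triangle with one unit forced on each arc.
V = [0, 1, 2]
arcs = [(0, 1), (1, 2), (2, 0)]
l = {a: 1 for a in arcs}
u = {a: 2 for a in arcs}
assert hoffman_condition(V, arcs, l, u) and circulation_exists(V, arcs, l, u)

# Infeasible toy instance: the cut U = {0} demands more flow in than can leave.
V2, arcs2 = [0, 1], [(0, 1), (1, 0)]
l2, u2 = {(0, 1): 0, (1, 0): 2}, {(0, 1): 1, (1, 0): 3}
assert not hoffman_condition(V2, arcs2, l2, u2)
assert not circulation_exists(V2, arcs2, l2, u2)
```

In both instances the brute-force answer agrees with Hoffman's condition, as the theorem guarantees.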
We proceed now to the proof of Theorem 4.1.
Proof of Theorem 4.1: Let us first orient each edge {u, v} of T* according to arg min{c(a) :
a ∈ {(u, v), (v, u)} ∩ A}, and denote the resulting directed tree by ~T*. Observe that by definition of
our undirected cost function, we have c(~T*) = c(T*).
Let us now consider a minimum cost augmentation of ~T ∗ into an Eulerian directed (multi)graph.
Finding such an augmentation can be formulated as a minimum cost circulation problem with integral
lower capacities and infinite upper capacities. One just needs to set
   l(a) := 1 if a ∈ ~T*, and l(a) := 0 otherwise,

and consider the minimum cost circulation problem

   min{c(f) : f is a circulation and f(a) ≥ l(a) for all a ∈ A}.    (4.1)
It is well-known (see, e.g., [Sch03, Corollary 12.2a]) that an optimum solution f∗ to Problem (4.1)
is polynomial-time computable and can be assumed to be integral, since l is integral. This integral
circulation f∗ can be viewed as a directed (multi)graph H that contains ~T ∗ and is Eulerian, i.e.,
every vertex has in-degree equal to its out-degree in H.
Hence, as H is also weakly connected (since it contains ~T ∗ as its subgraph), we can take an Eulerian
walk of H and shortcut it to obtain a Hamiltonian cycle of cost at most c(f∗). (We are using here
the fact that the costs satisfy the triangle inequality.)
To complete the proof, it remains to bound the cost of f∗. That is, our goal is to show that
c(f∗)≤ (2α+ s)c(x ∗). To this end, let us define
u(a) := 1 + 2αx∗a if a ∈ ~T∗, and u(a) := 2αx∗a otherwise. (4.2)
We claim that there exists a circulation g satisfying l(a) ≤ g(a) ≤ u(a) for every a ∈ A. Note that this claim would imply that

c(f∗) ≤ c(g) ≤ c(u) = c(~T∗) + 2αc(x∗) ≤ (2α + s)c(x∗).

Indeed, since g is a feasible solution to (4.1), we have c(f∗) ≤ c(g), and the last inequality uses the s-thinness of T∗, i.e., c(~T∗) = c(T∗) ≤ s·c(x∗). This establishes the desired bound on the cost of f∗.
In light of this, it only remains to establish the claim. To this end, observe that the α-thinness of T∗ implies that, for any subset U ⊆ V of vertices, the number of arcs of ~T∗ in δ−(U) is at most αz∗(δ(U)), regardless of how T∗ is oriented to achieve ~T∗. As a consequence,

l(δ−(U)) ≤ αz∗(δ(U)) ≤ 2αx∗(δ−(U)).

Note that the second inequality holds because (3.5) implies that z∗(δ(U)) = ((n−1)/n)[x∗(δ−(U)) + x∗(δ+(U))]. However, x∗ itself is a circulation due to (3.4) and hence x∗(δ−(U)) = x∗(δ+(U)).
On the other hand, due to the definition of u(·) in (4.2), we have u(δ+(U))≥ 2αx ∗(δ+(U)). Thus,
u(δ+(U))≥ 2αx ∗(δ+(U)) = 2αx ∗(δ−(U))≥ l(δ−(U)).
Therefore, we can conclude that
l(δ−(U))≤ u(δ+(U)),
for any U ⊆ V and thus, by Theorem 4.2, the circulation g indeed exists. This concludes the proof
of the theorem.
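The final step of the proof (taking an Eulerian walk of H and shortcutting it) can be sketched as follows. The Eulerian multigraph H below is a hypothetical stand-in for the integral circulation f∗, and Hierholzer's algorithm is one standard way, not prescribed by the text, to extract the Eulerian walk; under the triangle inequality, skipping repeated vertices never increases the cost.

```python
from collections import defaultdict

def eulerian_circuit(multi_arcs, start):
    """Hierholzer's algorithm on a connected Eulerian directed multigraph."""
    out = defaultdict(list)
    for u, v in multi_arcs:
        out[u].append(v)
    stack, walk = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())
        else:
            walk.append(stack.pop())
    return walk[::-1]          # visits every arc exactly once

def shortcut(walk):
    """Skip already-visited vertices; valid for metric (triangle-inequality)
    costs, where a shortcut never costs more than the path it replaces."""
    seen, tour = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    tour.append(tour[0])
    return tour

# Eulerian multigraph on 4 vertices containing the directed tree 0->1, 0->2, 2->3.
H = [(0, 1), (1, 0), (0, 2), (2, 3), (3, 2), (2, 0)]
walk = eulerian_circuit(H, 0)
tour = shortcut(walk)
assert sorted(tour[:-1]) == [0, 1, 2, 3] and tour[0] == tour[-1]
```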
5. Maximum Entropy Sampling and Concentration Bounds
In the light of the connection established in the previous section (see Theorem 4.1), our goal is to
develop a way of producing a tree that is sufficiently thin with respect to the point z ∗ (see (3.5)).
We will achieve that by devising an efficient procedure for sampling from an appropriate probability
distribution over spanning trees.
To this end, recall that by Lemma 3.1 we know that z∗ belongs to the relative interior of the spanning tree polytope P that corresponds to the optimal solution x∗ to our LP relaxation from Section 3. This means not only that z∗ can be expressed as a convex combination of spanning trees
on the support E of z ∗, but also it can be done in such a way that all coefficients are positive. In
fact, there can be many ways to express z ∗ in this manner.
Furthermore, observe that each such convex combination naturally defines a distribution over
spanning trees that preserves the marginal probabilities imposed by z ∗, i.e., it is the case that
PrT [e ∈ T ] = z∗e , for every edge e ∈ E. It is, therefore, tempting to use such a distribution for our
thin tree sampling procedure. (After all, one can interpret the thinness condition – cf. Definition 4.1
– as a statement about approximate preservation of the respective cost- and cut-based marginals.)
However, as we already mentioned, there are many possible ways of representing z ∗ as a probability
distribution. Which one should we choose? As it turns out, a good choice is to take the distribution
that maximizes the entropy among all such marginal-preserving distributions. We formalize this
notion as well as derive some of its crucial properties in the following section.
5.1. Maximum Entropy Distribution
Let T be the collection of all the spanning trees of G= (V,E) and let z be an arbitrary point in the
corresponding spanning tree polytope P of G. The maximum entropy distribution p∗(·) with respect
to the marginal probabilities imposed by z is the optimum solution of the following convex program.
inf ∑_{T∈T} p(T) log p(T) (5.1)

s.t. ∑_{T∋e} p(T) = ze ∀e ∈ E,

p(T) ≥ 0 ∀T ∈ T.
It is not hard to see that this convex program is feasible since z belongs to the spanning tree polytope
P. Note that since the support T of the distribution is finite (in fact, |T| ≤ n^{n−2}, due to Cayley's formula [Cay89]), the objective function is bounded in absolute value by log |T| ≤ (n−2) log n. Also, note
that the feasible region is compact (closed and bounded). Hence, the infimum is attained and there
exists an optimum solution p∗(·). Furthermore, since the objective function is strictly convex, this
maximum entropy distribution p∗(·) is unique. Let OPTEnt denote the optimum value of this convex
program CP (5.1).
The value p∗(T ) determines the probability of sampling any tree T in the maximum entropy
rounding scheme. Note that it is implicit in the constraints of this convex program that, for any
feasible solution p(·), we have ∑_T p(T) = 1, since

n − 1 = ∑_{e∈E} ze = ∑_{e∈E} ∑_{T∋e} p(T) = (n−1) ∑_T p(T),

where the last equality uses that every spanning tree has exactly n−1 edges.
We now show that if the vector z is in the relative interior of the spanning tree polytope of (V,E), then p∗(T) > 0 for every T ∈ T and p∗(·) is a product-form distribution. (Observe that our vector z∗ indeed satisfies this assumption.)
For this purpose, we write the Lagrange dual to the convex program CP (5.1) (see, e.g., [BTN01]
for the relevant background). For every e∈E, we associate a Lagrange multiplier δe to the constraint
corresponding to the marginal ze, and define the Lagrange function as
L(p, δ) = ∑_{T∈T} p(T) log p(T) − ∑_{e∈E} δe (∑_{T∋e} p(T) − ze).
We can then rewrite it as
L(p, δ) = ∑_{e∈E} δe ze + ∑_{T∈T} (p(T) log p(T) − p(T) ∑_{e∈T} δe).
The Lagrange dual to CP (5.1) is now
sup_δ inf_{p≥0} L(p, δ). (5.2)
The inner infimum in this dual is easy to solve. Namely, as the contributions of each p(T ) are
separable, we have that, for every T ∈ T , p(T ) must minimize the convex function
p(T ) log p(T )− p(T )δ(T ),
where, as usual, δ(T) = ∑_{e∈T} δe. Setting the partial derivative with respect to p(T) to zero, we derive that 1 + log p(T) − δ(T) = 0, or

p(T) = e^{δ(T)−1}. (5.3)
Thus,
inf_{p≥0} L(p, δ) = ∑_{e∈E} δe ze − ∑_{T∈T} e^{δ(T)−1}.
Using the change of variables γe = δe − 1/(n−1) for e ∈ E, the Lagrange dual (5.2) can thus be rewritten as

sup_γγγ [1 + ∑_{e∈E} ze γe − ∑_{T∈T} e^{γγγ(T)}]. (5.4)
Now, our assumption that the vector z is in the relative interior of the spanning tree polytope
translates to satisfying the Slater condition for the primal program (5.1) [BV06]. This, together
with convexity, implies that the supremum in (5.4) is attained by some vector γγγ∗, and the Lagrange
dual value equals the optimum value OPTEnt of our convex program. Furthermore, we have that the
(unique) primal optimum solution p∗ and any dual optimum solution γγγ∗ must satisfy
L(p,γγγ∗)≥L(p∗,γγγ∗)≥L(p∗,γγγ), (5.5)
for any p≥ 0 and any γγγ, where we have implicitly redefined L due to our change of variables from δ
to γγγ. Therefore, p∗ is the unique minimizer of L(p,γγγ∗) and from (5.3), we have that
p∗(T ) = eγγγ∗(T ). (5.6)
In summary, the following theorem holds.
Theorem 5.1. Given a vector z in the relative interior of the spanning tree polytope P of G =
(V,E), there exist γ∗e , for all e ∈ E, such that if we sample a spanning tree T of G according to
p∗(T ) := exp(γγγ∗(T )), then Pr[e∈ T ] = ze for every e∈E.
It is worth noting that the requirement that z is in the relative interior of the spanning tree
polytope (as opposed to being just in this polytope) is crucial. (This has already been observed before; see Exercise 4.19 in [LP14].) To see this, consider G being a triangle and z being the vector (1/2, 1/2, 1). In this case, one can verify that z is in the polytope (but not in its relative interior) and there are no γ∗e -s that would satisfy the statement of the theorem. (However, one can get arbitrarily close to ze for all e ∈ E.)
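To make the dual (5.4) concrete, the following sketch solves it by plain gradient ascent on the triangle from the example above, but now with marginals z = (0.5, 0.7, 0.8) strictly inside the polytope (assumed example values; the step size and iteration count are ad hoc, not from the paper). The gradient of the dual in γe is ze − qe(γγγ), so a fixed point matches the marginals exactly, as Theorem 5.1 predicts for interior z.

```python
import math
from itertools import combinations

# Triangle graph: edges 0, 1, 2; its spanning trees are all 2-edge subsets.
edges = [0, 1, 2]
trees = [set(t) for t in combinations(edges, 2)]
z = [0.5, 0.7, 0.8]   # interior marginals: sum = n-1 = 2, every tree has mass

def marginals(gamma):
    """q_e(gamma): probability of e under p(T) proportional to exp(gamma(T))."""
    w = [math.exp(sum(gamma[e] for e in T)) for T in trees]
    total = sum(w)
    return [sum(w[i] for i, T in enumerate(trees) if e in T) / total
            for e in edges]

gamma = [0.0, 0.0, 0.0]
for _ in range(20000):     # gradient ascent on the concave Lagrange dual (5.4)
    q = marginals(gamma)
    gamma = [g + 0.5 * (ze - qe) for g, ze, qe in zip(gamma, z, q)]

q = marginals(gamma)
assert all(abs(qe - ze) < 1e-4 for qe, ze in zip(q, z))
```

With the boundary vector (1/2, 1/2, 1) instead, the iterates would drive the γ of the forced edge to +∞ without ever attaining the marginals exactly, which is the phenomenon the triangle example illustrates.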
In Sections 7 and 8 we show how to efficiently find γγγ-s that approximately satisfy the conditions of
Theorem 5.1. The method proposed in Section 7 is combinatorial, as opposed to the Ellipsoid-based
method of Section 8. However, as we will see, the running time of the former grows polynomially
with the inverse of the desired error, while for the latter the running time grows polylogarithmically.
The results of Sections 7 and 8 can be formalized as the following theorem, whose result we use in the rest of the paper. For our application to the asymmetric traveling salesman problem, we set zmin to be 2^{−O(n log n)} and ε to be 1/5.
Theorem 5.2. Given z in the spanning tree polytope of G = (V,E) and some ε > 0, there is an
algorithm that finds values γe for all e ∈ E so that, for the product-form distribution p defined by

p(T) := exp(∑_{e∈T} γe) / ∑_{T′∈T} exp(∑_{e∈T′} γe) ∀T ∈ T,

we have, for every edge e ∈ E,

z̃e := ∑_{T∈T : T∋e} p(T) ≤ (1 + ε)ze.
Furthermore, the running time of the algorithm is polynomial in n= |V |, − log zmin and 1/ε. (For
the same theorem with running time polynomial in n= |V |, − log zmin and log(1/ε), see Section 8.)
Note that Theorem 5.2 does not require z to be in the relative interior of the spanning tree
polytope.
5.2. Maximum Entropy Distribution and λλλ-Random Trees
Interestingly, the distributions over spanning trees considered in Theorem 5.2 turn out to be closely
related to the notion of λλλ-random (spanning) trees. Given λe ≥ 0 for e ∈ E, a λλλ-random tree T of G is a tree T chosen from the set of all spanning trees of G with probability proportional to ∏_{e∈T} λe.
The notion of λλλ-random trees has been extensively studied (see, e.g., Ch.4 of [LP14]) – note that
in case of all λe-s being equal, a λλλ-random tree is just a uniform spanning tree of G. Many of the
results for uniform spanning trees carry over to λλλ-random spanning trees, since, for rational λe-s, a λλλ-random spanning tree in G corresponds to a uniform spanning tree in the multigraph obtained from G by letting the multiplicity of each edge e be proportional to λe.
Furthermore, observe that a tree T sampled from a probability distribution p(·) as given in The-
orem 5.2 is λλλ-random for λe := exp(γe) for all e ∈ E. As a result, we can use the tools developed
for λλλ-random trees to obtain an efficient sampling procedure, see Section 5.3, and to derive sharp
concentration bounds for the distribution p(·), see Section 6.
5.3. Sampling a λλλ-Random Tree
There is a host of results (see [Gue83, Kul90, CMN96, Ald90, Bro89, Wil96, KM09] and references
therein) on obtaining polynomial-time algorithms for generating a uniform spanning tree, i.e., a λλλ-
random tree for the case of all λe-s being equal. Almost all of them can be easily modified to allow arbitrary rational values of λe. However, many of those algorithms do not guarantee a polynomial running time in our situation, where the λe-s can take very different values. Therefore, we resort to an
iterative approach similar to the one of Kulkarni [Kul90] that runs in polynomial-time for arbitrary
rational values of λe. We note that “polynomial-time” here means that the algorithm takes a poly-
nomial number of bit operations in terms of the size of the input (which translates to the number of
bits required to represent the vector λλλ).
The basic idea of this approach is to order the edges e1, . . . , em of G arbitrarily and process them
one by one, deciding probabilistically whether to add a given edge to the final tree or to discard it.
To be precise, we define Dj := 1{ej ∈ T} as the decision we make for edge ej. We include the first edge, e1, in the tree with probability p1 = ze1. When we process the j-th edge ej for j > 1, we decide to add it to the final spanning tree T with probability pj(D1, . . . , Dj−1), which is the probability that ej is in a λλλ-random tree conditioned on the decisions that were made for edges e1, . . . , ej−1 in earlier iterations. In other words, pj(D1, . . . , Dj−1) = Pr[ej ∈ T | D1, . . . , Dj−1]. Clearly, this procedure generates a λλλ-
random tree, and its running time is polynomial as long as the computation of the probabilities pj
can be done in polynomial time.
To compute these probabilities efficiently we note that, by definition, p1 = ze1 . Now, if we choose
to include e1 in the tree (i.e., D1 = 1), then:
p2(1) = Pr[e2 ∈ T | e1 ∈ T] = (∑_{T′∋e1,e2} ∏_{e∈T′} λe) / (∑_{T′∋e1} ∏_{e∈T′} λe) = (∑_{T′∋e1,e2} ∏_{e∈T′\{e1}} λe) / (∑_{T′∋e1} ∏_{e∈T′\{e1}} λe).
As one can see, the probability that e2 ∈ T conditioned on the event that e1 ∈ T is equal to the
probability that e2 is in a λλλ-random tree of a graph obtained from G by contracting the edge e1.
Similarly, if we choose to discard e1, the probability p2 is equal to the probability that e2 is in a
λλλ-random tree of a graph obtained from G by removing e1. In general, pj(D1, · · · ,Dj−1) is equal to
the probability that ej is included in a λλλ-random tree of a graph G′ obtained from G by contracting
all edges that we have already decided to add to the tree, and deleting all edges that we have already
decided to discard. In other words, G′ = (G − {ei : Di = 0})/{ei : Di = 1}, where the symbol “/” stands for the contraction operation.
Therefore, to obtain each pj we just need to be able to compute efficiently, for a given (multi)graph G′ and values of the λe-s, the probability pG′[λλλ, f] that some edge f is in a λλλ-random tree of G′. It is well-known how to perform such a computation. For this purpose, one can evaluate ∑_{T∈T} ∏_{e∈T} λe for both G′ and G′/f (in which edge f is contracted) using Kirchhoff's matrix tree theorem (see [Bol02]).
Theorem 5.3 (Kirchhoff's Matrix Tree Theorem [Kir47]). Let G = (V,E) be an undirected graph with edge weights λe for e ∈ E, and let T be the set of all spanning trees of G. Then ∑_{T∈T} ∏_{e∈T} λe is equal to the absolute value of any cofactor of the weighted Laplacian L, where

L_{i,j} = −λe if i ≠ j and e = {i, j} ∈ E; L_{i,i} = ∑_{e∈δ(i)} λe; and L_{i,j} = 0 otherwise.
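Theorem 5.3 can be verified numerically on a small example. The sketch below uses K4 with arbitrary assumed rational weights 1..6, computes the cofactor of the weighted Laplacian with exact rational arithmetic, and compares it against brute-force enumeration of the spanning trees.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant over the rationals via Gaussian elimination."""
    M = [row[:] for row in M]
    n, sign, prod = len(M), 1, Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            sign = -sign
        prod *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return sign * prod

def weighted_tree_sum(n, lam):
    """Sum over spanning trees of the product of edge weights, via a
    cofactor of the weighted Laplacian (Theorem 5.3)."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), w in lam.items():
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return abs(det([row[:-1] for row in L[:-1]]))  # delete last row and column

def is_spanning_tree(n, es):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in es:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
    return True

# K4 with (assumed, arbitrary) weights 1..6: cofactor matches brute force.
n = 4
lam = {e: Fraction(k + 1) for k, e in enumerate(combinations(range(n), 2))}
brute = Fraction(0)
for es in combinations(lam, n - 1):
    if is_spanning_tree(n, es):
        term = Fraction(1)
        for e in es:
            term *= lam[e]
        brute += term
assert weighted_tree_sum(n, lam) == brute

# Unit weights recover Cayley's count n^(n-2) = 16 for K4.
assert weighted_tree_sum(n, {e: Fraction(1) for e in lam}) == 16
```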
An alternative approach to computing pG′ [λλλ,f ] is to use the fact (see, e.g., Ch. 4 of [LP14]) that
pG′ [λλλ,f ] is equal to λf times the effective resistance of f in G′ treated as an electrical circuit with
conductances of edges given by λλλ. The effective resistance can be expressed by an explicit linear-
algebraic formula whose computation boils down to inverting a certain matrix that can be easily
derived from the Laplacian of G′ (see, e.g., Section 2.4 of [GBS08] for details).
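The sequential procedure of this subsection can be sketched end to end as follows. Here the brute-force `tree_sum` is a stand-in for the Kirchhoff-based computation described above (so the sketch is exponential, not polynomial), the input is an assumed toy instance (K4 with unit weights), and exact rational arithmetic keeps the conditional probabilities exact, so the output is always a spanning tree.

```python
import random
from fractions import Fraction
from itertools import combinations

def _find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def tree_sum(n_comp, pairs, weights):
    """Brute-force sum over spanning trees of a contracted multigraph of the
    product of edge weights; Theorem 5.3 replaces this with a determinant."""
    if n_comp == 1:
        return Fraction(1)
    total = Fraction(0)
    for es in combinations(range(len(pairs)), n_comp - 1):
        parent, ok, w = {}, True, Fraction(1)
        for k in es:
            u, v = pairs[k]
            parent.setdefault(u, u)
            parent.setdefault(v, v)
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
            w *= weights[k]
        if ok:
            total += w
    return total

def sample_lambda_random_tree(n, edges, lam, rng):
    """Process edges one by one (Section 5.3): include e_j with its exact
    conditional probability and contract it; otherwise discard it."""
    parent = list(range(n))
    chosen = []
    for j, (u, v) in enumerate(edges):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru == rv:
            continue                       # e_j became a self-loop: discard
        # Remaining multigraph: undecided edges with contracted endpoints.
        rest = []
        for k in range(j, len(edges)):
            a, b = _find(parent, edges[k][0]), _find(parent, edges[k][1])
            if a != b:
                rest.append((k, a, b))
        comps = len({_find(parent, x) for x in range(n)})
        total = tree_sum(comps, [(a, b) for _, a, b in rest],
                         [lam[k] for k, _, _ in rest])
        # Trees through e_j: contract it and weight the count by lam[j].
        c_pairs, c_w = [], []
        for k, a, b in rest:
            if k == j:
                continue
            a2 = rv if a == ru else a
            b2 = rv if b == ru else b
            if a2 != b2:
                c_pairs.append((a2, b2))
                c_w.append(lam[k])
        p = lam[j] * tree_sum(comps - 1, c_pairs, c_w) / total
        if rng.random() < p:               # p is an exact Fraction
            chosen.append(j)
            parent[ru] = rv
    return chosen

# K4 with uniform weights: the result is always a spanning tree.
n = 4
edges = list(combinations(range(n), 2))
lam = [Fraction(1)] * len(edges)
tree = sample_lambda_random_tree(n, edges, lam, random.Random(0))
assert len(tree) == n - 1
assert tree_sum(n, [edges[k] for k in tree], [Fraction(1)] * (n - 1)) == 1
```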
5.4. Negative Correlation and Concentration Bounds
We now derive a concentration bound that will be instrumental in establishing the thinness of our
sampled tree. The following theorem states that for any arbitrary subset C of edges in the graph,
the number of edges in C that are included in a sampled λλλ-random tree is with high probability
concentrated around its expected value.
Theorem 5.4. For each edge e, let Xe be an indicator random variable associated with the event
[e ∈ T ], where T is a sampled λλλ-random tree. Also, for any subset C of the edges of G, define
X(C) = ∑_{e∈C} Xe. Then we have

Pr[X(C) ≥ (1 + δ)E[X(C)]] ≤ [e^δ / (1 + δ)^{1+δ}]^{E[X(C)]}.
Usually, when we want to obtain such concentration bounds, we prove that the variables {Xe}_{e∈E} are independent and use the Chernoff bound. Unfortunately, in our case, the variables {Xe}_{e∈E} are not independent. However, it is well-known that since our distribution is in product form, they are negatively correlated, i.e., for any subset F ⊆ E, Pr[∀e ∈ F : Xe = 1] ≤ ∏_{e∈F} Pr[Xe = 1]; see, e.g., Chapter 4 of [LP14].

Lemma 5.1. The random variables {Xe}_{e∈E} are negatively correlated.
Before proceeding further, we should note that Lyons and Peres prove this fact only in the case
of T being a uniform spanning tree, i.e., when all λe-s are equal. However, as mentioned before, for
rational λe-s it is enough to replace each edge e with Cλe copies of edge e (for an appropriate choice
of C to obtain all integral numbers, such as the lowest common denominator of all λe-s) and consider
a uniform spanning tree in the corresponding multigraph. This will make the probability of each tree
T proportional to∏e∈T Cλe =Cn−1
∏e∈T λe, which is proportional to
∏e∈T λe. The irrational case
follows from a straightforward limit argument where C→∞.
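Negative correlation can also be verified exactly on a small weighted example. The sketch below enumerates the spanning trees of K4 under arbitrary assumed rational weights 1..6 and checks the pairwise inequality of Lemma 5.1 with exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def spanning_trees(n, edges):
    """All spanning trees of (V, edges), as sets of edge indices."""
    trees = []
    for es in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for k in es:
            ru, rv = find(edges[k][0]), find(edges[k][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
        if ok:
            trees.append(frozenset(es))
    return trees

# K4 with assumed rational weights; Pr[T] is proportional to the product
# of lambda_e over e in T.
n = 4
edges = list(combinations(range(n), 2))
lam = [Fraction(k + 1) for k in range(len(edges))]
trees = spanning_trees(n, edges)
weight = {T: Fraction(1) for T in trees}
for T in trees:
    for k in T:
        weight[T] *= lam[k]
Z = sum(weight.values())

def pr(required):
    """Probability that every edge index in `required` is in the tree."""
    return sum(w for T, w in weight.items() if required <= T) / Z

# Pairwise negative correlation, checked exactly (Lemma 5.1).
for e, f in combinations(range(len(edges)), 2):
    assert pr({e, f}) <= pr({e}) * pr({f})
```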
Now that we have established negative correlation between the Xe-s, Theorem 5.4 follows directly from Theorem 3.4 of Panconesi and Srinivasan [PS97], which shows that the upper-tail part of the Chernoff bound requires only negative correlation (or an even weaker notion; see [PS97]) and not the full independence of the random variables.
Finally, it is worth pointing out that, since the initial publication of our work [AGM+10], another
way of producing negatively correlated marginal-preserving probability distributions over spanning
trees (or, more generally, on matroid bases) has been proposed [CVZ10]. This approach can also be
used in the framework developed in this paper.
6. Establishing the Thinness of the Sampled Tree
In this section, we focus on the product-form distribution p(·) that we obtain by applying the algo-
rithm of Theorem 5.2 to z ∗. We show that the tree sampled from the distribution p(·) is thin with
high probability. Refer to Theorem 6.1 for the formal statement.
The following lemma relates the number of edges of the sampled tree across a particular cut to the
sum of the fractions of the edges across that cut in the fractional solution. The proof can be found in
the appendix. In the rest of this paper and for the sake of simplicity, we define β = 4 logn/ log logn.
Lemma 6.1. Given z in the spanning tree polytope of G = (V,E) for n = |V | ≥ 5, let p(·) be the
probability distribution obtained by applying Theorem 5.2 on z for ε ≤ 0.2. Then for any set U ⊂ V,

Pr[|T ∩ δ(U)| > βz(δ(U))] ≤ n^{−2.5 z(δ(U))},
where T is a spanning tree sampled from p(·).
Now, we are ready to combine this concentration result with Karger’s bound on the number of
near-minimum cuts in a graph [Kar93] in order to establish the desired thinness of our sampled tree.
Theorem 6.1. Consider an instance of ATSP defined by the cost function c : V ×V →R+ for n=
|V | ≥ 5, and let x ∗ be the optimum solution of its Held-Karp linear programming relaxation. Also,
let z ∗ be a point in the spanning tree polytope of the underlying undirected graph G= (V,E) derived
from x∗ by (3.5). Let T1, . . . , T⌈2 log n⌉ be ⌈2 log n⌉ independent samples from a distribution p(·) as given by Theorem 5.2 applied on z∗ for ε = 0.2. Finally, let T∗ be the tree among these samples that minimizes the cost c(Tj), for 1 ≤ j ≤ ⌈2 log n⌉. Then, with probability at least 1 − 2/(n−1), the spanning tree T∗ is (4 log n/ log log n, 2)-thin with respect to z∗.
Proof: We start by showing that for any 1 ≤ j ≤ ⌈2 log n⌉, Tj is β-thin with high probability for β = 4 log n/ log log n. From Lemma 6.1 we know that the probability of some particular cut δ(U) violating the β-thinness of Tj is at most n^{−2.5 z∗(δ(U))}. Now, we use a result of Karger [Kar93]
that shows that there are at most n^{2l} cuts of size at most l times the minimum cut value, for any half-integer l ≥ 1. Since, by the definitions of the Held-Karp relaxation and of z∗, we know that z∗(δ(U)) ≥ 2(1 − 1/n), this means there are at most n^l cuts δ(U) with z∗(δ(U)) ≤ l(1 − 1/n), for any
integer l ≥ 2. Therefore, by applying the union bound (and n ≥ 5), we derive that the probability
that there exists some cut δ(U) with |Tj ∩ δ(U)| > βz∗(δ(U)) is at most

∑_{i=3}^∞ n^i n^{−2.5(i−1)(1−1/n)},

where each term is an upper bound on the probability that there exists a violating cut δ(U) with z∗(δ(U)) in [(i−1)(1−1/n), i(1−1/n)]. For n ≥ 5, this simplifies to

∑_{i=3}^∞ n^i n^{−2.5(i−1)(1−1/n)} ≤ ∑_{i=3}^∞ n^{−i+2} = 1/(n−1).

Thus, the probability that there exists a cut for which the β-thinness property is violated by T∗ is at most 1/(n−1).
Now, the expected cost of Tj is

E[c(Tj)] = ∑_{e∈E} c(e)z̃e ≤ (1 + ε) ∑_{e∈E} c(e)z∗e ≤ (1 + ε)((n−1)/n) ∑_{a∈A} c(a)x∗a ≤ (1 + ε)OPTHK.

So, by Markov's inequality, for any j, the probability that c(Tj) > 2OPTHK is at most (1 + ε)/2. Thus, with probability at most ((1 + ε)/2)^{⌈2 log n⌉} < 1/n for ε = 0.2, we have c(T∗) > 2OPTHK.
Now, taking a union bound over the event that β-thinness is violated in some cut and the event that the cost exceeds 2OPTHK, we conclude that the probability that T∗ is not (β, 2)-thin is at most 1/(n−1) + 1/n ≤ 2/(n−1). This completes the proof of the theorem.
We note that the preceding proof establishes a thinness probability of 1 − O(1/n^k), for any k > 1, by increasing the value of β by a factor of k.
After proving the thinness result of Theorem 6.1, we can put it together with the developments
of Section 4 to establish the main result of the paper.
Theorem 6.2. Algorithm 1 finds a (2 + 8 log n/ log log n)-approximate solution to the Asymmetric Traveling Salesman Problem with probability at least 1 − 2/(n−1), in time that is polynomial in the size of the input.
Proof: The algorithm starts by finding an optimal extreme-point solution x ∗ to the Held-Karp
LP relaxation of ATSP of value OPTHK. Next, using the algorithm of Theorem 5.2 on z ∗ (which
is defined by (3.5)) with ε = 0.2, we obtain γe-s that define the probability distribution p(T) ∝ exp(∑_{e∈T} γe). Since x∗ was an extreme point, we know that z∗min ≥ e^{−O(n log n)}; thus, the algorithm
of Theorem 5.2 indeed runs in polynomial time.
Next, we use the polynomial-time sampling procedure of Subsection 5.3 to sample ⌈2 log n⌉ trees Tj from the distribution p(·), and take T∗ to be the one among them that minimizes the cost c(Tj). By Theorem 6.1, we know that T∗ is (4 log n/ log log n, 2)-thin with probability at least 1 − 2/(n−1).
Finally, we use Theorem 4.1 to obtain, in polynomial time, a (2 + 8 logn/ log logn)-approximation
of our ATSP instance.
The proof also shows that the integrality gap of the Held-Karp relaxation for the asymmetric TSP
is bounded above by 2 + 8 logn/ log logn. The best known lower bound on the integrality gap is only
2, as shown in [CGK06]. Closing this gap is a challenging open question.
Corollary 6.1. Showing the existence of a (C1,C2)-thin spanning tree, where C1 and C2 are uni-
versal constants, would establish a constant integrality gap for the Held-Karp LP relaxation of the
asymmetric TSP. Moreover, an efficient algorithm to find such spanning trees would lead to an effi-
cient constant factor approximation algorithm for the asymmetric TSP.
We believe that finding such thin trees could be the key to improving the approximation factor for ATSP even further. Indeed, since the publication of an earlier version of this paper [AGM+10], thin trees were used by [OS11] to develop constant-factor approximation algorithms for ATSP on planar graphs and graphs with bounded genus, and by [AO15] to obtain a polyloglog(n) bound on the integrality gap of the Held-Karp relaxation. The former shows the existence of constant-thin trees for planar graphs and graphs with bounded genus; the latter shows the existence of trees whose thinness is bounded by a polynomial of log log(n) for all graphs, but stops short of giving a polynomial-time algorithm for finding them. We conjecture that constant-thin trees exist, that they can be found in polynomial time, and that, consequently, constant-factor approximation algorithms for ATSP exist.
Conjecture 6.1. There exist universal constants C1, C2 > 0 such that for any feasible solution of the Held-Karp relaxation of the asymmetric TSP, there exists a (C1, C2)-thin spanning tree, and it can be found in polynomial time.
We refer the reader to [OS11, Corollary 5.1] for a slightly more general variant of the conjecture
that does not depend on the cost of the arcs.
Conversely, one can approach a lower bound on the integrality gap of the Held-Karp relaxation by studying families of feasible solutions that do not admit thin trees. Goddyn [God] showed a direct relationship between thin trees and Jaeger's conjecture [Jae84] on the existence of nowhere-zero flows (a conjecture recently proved by Thomassen [Tho12]). Currently, Conjecture 6.1
has only been proved for planar and bounded genus graphs [OS11] (see also [ES13, HO14]).
7. Solving the Maximum Entropy Convex Program: A Combinatorial Approach
In this section, we provide a combinatorial algorithm to efficiently find γe-s that approximately
preserve the marginal probabilities given by z and therefore, prove Theorem 5.2. As an alternative,
in Section 8 we show that the maximum entropy convex program can also be solved via the Ellipsoid
method. The advantage of the latter approach is that the resulting running-time dependence on ε is
only polylogarithmic.
Given a vector γγγ, for each edge e, define

qe(γγγ) := (∑_{T∋e} e^{γγγ(T)}) / (∑_T e^{γγγ(T)}), (7.1)

where γγγ(T) = ∑_{f∈T} γf. For notational convenience, we have dropped the condition T ∈ T in these summations; this should not lead to any confusion. Restated, qe(γγγ) is the probability that edge e will
be included in a spanning tree T that is chosen with probability proportional to exp(γγγ(T )). We will
discuss the computational issues of deriving the values of qe(γγγ)-s at the end of this section.
We compute γγγ using the following simple algorithm. Starting with all γe-s equal to each other, as
long as the marginal qe(γγγ) for some edge e is more than (1 + ε)ze, we decrease γe appropriately to
decrease qe(γγγ) to (1 + ε/2)ze. A formal description can be found in Algorithm 2.
Algorithm 2 A Combinatorial Algorithm to Find γγγ as Defined in Theorem 5.2
Input: The vector of marginal probabilities z.
Set γγγ = 0.
while there exists an edge e with qe(γγγ) > (1 + ε)ze do
    Compute δ such that if we define γγγ′ by γ′e = γe − δ and γ′f = γf for all f ∈ E \ {e},
    then qe(γγγ′) = (1 + ε/2)ze.
    Set γγγ ← γγγ′.
end while
return γγγ.
Clearly, if the described procedure terminates then the resulting γγγ satisfies the requirement of
Theorem 5.2. Therefore, what we need to show is that this algorithm terminates in time polynomial
in n, − log zmin and 1/ε, and that each iteration can be implemented in polynomial time.
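A toy run of Algorithm 2 can be sketched as follows. Exact enumeration of the spanning trees replaces the matrix-tree computation of the qe(γγγ)-s, the update uses the δ formula that follows from Lemma 7.1(2), and the input is the triangle with assumed interior marginals; the iteration cap is a safety guard, not part of the algorithm.

```python
import math
from itertools import combinations

def spanning_trees(n, edges):
    trees = []
    for es in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for k in es:
            ru, rv = find(edges[k][0]), find(edges[k][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
        if ok:
            trees.append(frozenset(es))
    return trees

def marginals(gamma, trees, m):
    """q_e(gamma) computed by direct enumeration, cf. (7.1)."""
    w = {T: math.exp(sum(gamma[k] for k in T)) for T in trees}
    Z = sum(w.values())
    return [sum(w[T] for T in trees if k in T) / Z for k in range(m)]

def algorithm2(z, trees, m, eps, max_iter=100000):
    gamma = [0.0] * m
    for _ in range(max_iter):
        q = marginals(gamma, trees, m)
        bad = [k for k in range(m) if q[k] > (1 + eps) * z[k]]
        if not bad:
            return gamma
        k = bad[0]
        # Lemma 7.1(2): 1/q_e(gamma') - 1 = e^delta (1/q_e(gamma) - 1);
        # choose delta so the new marginal equals (1 + eps/2) z_e.
        target = (1 + eps / 2) * z[k]
        delta = math.log(q[k] * (1 - target) / ((1 - q[k]) * target))
        gamma[k] -= delta
    raise RuntimeError("did not terminate within max_iter")

# Triangle with interior marginals (assumed example values).
n, edges = 3, [(0, 1), (0, 2), (1, 2)]
trees = spanning_trees(n, edges)
z = [0.5, 0.7, 0.8]
eps = 0.1
gamma = algorithm2(z, trees, len(edges), eps)
q = marginals(gamma, trees, len(edges))
assert all(qk <= (1 + eps) * zk for qk, zk in zip(q, z))
```

On this instance only the first edge starts above its threshold, so a single update of γ0 already brings every marginal below (1 + ε)ze.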
We start by bounding the number of iterations; we will show that it is O((1/ε)|E|^2 [|V| log(|V|) − log(εzmin)]). The next lemma derives an equation for δ and proves that decreasing γe does not decrease qf(·) for any f ≠ e. We refer the reader to the appendix for the proof.
Lemma 7.1. Fix a δ ≥ 0 and an edge e ∈ E. Define γγγ′ by γ′e := γe − δ and γ′f := γf for all f ≠ e. Then,
1. for all f ∈ E \ {e}, qf(γγγ′) ≥ qf(γγγ), and
2. qe(γγγ′) satisfies 1/qe(γγγ′) − 1 = e^δ (1/qe(γγγ) − 1).
In particular, in the main loop of the algorithm, since qe(γγγ) > (1 + ε)ze and we want qe(γγγ′) = (1 + ε/2)ze, we get

δ = log [qe(γγγ)(1 − (1 + ε/2)ze)] / [(1 − qe(γγγ))(1 + ε/2)ze] > log [(1 + ε)/(1 + ε/2)] > ε/4

for ε ≤ 1. For larger values of ε, we can simply decrease ε to 1.
We continue with some basic definitions and results regarding spanning trees and forests.
Definition 7.1. A subgraph H of a given undirected graph G is called a spanning forest of G if it
consists of a spanning subtree in each connected component of G.
In other words, a spanning forest of G is a maximal acyclic subgraph of G. Now, consider some subset Q of the edges of G. Let r be the size of a maximum spanning forest of (V, Q). By the definition of r, it is clear that |T ∩ Q| ≤ r for any spanning tree T of G. We distinguish two types of spanning trees in G by defining T=^{(r)} := {T ∈ T : |T ∩ Q| = r} and T<^{(r)} := {T ∈ T : |T ∩ Q| < r}. The following lemma establishes some structural properties of the spanning trees of G. For the sake of simplicity, we may drop the superscript (r) and work with T= and T< instead; the value of r will be clear from the context.
Lemma 7.2. Let G = (V,E) be a graph with weights γe for e ∈ E. Let Q ⊂ E be such that for all f ∈ Q and e ∈ E \ Q, we have γf > γe + ∆ for some ∆ ≥ 0.
1. Any spanning tree T ∈ T= can be generated by taking the union of a spanning forest F (of cardinality r) of the graph (V, Q) and a spanning tree (of cardinality n − r − 1) of the graph G/Q in which the edges of Q have been contracted.
2. Let Tmax be a maximum-weight spanning tree of G with respect to the weights γγγ, i.e., Tmax = arg max_{T∈T} γγγ(T). Then, for any T ∈ T<, we have γγγ(T) < γγγ(Tmax) − ∆.
Proof: The first property follows from the matroidal properties of spanning trees. To prove the
second property, consider any T ∈ T<. Since |T ∩ Q| < r, there exists an edge f ∈ (Tmax ∩ Q) \ T such that (T ∩ Q) ∪ {f} is a forest of G. Thus, the unique circuit in T ∪ {f} contains an edge e ∉ Q. Hence, T′ = T ∪ {f} \ {e} is a spanning tree. Our assumption on Q implies that

γγγ(Tmax) ≥ γγγ(T′) = γγγ(T) − γe + γf > γγγ(T) + ∆,
which yields the desired inequality.
We proceed to bounding the number of iterations.
Lemma 7.3. Algorithm 2 executes at most O((1/ε)|E|^2 [|V| log(|V|) − log(εzmin)]) iterations of the main loop.
Proof: Let n = |V| and m = |E|. Assume for the sake of contradiction that the algorithm executes more than

τ := (4/ε) m^2 [n log n − log(εzmin)]

iterations. Let γγγ be the vector of γe-s computed at iteration τ + 1. For brevity, let us define qe := qe(γγγ) for all edges e. We first provide an upper bound for the minimum entry of γγγ.

Claim 7.1. There exists some e∗ ∈ E such that γe∗ < −ετ/(4m).
Proof: Note that there are m edges in the graph, and Lemma 7.1 implies that each “while”
iteration of Algorithm 2 decreases γe of one of these edges by δ, which is at least ε/4 (refer to the
discussion after the statement of Lemma 7.1). Moreover, all entries of γγγ are initially set to 0. Thus,
we know that, after more than τ iterations, there exists e∗ for which γe∗ is as desired.
Now, we provide a lower bound for the maximum entry of γγγ.
Claim 7.2. There exists some f∗ ∈E such that γf∗ = 0.
Proof: Note that Algorithm 2 never decreases γe for edges e with qe(·) smaller than (1 + ε)ze. Moreover, Lemma 7.1 shows that reducing γf of an edge f ≠ e can only increase qe(·). Therefore, all the edges with a negative value of γe must satisfy qe ≥ (1 + ε/2)ze. In other words, all edges e such
that qe < (1 + ε/2)ze satisfy γe = 0. However, a simple averaging argument establishes that∑
e qe =∑e
∑T3e exp(γγγ(T ))∑T exp(γγγ(T ))
=∑T∑e∈T exp(γγγ(T ))∑T exp(γγγ(T ))
= n− 1. Consequently,∑
e qe = n− 1< (1 + ε/2)(n− 1) = (1 +
ε/2)∑
e ze. Hence, there exists at least one edge f∗ with qf∗ < (1+ε/2)zf∗ and thus, having γf∗ = 0.
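The averaging identity used in the proof of Claim 7.2 can be checked directly on a toy instance. The sketch below is our own illustration (not part of the paper's algorithm): it enumerates the spanning trees of K4 by brute force and verifies that the γγγ-induced marginals q_e sum to n − 1 for an arbitrary choice of γγγ.

```python
import itertools
import math

def spanning_trees(n, edges):
    """Yield each spanning tree of the graph as a tuple of edge indices."""
    for combo in itertools.combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for i in combo:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:          # adding this edge would close a cycle
                ok = False
                break
            parent[ru] = rv
        if ok:
            yield combo

n = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # K4
gamma = [0.3, -0.7, 0.1, 0.0, -1.2, 0.5]                   # arbitrary gamma vector

trees = list(spanning_trees(n, edges))
assert len(trees) == 16                                    # Cayley: 4^(4-2)

Z = sum(math.exp(sum(gamma[i] for i in T)) for T in trees)
q = [sum(math.exp(sum(gamma[i] for i in T)) for T in trees if e in T) / Z
     for e in range(len(edges))]

# each spanning tree has n - 1 edges, so the marginals must sum to n - 1
assert abs(sum(q) - (n - 1)) < 1e-9
```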
Note that, as a result of Claims 7.1 and 7.2, the difference between the highest and the lowest entry of γγγ is at least ετ/(4m). This can be used to prove the existence of a gap in the values of the entries of γγγ. The following claim establishes this result.
Claim 7.3. There exists ∅ ≠ Q ⊊ E such that, for all e ∈ E \ Q and f ∈ Q, γ_e + ετ/(4m²) < γ_f.

Proof: We construct Q as follows. We set threshold values Γ_i = −ετi/(4m²) for i ≥ 0, and define Q_i = {e ∈ E | γ_e ≥ Γ_i}. Let Q = Q_j, where j is the first index such that Q_j = Q_{j+1}. First, note that, due to the finiteness of E, such an index j always exists (and is at most ⌈−(4m²/(ετ)) min_{e∈E} γ_e⌉). Moreover, it is clear from the construction that the gap property is satisfied: Q_j = Q_{j+1} means that no edge has γ_e ∈ [Γ_{j+1}, Γ_j), so every e ∉ Q satisfies γ_e < Γ_{j+1} = Γ_j − ετ/(4m²) ≤ γ_f − ετ/(4m²) for any f ∈ Q. Also, Q is non-empty, since Claim 7.2 implies that f* ∈ Q_0 ⊆ Q_j = Q. Finally, by the pigeonhole principle, since we have m different edges, we know that j < m. Thus, for each e ∈ Q we have γ_e ≥ Γ_j > Γ_m = −ετ/(4m). This means that e* ∉ Q and, thus, Q ≠ E.
Observe that the subset Q corresponding to Claim 7.3 satisfies the hypothesis of Lemma 7.2 with ∆ = ετ/(4m²). Thus, for any T ∈ T_<, we have
\[
\gamma(T_{\max}) > \gamma(T) + \frac{\varepsilon\tau}{4m^2}, \tag{7.2}
\]
where T_max and r are as defined in Lemma 7.2.
Let Ḡ be the graph G/Q obtained by contracting all the edges in Q. So, Ḡ consists only of the edges not in Q (some of them can be self-loops). Let T̄ be the set of all spanning trees of Ḡ, and, for any given edge e ∉ Q, let
\[
\bar{q}_e := \frac{\sum_{\bar T \in \bar{\mathcal T},\, \bar T \ni e} \exp(\gamma(\bar T))}{\sum_{\bar T \in \bar{\mathcal T}} \exp(\gamma(\bar T))}
\]
be the probability that edge e is included in a random spanning tree T̄ of Ḡ, where each tree T̄ is chosen with probability proportional to exp(γ(T̄)). Since spanning trees of Ḡ have n − r − 1 edges, we have
\[
\sum_{e \in E \setminus Q} \bar{q}_e = n - r - 1. \tag{7.3}
\]
Note that z, as the input to Algorithm 2, is a point in the spanning tree polytope. Hence, z satisfies z(E) = n − 1 and z(Q) ≤ r (by the definition of r; see Lemma 7.2, part 1). As a result, we have that z(E \ Q) ≥ n − r − 1. Therefore, Eq. (7.3) implies that there must exist e′ ∉ Q such that q̄_{e′} ≤ z_{e′}.
Our final step is to show that, for any e ∉ Q, q_e < q̄_e + εz_min/2. Note that, once we establish this, we know that q_{e′} < q̄_{e′} + εz_min/2 ≤ (1 + ε/2)z_{e′}, and thus it must be the case that γ_{e′} = 0. But this contradicts the fact that e′ ∉ Q, as by construction all e with γ_e = 0 must be in Q. Thus, we obtain a contradiction that concludes the proof of the lemma.
It remains to prove that, for any e ∉ Q, q_e < q̄_e + εz_min/2. We have that
\[
q_e = \frac{\sum_{T \in \mathcal T:\, e \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T} e^{\gamma(T)}}
= \frac{\sum_{T \in \mathcal T_=:\, e \in T} e^{\gamma(T)} + \sum_{T \in \mathcal T_<:\, e \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T} e^{\gamma(T)}}
\le \frac{\sum_{T \in \mathcal T_=:\, e \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T_=} e^{\gamma(T)}}
+ \frac{\sum_{T \in \mathcal T_<:\, e \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T} e^{\gamma(T)}}
\le \frac{\sum_{T \in \mathcal T_=:\, e \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T_=} e^{\gamma(T)}}
+ \sum_{T \in \mathcal T_<:\, e \in T} \frac{e^{\gamma(T)}}{e^{\gamma(T_{\max})}}, \tag{7.4}
\]
where the first inequality follows from replacing T with T_= in the first denominator, and the second inequality follows from keeping only a single term (namely, the one corresponding to T_max) in the second denominator. Using (7.2) and the fact that the number of spanning trees is at most n^{n−2}, the second term is bounded by
\[
\sum_{T \in \mathcal T_<:\, e \in T} \frac{e^{\gamma(T)}}{e^{\gamma(T_{\max})}}
\le n^{n-2}\, e^{-\varepsilon\tau/4m^2}
< \frac{1}{2}\, n^{n}\, e^{-\varepsilon\tau/4m^2}
= \frac{\varepsilon z_{\min}}{2}, \tag{7.5}
\]
by the definition of τ. To handle the first term of (7.4), we can use part 2 of Lemma 7.2 and factorize
\[
\sum_{T \in \mathcal T_=} e^{\gamma(T)} = \Big( \sum_{\bar T \in \bar{\mathcal T}} e^{\gamma(\bar T)} \Big) \Big( \sum_{F \in \mathcal F} e^{\gamma(F)} \Big),
\]
where F is the set of all maximal spanning forests of (V, Q), each consisting of exactly r edges of Q. Similarly, we can write
\[
\sum_{T \in \mathcal T_=,\, T \ni e} e^{\gamma(T)} = \Big( \sum_{\bar T \in \bar{\mathcal T},\, \bar T \ni e} e^{\gamma(\bar T)} \Big) \Big( \sum_{F \in \mathcal F} e^{\gamma(F)} \Big).
\]
As a result, the first term of (7.4) reduces to
\[
\frac{\big( \sum_{\bar T \in \bar{\mathcal T},\, \bar T \ni e} e^{\gamma(\bar T)} \big) \big( \sum_{F \in \mathcal F} e^{\gamma(F)} \big)}{\big( \sum_{\bar T \in \bar{\mathcal T}} e^{\gamma(\bar T)} \big) \big( \sum_{F \in \mathcal F} e^{\gamma(F)} \big)}
= \frac{\sum_{\bar T \in \bar{\mathcal T},\, \bar T \ni e} e^{\gamma(\bar T)}}{\sum_{\bar T \in \bar{\mathcal T}} e^{\gamma(\bar T)}} = \bar{q}_e.
\]
Together with (7.4) and (7.5), this gives
\[
q_e < \bar{q}_e + \frac{\varepsilon z_{\min}}{2},
\]
which completes the proof.
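The exact cancellation behind (7.5), namely that the definition of τ makes (1/2)n^n e^{−ετ/(4m²)} equal to εz_min/2, can be spot-checked numerically. The sketch below is our own verification aid; the instance parameters are arbitrary.

```python
import math

def tau(eps, m, n, zmin):
    """tau := (4/eps) * m^2 * [n log n - log(eps * zmin)], as in the proof of Lemma 7.3."""
    return (4.0 / eps) * m * m * (n * math.log(n) - math.log(eps * zmin))

for eps, n, zmin in [(0.2, 6, 1e-3), (0.05, 8, 1e-5)]:
    m = n * (n - 1) // 2                  # number of edges of K_n
    lhs = 0.5 * n ** n * math.exp(-eps * tau(eps, m, n, zmin) / (4 * m * m))
    rhs = eps * zmin / 2
    assert abs(lhs - rhs) <= 1e-9 * rhs   # equal up to floating-point error
```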
To complete the analysis of the algorithm, we need to argue that each iteration can be implemented in polynomial time. First, for any given vector γγγ, if we can calculate the exact values of e^{γ_e} (see Remark 7.1 below for a discussion), then we can efficiently compute the sums ∑_T exp(γ(T)) and ∑_{T∋e} exp(γ(T)) for any edge e. This enables us to compute all the q_e(γγγ)'s. This can be done using Kirchhoff's matrix tree theorem (see [Bol02]), as discussed in Section 5.3 for sampling λ-random spanning trees, with λ_e defined to be exp(γ_e). Observe that we can bound all entries of the weighted Laplacian matrix in terms of the input size, since the proof of Lemma 7.3 actually shows that −ετ/(4|E|) ≤ γ_e ≤ 0 for all e ∈ E at any iteration of the algorithm. Therefore, we can compute these cofactors efficiently, in time polynomial in n, −log z_min, and 1/ε. Finally, δ can be computed efficiently from Lemma 7.1.
Remark 7.1. The terms e^{γ_e} cannot be calculated exactly due to irrationality issues. However, any of the terms e^{γ_e} can be calculated within a factor of (1 + ξ) of its actual value in time polynomial in log(1/ξ). Hence, all q_e(γγγ) values can be calculated within any given factor (1 + nξ) in time polynomial in log(1/ξ). Now, in order to prove Theorem 4 for a given value ε, it is enough to calculate all values of the q_e's with a relative error of ε/(3n) and run Algorithm 2 for ε/3. Since (1 + ε/3)² ≤ 1 + ε for ε ≤ 1/5, at the end of Algorithm 2 all the actual values of q_e(γγγ) will be bounded from above by the corresponding (1 + ε)z_e, as desired.
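The matrix-tree computation described above can be sketched in a few lines. The following is our own toy illustration (the function names tree_sum and marginal are ours): the weighted spanning-tree count is a cofactor of the weighted Laplacian, and q_e is obtained as λ_e times the ratio of the tree count of the graph with e contracted to that of the original graph. Exact rational arithmetic stands in for the (1 + ξ)-approximate evaluation of e^{γ_e} discussed in Remark 7.1.

```python
from fractions import Fraction

def det(M):
    """Exact determinant via Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if M[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            d = -d
        d *= M[i][i]
        inv = Fraction(1) / M[i][i]
        for r in range(i + 1, n):
            factor = M[r][i] * inv
            for c in range(i, n):
                M[r][c] -= factor * M[i][c]
    return d

def tree_sum(n, edges, lam):
    """Weighted spanning-tree count: any cofactor of the weighted Laplacian."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for (u, v), w in zip(edges, lam):
        L[u][u] += w; L[v][v] += w
        L[u][v] -= w; L[v][u] -= w
    minor = [row[1:] for row in L[1:]]      # delete row and column 0
    return det(minor)

def marginal(n, edges, lam, e):
    """q_e = lam_e * tree_sum(G/e) / tree_sum(G), via contracting edge e."""
    u, v = edges[e]
    cmap = [i if i != v else u for i in range(n)]          # merge v into u
    relabel = {x: i for i, x in enumerate(sorted(set(cmap)))}
    cedges, clam = [], []
    for j, (a, b) in enumerate(edges):
        if j == e:
            continue
        a2, b2 = relabel[cmap[a]], relabel[cmap[b]]
        if a2 != b2:                                       # drop self-loops
            cedges.append((a2, b2)); clam.append(lam[j])
    return lam[e] * tree_sum(n - 1, cedges, clam) / tree_sum(n, edges, lam)

edges = [(0, 1), (0, 2), (1, 2)]            # triangle
lam = [Fraction(1)] * 3                     # gamma = 0, so lam_e = 1
qs = [marginal(3, edges, lam, e) for e in range(3)]
assert qs == [Fraction(2, 3)] * 3           # 3 trees, each edge in 2 of them
```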
8. Solving the Maximum Entropy Convex Program via the Ellipsoid Method
In this section we design an algorithm to find a λ-random spanning tree distribution that preserves
the marginal probability of all the edges within multiplicative error of 1+ε using the Ellipsoid method
as an alternative to the combinatorial approach provided in Section 7. The following is our main
theorem.
Theorem 8.1. There is an algorithm that, for a given z in the relative interior of the spanning tree polytope of G = (V, E) and ε ≤ 1/8, outputs a vector γγγ such that, for λλλ = exp(γγγ), the λλλ-random spanning tree distribution μ satisfies
\[
\sum_{T \in \mathcal T:\, T \ni e} \Pr_{\mu}[T] \le (1 + \varepsilon) z_e, \quad \forall e \in E. \tag{8.1}
\]
Furthermore, the running time of the algorithm is polynomial in n = |V|, −log z_min, and log(1/ε).
Remark 8.1. Theorem 8.1 assumes that z is in the relative interior of the spanning tree polytope. However, if z is on a facet of the spanning tree polytope, then one can always find a point z̄ in the relative interior of the spanning tree polytope such that z̄_e ≤ (1 + ε/3)z_e for every edge e. Note that, for ε ≤ 1/5, we have (1 + ε/3)² ≤ 1 + ε. Hence, applying Theorem 8.1 to z̄ and ε/3 guarantees that (8.1) holds for z and ε.
Subsequent to [AGM+10], Singh and Vishnoi [SV14] generalized and improved the above theorem. They showed that, for any family M of discrete objects on a ground set E of elements, and any given marginal probability vector in the interior of the convex hull of the characteristic vectors corresponding to the objects of M, one can efficiently compute a product distribution that approximately preserves the marginal probabilities if and only if there is an efficient algorithm that approximately counts the weighted sum of all the objects of M for a given vector γγγ, i.e., an efficient algorithm that approximates ∑_{M∈M} exp(γ(M)). For example, since there is an efficient algorithm that approximately counts the weighted sum of all of the perfect matchings of a bipartite graph for a given γγγ [JSV04], one can approximately compute the maximum entropy distribution defined over the perfect matchings with respect to any given marginal vector in the relative interior of the perfect matching polytope (see [SV14] for more details).
First, we show that, to prove the preceding theorem, it is enough to design an algorithm that approximately solves the dual program (5.4).

Theorem 8.2. There is an algorithm that, for a given z in the relative interior of the spanning tree polytope of G = (V, E) and ε ≤ 1/8, outputs γγγ such that
\[
1 + \sum_{e \in E} z_e \gamma_e - \sum_{T \in \mathcal T} e^{\gamma(T)} \ge \mathrm{OPT}_{\mathrm{Ent}} - \varepsilon.
\]
Furthermore, the running time of the algorithm is polynomial in n and log(1/ε).
We can use the preceding theorem to prove Theorem 8.1.
Proof of Theorem 8.1: Using Theorem 8.2 with accuracy ε̃ (to be fixed later), we can find γ̃ such that, for any γγγ,
\[
\sum_{e \in E} z_e \tilde{\gamma}_e - \sum_{T \in \mathcal T} e^{\tilde{\gamma}(T)} \ge -\tilde{\varepsilon} + \sum_{e \in E} z_e \gamma_e - \sum_{T \in \mathcal T} e^{\gamma(T)}. \tag{8.2}
\]
To prove the theorem, it is enough to choose ε̃ such that
\[
\frac{\sum_{T:\, e \in T} e^{\tilde{\gamma}(T)}}{\sum_{T \in \mathcal T} e^{\tilde{\gamma}(T)}} \le z_e (1 + \varepsilon), \quad \forall e \in E. \tag{8.3}
\]
To prove the above inequality, we first lower bound the denominator (of the LHS) and then upper bound the numerator. Let δ > 0 be a parameter that we fix later. Firstly, for any e ∈ E, let γ_e = γ̃_e + δ/(n − 1). Then, by (8.2),
\[
(e^{\delta} - 1) \sum_{T \in \mathcal T} e^{\tilde{\gamma}(T)} = \sum_{T \in \mathcal T} \big( e^{\gamma(T)} - e^{\tilde{\gamma}(T)} \big) \ge -\tilde{\varepsilon} + \sum_{e \in E} z_e (\gamma_e - \tilde{\gamma}_e) = -\tilde{\varepsilon} + \sum_{e \in E} z_e \frac{\delta}{n-1} = \delta - \tilde{\varepsilon}. \tag{8.4}
\]
Secondly, fix an edge f ∈ E; let γ_e = γ̃_e for all e ≠ f and γ_f = γ̃_f − δ. Then, by (8.2),
\[
(1 - e^{-\delta}) \sum_{T:\, f \in T} e^{\tilde{\gamma}(T)} = \sum_{T:\, f \in T} \big( e^{\tilde{\gamma}(T)} - e^{\gamma(T)} \big) \le \tilde{\varepsilon} + \sum_{e \in E} z_e (\tilde{\gamma}_e - \gamma_e) = \tilde{\varepsilon} + \delta \cdot z_f. \tag{8.5}
\]
Therefore, to prove (8.3), it is enough to choose δ and ε̃ such that
\[
\frac{\delta - \tilde{\varepsilon}}{e^{\delta} - 1} \ge 1 - \varepsilon/4, \qquad \frac{\delta z_f + \tilde{\varepsilon}}{1 - e^{-\delta}} \le z_f (1 + \varepsilon/2).
\]
Letting δ = ε/8 and ε̃ = z_min δ², and using the fact that e^x ≤ 1 + x + x² for any −1 ≤ x ≤ 1, proves the above inequalities. It follows that
\[
\frac{\sum_{T:\, f \in T} e^{\tilde{\gamma}(T)}}{\sum_{T \in \mathcal T} e^{\tilde{\gamma}(T)}}
\le \frac{\;\dfrac{\delta z_f + \tilde{\varepsilon}}{1 - e^{-\delta}}\;}{\;\dfrac{\delta - \tilde{\varepsilon}}{e^{\delta} - 1}\;}
\le \frac{z_f (1 + \varepsilon/2)}{1 - \varepsilon/4} \le z_f (1 + \varepsilon).
\]
The first inequality follows from (8.4) and (8.5).
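The choice δ = ε/8 and ε̃ = z_min δ² made in the proof above can be sanity-checked numerically. The sketch below is our own verification aid (the function name is ours); it checks both sufficient conditions over a small grid of parameters. The second condition is hardest at z_f = z_min, since the coefficient of z_f in its rearranged form is negative, so checking it there suffices.

```python
import math

def conditions_hold(eps, zmin):
    """Check the two sufficient conditions from the proof of Theorem 8.1
    for delta = eps/8 and eps_tilde = zmin * delta**2."""
    delta = eps / 8.0
    epst = zmin * delta ** 2
    cond1 = (delta - epst) / math.expm1(delta) >= 1 - eps / 4
    # hardest case of the second condition: z_f = zmin
    zf = zmin
    cond2 = (delta * zf + epst) / (1 - math.exp(-delta)) <= zf * (1 + eps / 2)
    return cond1 and cond2

ok = all(conditions_hold(eps, zmin)
         for eps in (0.125, 0.05, 0.01)
         for zmin in (1e-6, 1e-3, 0.5))
assert ok
```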
In the rest of this section we prove Theorem 8.2. Consider a concave function g(·) that is differ-
entiable everywhere. We use ∇g to denote the gradient of g. One can use the Ellipsoid method to
(approximately) find the maximum of g(·) given a first-order oracle for g(·), i.e., a routine that for a
given vector γγγ returns g(γγγ) and ∇g(γγγ). The statement is easily derivable from [BN12, thm 8.1.2].
Theorem 8.3. There is an algorithm that, for any given ε > 0 and R > 0, returns a vector γ̃ such that
\[
g(\tilde{\gamma}) \ge \sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \varepsilon \Big( \sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \inf_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \Big),
\]
where ‖γγγ‖_∞ is the largest coordinate, in absolute value, of the vector γγγ. Furthermore, the number of calls to the first-order oracle is upper bounded by a polynomial in the dimension of γγγ, log R, and log(1/ε).
We use the above theorem to prove Theorem 8.2. Let
\[
g(\gamma) = 1 + \sum_{e} \gamma_e z_e - \sum_{T \in \mathcal T} e^{\gamma(T)}.
\]
It is easy to see that g(·) is concave and differentiable everywhere. In addition, we can use the matrix tree theorem (see [Bol02]) to obtain a first-order oracle for g(·). To compute g(γγγ), it is enough to compute a cofactor of the Laplacian of G as defined in Section 5.3, where λ_e = exp(γ_e). If ‖γγγ‖_∞ < R, then all entries of the weighted Laplacian matrix are at most exp(R); therefore, we can compute a cofactor efficiently, in time polynomial in n and R. To compute entry e of ∇g(γγγ), i.e., ∂g(γγγ)/∂γ_e, it is enough to compute the function g(·) on the graph where e is contracted.
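Since ∂g/∂γ_e = z_e − ∑_{T∋e} e^{γ(T)}, the first-order oracle can be checked against finite differences on a toy instance. The following sketch is our own illustration: it evaluates g and ∇g on K4 by brute-force enumeration of the 16 spanning trees (whereas the oracle of the paper would use Laplacian cofactors) and compares the gradient to a central finite difference.

```python
import itertools
import math

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # K4
n = 4
z = [0.5] * 6        # uniform-spanning-tree marginals of K4; z(E) = 3 = n - 1

def spanning_trees():
    for combo in itertools.combinations(range(6), 3):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for i in combo:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
        if ok:
            yield combo

TREES = list(spanning_trees())

def g(gamma):
    """g(gamma) = 1 + sum_e gamma_e z_e - sum_T e^{gamma(T)} (brute force)."""
    return (1 + sum(ge * ze for ge, ze in zip(gamma, z))
              - sum(math.exp(sum(gamma[i] for i in T)) for T in TREES))

def grad(gamma):
    """dg/dgamma_e = z_e - sum over trees containing e of e^{gamma(T)}."""
    return [z[e] - sum(math.exp(sum(gamma[i] for i in T)) for T in TREES if e in T)
            for e in range(6)]

gamma = [-0.4, 0.2, 0.0, -1.0, 0.3, -0.2]
h = 1e-5
for e in range(6):
    gp = gamma[:]; gp[e] += h
    gm = gamma[:]; gm[e] -= h
    fd = (g(gp) - g(gm)) / (2 * h)
    assert abs(fd - grad(gamma)[e]) < 1e-6   # analytic gradient matches
```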
Lemma 8.1. For any ε < 1, there exists R = O(n³ log n + n² log(1/ε)) such that
\[
\sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \ge \mathrm{OPT}_{\mathrm{Ent}} - \varepsilon.
\]
Before proving the above lemma, let us use its result to prove Theorem 8.2.
Proof of Theorem 8.2: The proof simply follows by an application of Theorem 8.3. As mentioned above, the function g(·) is concave, everywhere differentiable, and has a first-order oracle. Choose R as in Lemma 8.1 such that
\[
\sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \ge \mathrm{OPT}_{\mathrm{Ent}} - \varepsilon/2. \tag{8.6}
\]
Using Theorem 8.3 with R and accuracy ε̃, we get a point γ̃ where
\[
g(\tilde{\gamma}) \ge \sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \tilde{\varepsilon} \Big( \sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \inf_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \Big)
\ge \mathrm{OPT}_{\mathrm{Ent}} - \frac{\varepsilon}{2} - \tilde{\varepsilon} \Big( \sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \inf_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \Big). \tag{8.7}
\]
The second inequality follows from (8.6). For any γγγ with ‖γγγ‖_∞ ≤ R, we have |g(γγγ)| ≤ 1 + nR + |T|e^{nR}. Since any graph with n vertices has at most n^{n−2} spanning trees ([Cay89]), |T| ≤ n^{n−2}, so (using e ≤ n)
\[
\sup_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) - \inf_{\gamma:\, \|\gamma\|_\infty \le R} g(\gamma) \le 2\big( 1 + nR + n^{\,n + nR} \big).
\]
Letting ε̃ = (ε/2)/(2(1 + nR + n^{n+nR})), by (8.7) we get g(γ̃) ≥ OPT_Ent − ε. The theorem follows from the fact that R is polynomial in n and log(1/ε), and that the number of oracle calls of the algorithm of Theorem 8.3 is polynomial in log(1/ε̃) and log R. Furthermore, each oracle call takes time polynomial in n and R.
The last, and perhaps the most technical part of the proof is the proof of Lemma 8.1. We refer the
interested readers to [SV14] for a generalization of Lemma 8.1 which is one of the major contributions
of [SV14].
Proof of Lemma 8.1: Let us start by defining the concept of a "gap". We say that a vector γγγ has a "gap at an edge f ∈ E" if, for every e ∈ E, γ_e ∉ (γ_f, γ_f + gap], where gap is a parameter that we fix later. Observe that, for any γγγ, the number of gaps of γγγ is at most |E| (at most one gap per edge), which is at most $\binom{n}{2}$.
In the following claim we show that, if γγγ has a gap at an edge, then we can construct another vector γ̄ with fewer gaps such that g(γ̄) ≈ g(γγγ). The proof is very similar in nature to the proof of Lemma 7.3, but, for the sake of completeness, we provide it here.

Claim 8.1. For any γγγ with at least one gap, there exists γ̄ with at least one fewer gap such that ∑_e z_e γ̄_e ≥ ∑_e z_e γ_e and
\[
\sum_{T \in \mathcal T} e^{\bar{\gamma}(T)} \le \big( 1 + |\mathcal T|\, e^{-\mathrm{gap}} \big) \sum_{T \in \mathcal T} e^{\gamma(T)}. \tag{8.8}
\]
Proof of Claim 8.1: Suppose that γγγ has a gap at an edge f ∈ E. Let F := {e ∈ E : γ_e > γ_f}, and let k = rank(F) be the size of a maximum spanning forest of F. For any edge e ∈ E, let
\[
\bar{\gamma}_e =
\begin{cases}
\gamma_e + \dfrac{k\Delta}{n-1} & \text{if } e \notin F, \\[4pt]
\gamma_e - \Delta + \dfrac{k\Delta}{n-1} & \text{if } e \in F,
\end{cases}
\]
where ∆ = min_{e∈F} γ_e − γ_f − gap. Note that, by the assumption of the claim, we have ∆ > 0. By the definition of γ̄, it has fewer gaps than γγγ.
Recall that any spanning tree of G has at most k edges from F, so z(F) ≤ k. Using this, we get
\[
\sum_{e \in E} z_e \bar{\gamma}_e = \sum_{e \in E} z_e \gamma_e + \frac{k\Delta}{n-1} \sum_{e \in E} z_e - z(F)\Delta \ge \sum_{e} z_e \gamma_e + k\Delta - k\Delta = \sum_{e} z_e \gamma_e.
\]
It remains to prove (8.8). First, observe that, if a spanning tree T has exactly k edges from F, then γ̄(T) = γ(T), and exp(γ̄(T)) = exp(γ(T)). We will show that the contribution of the rest of the trees to the LHS of (8.8) is negligible.

Let T_max be the maximum weight spanning tree of (V, E, γγγ). By Lemma 7.2, T_max has exactly k edges of F. Since γ̄(T) = γ(T) for any tree with |T ∩ F| = k, T_max is also a maximum weight spanning tree of (V, E, γ̄). Consider a spanning tree T with |T ∩ F| < k. It follows from the matroid exchange property (see, e.g., [CVZ10]) that there are edges e₁ ∈ (T_max ∩ F) \ T and e₂ ∈ T \ F such that T′ = T ∪ {e₁} \ {e₂} is a spanning tree. By the definition of γ̄ (using that γ_{e₁} ≥ γ_f + gap + ∆ while γ_{e₂} ≤ γ_f),
\[
\bar{\gamma}(T_{\max}) = \gamma(T_{\max}) \ge \bar{\gamma}(T') = \bar{\gamma}(T) - \bar{\gamma}_{e_2} + \bar{\gamma}_{e_1} \ge \bar{\gamma}(T) + \mathrm{gap}. \tag{8.9}
\]
Therefore,
\[
\sum_{T \in \mathcal T} e^{\bar{\gamma}(T)}
= \sum_{T:\, |F \cap T| = k} e^{\bar{\gamma}(T)} + \sum_{T:\, |F \cap T| < k} e^{\bar{\gamma}(T)}
\le \sum_{T:\, |F \cap T| = k} e^{\gamma(T)} + \sum_{T:\, |F \cap T| < k} e^{\bar{\gamma}(T_{\max}) - \mathrm{gap}}
\le \big( 1 + |\mathcal T|\, e^{-\mathrm{gap}} \big) \sum_{T \in \mathcal T} e^{\gamma(T)},
\]
which implies (8.8). This completes the proof of Claim 8.1.
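The construction of Claim 8.1 can be exercised on a small instance. The sketch below is our own illustration: on K4 it builds a vector γγγ with a gap, forms γ̄ exactly as in the proof, and checks both guarantees of the claim by brute-force tree enumeration. All names and the specific numbers are ours.

```python
import itertools
import math

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # K4
n = 4
gap = 1.0
gamma = [0.0, 0.0, 0.0, 2.0, 2.5, 0.0]   # has a gap at f = edge 0 (gamma_f = 0)
z = [0.5] * 6                            # a point in the spanning tree polytope of K4

f = 0
F = [e for e in range(6) if gamma[e] > gamma[f]]           # here F = {3, 4}

def rank(edge_idxs):
    """Size of a maximum spanning forest of the given edge set (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    r = 0
    for i in edge_idxs:
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r

k = rank(F)
delta = min(gamma[e] for e in F) - gamma[f] - gap
assert delta > 0                                           # the gap assumption

gbar = [gamma[e] + k * delta / (n - 1) - (delta if e in F else 0.0)
        for e in range(6)]

def spanning_trees():
    for combo in itertools.combinations(range(6), 3):
        if rank(combo) == 3:                               # acyclic and spanning
            yield combo

TREES = list(spanning_trees())

def tree_sum(gv):
    return sum(math.exp(sum(gv[i] for i in T)) for T in TREES)

# the two guarantees of Claim 8.1
assert (sum(ze * ge for ze, ge in zip(z, gbar))
        >= sum(ze * ge for ze, ge in zip(z, gamma)) - 1e-12)
assert tree_sum(gbar) <= (1 + len(TREES) * math.exp(-gap)) * tree_sum(gamma)
```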
Let γγγ* be an optimum solution of (5.4). If γγγ* does not have any gap, we let γ̄ = γγγ* and we are done. Otherwise, we repeatedly apply Claim 8.1 to remove all of the gaps, and we get a vector γ̄ such that ∑_e z_e γ̄_e ≥ ∑_e z_e γ*_e and
\[
\sum_{T \in \mathcal T} e^{\bar{\gamma}(T)}
\le \big( 1 + |\mathcal T|\, e^{-\mathrm{gap}} \big)^{|E|} \sum_{T \in \mathcal T} e^{\gamma^*(T)}
= \big( 1 + |\mathcal T|\, e^{-\mathrm{gap}} \big)^{|E|}
\le 1 + n^n e^{-\mathrm{gap}} \le 1 + \varepsilon, \tag{8.10}
\]
where the first inequality follows from the fact that γγγ* has at most |E| gaps, the equality follows from the fact that γγγ* is the optimal solution of (5.4) (and hence ∑_{T∈T} e^{γ*(T)} = 1), the second inequality follows from the fact that any graph on n vertices has at most n^{n−2} spanning trees [Cay89], and the last inequality follows by letting gap = log(1/ε) + n log n. Therefore,
\[
g(\bar{\gamma}) \ge g(\gamma^*) - \varepsilon = \mathrm{OPT}_{\mathrm{Ent}} - \varepsilon.
\]
So, to prove the lemma, it is enough to show that ‖γ̄‖_∞ ≤ R. Since γ̄ does not have any gap,
\[
\max_e \bar{\gamma}_e - \min_e \bar{\gamma}_e \le |E| \cdot \mathrm{gap}.
\]
So, it is sufficient to lower bound max_e γ̄_e and upper bound min_e γ̄_e. Firstly, since OPT_Ent ≥ log(1/|T|) ≥ −log(n^{n−2}) ≥ −n log n, we have
\[
-n \log n \le \mathrm{OPT}_{\mathrm{Ent}} = \sum_{e \in E} z_e \gamma^*_e \le \sum_{e \in E} z_e \bar{\gamma}_e \le n \cdot \max_{e \in E} \bar{\gamma}_e.
\]
On the other hand, by (8.10), e^{γ̄(T)} ≤ 2 for any tree T, so min_e γ̄_e ≤ 1. Therefore, ‖γ̄‖_∞ ≤ n log n + |E| · gap, as desired. This completes the proof of Lemma 8.1.
9. Concluding Remarks
In this paper, we provided the first asymptotic improvement over the long-standing Θ(logn)-
approximation ratio by Frieze, Galbiati and Maffioli [FGM82] for the asymmetric traveling salesman
problem (ATSP). Our main ideas are the concepts of “thin spanning trees” and “maximum entropy
rounding”.
Using ideas from circulation networks, we showed that finding thin spanning trees leads to good approximation algorithms for ATSP. We believe that finding such thin trees could be the key to improving the approximation factor for ATSP even further. We conjecture that constantly thin spanning trees exist and that, consequently, a constant-factor approximation algorithm for the ATSP can be found.
We also developed a new randomized rounding technique called maximum entropy rounding to
produce thin trees. We used this method to round a fractional spanning tree (derived from the
Held-Karp relaxation of the ATSP) to an integral one such that the linear quantities defined on
the tree remain approximately the same. The main intuition behind our approach is that among all
marginal-preserving and structure-preserving ways of performing randomized rounding, maximum
entropy rounding tries to lower the dependencies among the variables as much as possible. This
helped us to derive strong concentration bounds on values of cuts in the rounded solution.
As we discussed before, we believe that this method could be useful in other settings and for other combinatorial structures as well, and we expect to see further applications in the future. In particular, after the publication of [AGM+10], the maximum entropy rounding approach was used to obtain an improved approximation algorithm for the graphic variant of the symmetric TSP, where [OSS11] provided a (3/2 − ε₀)-approximation algorithm for some positive constant ε₀.
Also, [SV14] recently showed that maximum entropy distributions over a discrete set of objects
subject to observed marginals can be computed, as long as there exists an efficient algorithm for
counting the underlying discrete set. Prime examples of such sets are spanning trees (as discussed in
our paper) and perfect matchings.
We note that the main property of the maximum entropy rounding over spanning trees that we
used in our algorithm was the fact that it imposed negative correlation over the edges. In [CVZ10],
another way of producing negatively correlated marginal-preserving probability distributions over
spanning trees (or, more generally, on matroid bases) has been proposed. This approach, which is based on the idea of swap rounding, can also be used in the framework developed in our paper.
Finally, there have been recent developments on improving the integrality gap of the Held-Karp relaxation for the asymmetric traveling salesman problem. In particular, [Sve15] provides a polynomial-time constant-factor approximation algorithm for the problem when restricted to shortest path metrics of node-weighted digraphs. Also, [AO15] shows that the integrality gap of the Held-Karp relaxation is, in general, bounded by (log log n)^{O(1)}. However, the algorithm of [AO15] is based on solving an exponential-size convex program that is not known to be solvable in polynomial time. Therefore, improving the O(log n/ log log n)-approximation factor of our paper still remains an open question.
Acknowledgments
The authors would like to thank the associate editor of the journal and the three anonymous referees for their
highly valuable comments. The work of the second and the third author was supported by NSF contract
CCF-0829878 and by ONR grant N00014-05-1-0148.
References

[ABCC07] David L. Applegate, Robert E. Bixby, Vašek Chvátal, and William J. Cook. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2007.
[AGM+10] Arash Asadpour, Michel X. Goemans, Aleksander Mądry, Shayan Oveis Gharan, and Amin Saberi. An O(log n/ log log n)-approximation algorithm for the asymmetric traveling salesman problem. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 379–389, 2010.
[Ald90] David J. Aldous. A random walk construction of uniform spanning trees and uniform labelled trees. SIAM Journal on Discrete Mathematics, 3(4):450–465, 1990.
[AO15] Nima Anari and Shayan Oveis Gharan. Effective-resistance-reducing flows and asymmetric TSP. In IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 20–39, 2015.
[AS10] Arash Asadpour and Amin Saberi. An approximation algorithm for max-min fair allocation of indivisible goods. SIAM Journal on Computing, 39(7):2970–2989, 2010.
[Bla02] Markus Bläser. A new approximation algorithm for the asymmetric TSP with triangle inequality. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 638–645, 2002.
[BN12] Aharon Ben-Tal and Arkadi Nemirovski. Optimization I–II: Convex Analysis, Nonlinear Programming Theory, Nonlinear Programming Algorithms. Lecture notes, 2012.
[Bol02] Béla Bollobás. Modern Graph Theory. Springer-Verlag, 2002.
[Bro89] Andrei Z. Broder. Generating random spanning trees. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 442–447, 1989.
[BTN01] Aharon Ben-Tal and Arkadi Semenovich Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.
[BV06] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2006.
[Cay89] Arthur Cayley. A theorem on trees. Quarterly Journal of Mathematics, 23:376–378, 1889.
[CGK06] Moses Charikar, Michel X. Goemans, and Howard J. Karloff. On the integrality ratio for the asymmetric traveling salesman problem. Mathematics of Operations Research, 31:245–252, 2006.
[Chr76] Nicos Christofides. Worst case analysis of a new heuristic for the traveling salesman problem. Report 388, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA, 1976.
[CMN96] Charles J. Colbourn, Wendy J. Myrvold, and Eugene Neufeld. Two algorithms for unranking arborescences. Journal of Algorithms, 20(2):268–281, 1996.
[Coo12] William J. Cook. In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation. Princeton University Press, 2012.
[CT06] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[CVZ10] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 575–584, 2010.
[Edm71] Jack Edmonds. Matroids and the greedy algorithm. Mathematical Programming, pages 127–136, 1971.
[ES13] Jeff Erickson and Anastasios Sidiropoulos. A near-optimal approximation algorithm for asymmetric TSP on embedded graphs. CoRR, abs/1304.1810, 2013.
[FGM82] Alan M. Frieze, Giulia Galbiati, and Francesco Maffioli. On the worst-case performance of some algorithms for the asymmetric traveling salesman problem. Networks, 12:23–39, 1982.
[FS07] Uriel Feige and Mohit Singh. Improved approximation ratios for traveling salesman tours and paths in directed graphs. In APPROX, pages 104–118, 2007.
[GBS08] Arpita Ghosh, Stephen Boyd, and Amin Saberi. Minimizing effective resistance of a graph. SIAM Review, 50(1):37–66, 2008.
[GHJS09] Michel X. Goemans, Nicholas J. A. Harvey, Kamal Jain, and Mohit Singh. A randomized rounding algorithm for the asymmetric traveling salesman problem. CoRR, abs/0909.0941, 2009.
[GLS93] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, 1993.
[God] Luis A. Goddyn. Some open problems I like. Available at http://www.math.sfu.ca/~goddyn/Problems/problems.html.
[Goe06] Michel X. Goemans. Minimum bounded degree spanning trees. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 273–282, 2006.
[Gue83] Alain Guénoche. Random spanning tree. Journal of Algorithms, 4:214–220, 1983.
[Had93] Jacques Hadamard. Résolution d'une question relative aux déterminants. Bulletin des Sciences Mathématiques, 17:30–31, 1893.
[HK70] Michael Held and Richard M. Karp. The traveling salesman problem and minimum spanning trees. Operations Research, 18:1138–1162, 1970.
[HO14] Nicholas J. A. Harvey and Neil Olver. Pipage rounding, pessimistic estimators and matrix concentration. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 926–945, 2014.
[Jae84] François Jaeger. On circular flows in graphs. In Finite and Infinite Sets, Colloquia Mathematica Societatis János Bolyai 37, pages 391–402, 1984.
[JSV04] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM, 51(4):671–697, 2004.
[Kar93] David R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 21–30, 1993.
[Kir47] Gustav Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird. Annalen der Physik und Chemie, 72:497–508, 1847.
[KLSS05] Haim Kaplan, Moshe Lewenstein, Nira Shafrir, and Maxim Sviridenko. Approximation algorithms for asymmetric TSP by decomposing directed regular multigraphs. Journal of the ACM, pages 602–626, 2005.
[KM09] Jonathan Kelner and Aleksander Mądry. Faster generation of random spanning trees. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 13–21, 2009.
[Kul90] Vidyadhar G. Kulkarni. Generating random combinatorial objects. Journal of Algorithms, 11:185–207, 1990.
[LK74] Jan K. Lenstra and Alexander H. G. Rinnooy Kan. Some simple applications of the Traveling Salesman Problem. Operations Research Quarterly, 26:717–733, 1974.
[LP14] Russell Lyons and Yuval Peres. Probability on Trees and Networks. Cambridge University Press, 2014. In preparation; current version available at http://mypage.iu.edu/~rdlyons/.
[LSKL85] Eugene L. Lawler, David B. Shmoys, Alexander H. G. Rinnooy Kan, and Jan K. Lenstra. The Traveling Salesman Problem. John Wiley & Sons, 1985.
[OS11] Shayan Oveis Gharan and Amin Saberi. The asymmetric traveling salesman problem on graphs with bounded genus. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 967–975, 2011.
[OSS11] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. A randomized rounding approach to the traveling salesman problem. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 267–276, 2011.
[PS97] Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the Chernoff-Hoeffding bounds. SIAM Journal on Computing, 26:350–368, 1997.
[PTW01] Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences, 98(17):9748–9753, 2001.
[RT87] Prabhakar Raghavan and C. D. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365–374, 1987.
[Sch03] Alexander Schrijver. Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, 2003.
[SV14] Mohit Singh and Nisheeth K. Vishnoi. Entropy, optimization and counting. In Symposium on Theory of Computing, STOC, pages 50–59, 2014.
[Sve15] Ola Svensson. Approximating ATSP by relaxing connectivity. In IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 1–19, 2015.
[Tho12] Carsten Thomassen. The weak 3-flow conjecture and the weak circular flow conjecture. Journal of Combinatorial Theory, Series B, 102(2):521–529, 2012.
[VY99] Santosh Vempala and Mihalis Yannakakis. A convex relaxation for the asymmetric TSP. In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 975–976, 1999.
[Wil96] David Bruce Wilson. Generating random spanning trees more quickly than the cover time. In ACM Symposium on Theory of Computing, STOC, pages 296–303, 1996.
Appendix. Omitted Proofs
Proof of Lemma 3.1
From Edmonds’ characterization of the base polytope of a matroid [Edm71], it follows that the
spanning tree polytope P is defined by the following inequalities (see [Sch03, Corollary 50.7c]):
P = { z ∈ R^E :
  z(E) = |V| − 1,  (.1)
  z(E(U)) ≤ |U| − 1  ∀ U ⊂ V, U ≠ ∅,  (.2)
  z_e ≥ 0  ∀ e ∈ E }.  (.3)

The relative interior of P corresponds to those z ∈ P satisfying all inequalities (.2) and (.3) strictly. Clearly, z* satisfies (.1), since
\[
x^*(\delta^+(v)) = 1 \;\; \forall v \in V \;\Rightarrow\; x^*(A) = n = |V| \;\Rightarrow\; z^*(E) = n - 1 = |V| - 1.
\]
Consider any non-empty set U ⊂ V. We have
\[
\sum_{v \in U} x^*(\delta^+(v)) = |U| = x^*(A(U)) + x^*(\delta^+(U)) \ge x^*(A(U)) + 1.
\]
Since x* satisfies (3.2) and (3.3), we have
\[
z^*(E(U)) = \frac{n-1}{n}\, x^*(A(U)) < x^*(A(U)) \le |U| - 1,
\]
showing that z* satisfies (.2) strictly. Since E is the support of z*, (.3) is also satisfied strictly by z*. This shows that z* is in the relative interior of P.
Proof of Lemma 6.1
Note that, by definition, for all edges e ∈ E, z̄_e ≤ (1 + ε)z_e, where ε = 0.2 is the desired accuracy of the approximation of z by z̄, as in Theorem 5.2. Hence,
\[
\mathbb{E}\big[ |T \cap \delta(U)| \big] = \bar{z}(\delta(U)) \le (1 + \varepsilon)\, z(\delta(U)).
\]
Applying Theorem 5.4 with
\[
1 + \delta = \frac{\beta\, z(\delta(U))}{\bar{z}(\delta(U))} \ge \frac{\beta}{1 + \varepsilon},
\]
we derive that Pr[|T ∩ δ(U)| > βz(δ(U))] can be bounded from above by
\[
\Pr\big[ |T \cap \delta(U)| > (1 + \delta)\, \mathbb{E}[|T \cap \delta(U)|] \big]
\le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\bar{z}(\delta(U))}
\le \left( \frac{e}{1+\delta} \right)^{(1+\delta)\bar{z}(\delta(U))}
= \left( \frac{e}{1+\delta} \right)^{\beta z(\delta(U))}
\le \left[ \left( \frac{e(1+\varepsilon)}{\beta} \right)^{\beta} \right]^{z(\delta(U))}
\le n^{-2.5\, z(\delta(U))}.
\]
Note that, in the last inequality, we have used that
\[
\log \left[ \left( \frac{e(1+\varepsilon)}{\beta} \right)^{\beta} \right]
= \frac{4 \log n}{\log\log n} \big[ 1 + \log(1+\varepsilon) - \log 4 - \log\log n + \log\log\log n \big]
\le -4 \log n \left( 1 - \frac{\log\log\log n}{\log\log n} \right)
\le -4 \left( 1 - \frac{1}{e} \right) \log n \le -2.5 \log n,
\]
since e(1 + ε) < 4 and (log log log n)/(log log n) ≤ 1/e for all n ≥ 5.
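The final logarithmic estimate above, with β = 4 log n/ log log n and ε = 0.2, can be checked numerically over a wide range of n. The sketch below is our own verification aid (function names are ours).

```python
import math

def beta(n):
    """beta = 4 log n / log log n, as in the thinness parameter of Lemma 6.1."""
    return 4 * math.log(n) / math.log(math.log(n))

def lhs(n, eps=0.2):
    """log[(e(1+eps)/beta)^beta] = beta * (1 + log(1+eps) - log(beta))."""
    b = beta(n)
    return b * (1 + math.log(1 + eps) - math.log(b))

# the proof claims this is at most -2.5 log n for all relevant n
assert all(lhs(n) <= -2.5 * math.log(n)
           for n in [5, 10, 100, 10**4, 10**6, 10**9, 10**12, 10**15])
```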
Proof of Lemma 7.1
Let us consider some f ∈ E \ {e}. We have
\[
q_f(\gamma') = \frac{\sum_{T \in \mathcal T:\, f \in T} e^{\gamma'(T)}}{\sum_{T \in \mathcal T} e^{\gamma'(T)}}
= \frac{\sum_{T:\, e \in T,\, f \in T} e^{\gamma'(T)} + \sum_{T:\, e \notin T,\, f \in T} e^{\gamma'(T)}}{\sum_{T \ni e} e^{\gamma'(T)} + \sum_{T:\, e \notin T} e^{\gamma'(T)}}
= \frac{e^{-\delta} \sum_{T:\, e \in T,\, f \in T} e^{\gamma(T)} + \sum_{T:\, e \notin T,\, f \in T} e^{\gamma(T)}}{e^{-\delta} \sum_{T \ni e} e^{\gamma(T)} + \sum_{T:\, e \notin T} e^{\gamma(T)}}
= \frac{e^{-\delta} a + b}{e^{-\delta} c + d},
\]
with a, b, c, d appropriately defined. On the other hand, q_f(γγγ) = (a + b)/(c + d) by the definition (see Eq. (7.1)). But, for general a, b, c, d ≥ 0, if a/c ≤ (a + b)/(c + d), then (xa + b)/(xc + d) ≥ (a + b)/(c + d) for x ≤ 1. Since
\[
\frac{a}{c} = \frac{\sum_{T \in \mathcal T:\, e \in T,\, f \in T} e^{\gamma(T)}}{\sum_{T \in \mathcal T:\, e \in T} e^{\gamma(T)}} \le q_f(\gamma) = \frac{a + b}{c + d}
\]
by negative correlation (since a/c represents the conditional probability that f is present given that e is present), we get that q_f(γγγ′) ≥ q_f(γγγ) for δ ≥ 0.
Now, we proceed to deriving the equation for δ. By the definition of q_e(γγγ), we have
\[
\frac{1}{q_e(\gamma)} - 1 = \frac{\sum_{T:\, e \notin T} e^{\gamma(T)}}{\sum_{T \ni e} e^{\gamma(T)}}.
\]
Hence,
\[
\frac{1}{q_e(\gamma')} - 1 = \frac{\sum_{T:\, e \notin T} e^{\gamma'(T)}}{\sum_{T \ni e} e^{\gamma'(T)}}
= \frac{\sum_{T:\, e \notin T} e^{\gamma(T)}}{e^{-\delta} \sum_{T \ni e} e^{\gamma(T)}}
= e^{\delta} \left( \frac{1}{q_e(\gamma)} - 1 \right),
\]
which concludes the proof.
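Both conclusions of Lemma 7.1 can be observed on a small instance. The sketch below is our own illustration: it computes the marginals on K4 by brute-force tree enumeration, lowers γ_e for a single edge, and checks that the other marginals do not decrease while the marginal of e obeys the closed-form update derived above.

```python
import itertools
import math

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # K4
n = 4

def spanning_trees():
    for combo in itertools.combinations(range(6), 3):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for i in combo:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
        if ok:
            yield combo

TREES = list(spanning_trees())

def marginals(gamma):
    Z = sum(math.exp(sum(gamma[i] for i in T)) for T in TREES)
    return [sum(math.exp(sum(gamma[i] for i in T)) for T in TREES if e in T) / Z
            for e in range(6)]

gamma = [0.4, -0.3, 0.1, 0.0, 0.7, -0.5]
e, delta = 2, 0.8
gp = gamma[:]
gp[e] -= delta                       # gamma' differs from gamma only at edge e

q, qp = marginals(gamma), marginals(gp)

# monotonicity: lowering gamma_e can only raise the other marginals
assert all(qp[f] >= q[f] - 1e-12 for f in range(6) if f != e)

# the closed-form update for edge e itself
assert abs((1 / qp[e] - 1) - math.exp(delta) * (1 / q[e] - 1)) < 1e-9
```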