

GLOBAL OPTIMIZATION AND APPROXIMATION

ALGORITHMS IN COMPUTER VISION

CARL OLSSON

Faculty of Engineering
Centre for Mathematical Sciences

Mathematics


Mathematics
Centre for Mathematical Sciences
Lund University
Box 118
SE-221 00 Lund
Sweden

http://www.maths.lth.se/

Licentiate Theses in Mathematical Sciences 2007:1
ISSN 1404-028X

ISBN 978-91-628-7268-7
LUTFMA-2026-2007
© Carl Olsson, 2007

Printed in Sweden by KFS, Lund 2007


Organization: Centre for Mathematical Sciences, Mathematics, Lund Institute of Technology, Box 118, SE-221 00 Lund
Document name: LICENTIATE THESES IN MATHEMATICAL SCIENCES
Date of issue: September 2007
Document number: LUTFMA-2026-2007
Author(s): Carl Olsson
Supervisors: Fredrik Kahl, Kalle Åström

Title and subtitle: Global Optimization and Approximation Algorithms in Computer Vision

Abstract
Computer Vision is today a wide research area including topics like robot vision, image analysis, pattern recognition, medical imaging and geometric reconstruction problems. Over the past decades there has been a rapid development in understanding and modeling different computer vision applications. Even though much work has been done on modeling different problems, less work has been spent on deriving algorithms that solve these problems optimally. Generally one is referred to local search methods such as bundle adjustment. In this thesis we are interested in developing methods that are guaranteed to find globally optimal solutions. Typically the considered optimization problems are non-convex and may have many local optima.

The thesis consists of an introductory chapter followed by five papers. The introduction gives a motivation for the thesis, and a brief introduction to some concepts from optimization that are used throughout the thesis. Furthermore a summary of the included papers is given.

In the first paper we study different kinds of pose and registration problems involving Euclidean transformations. We develop an efficient branch and bound algorithm that is guaranteed to find the global optimum. In the second paper the theory of L∞-optimization is applied to 1D-vision problems. We show that the problems considered can be simplified considerably when using the L∞-norm instead of the standard L2-norm. In the third paper, necessary and sufficient conditions for a global optimum for a class of L∞-norm problems are derived. Based on these conditions a more efficient algorithm, compared to the usual bisection method, is presented. The fourth paper deals with large scale binary quadratic optimization. Two alternatives to semidefinite programming, based on spectral relaxations, are given. In the final paper we present a reformulation of the classical normalized cut method for image segmentation. We show that using this formulation it is possible to incorporate contextual information.

Key words: quasiconvex optimization, multiple view geometry, registration, camera pose, image segmentation, normalized cuts, branch and bound, spectral relaxations, computer vision, binary quadratic optimization.

ISSN and key title: 1404-028X
ISBN: 978-91-628-7268-7
Language: English
Number of pages: 142

The thesis may be ordered from the Department of Mathematics at the address above.


Preface

This licentiate's thesis considers optimization methods used in computer vision. Numerous problems in this field, as well as in image analysis and other branches of engineering, can be formulated as optimization problems. Often these problems are solved using greedy algorithms that find locally optimal solutions. In this thesis we are interested in developing methods for finding solutions that are globally optimal. For certain (NP-hard) problems it can be shown that it is not possible to find the global optimum in reasonable time. In these cases one wishes to find approximation algorithms that yield solutions that are not just locally optimal but also close to the global optimum. The thesis consists of five papers and an introductory chapter. In the introduction some background material and an overview of the papers is given.

The thesis consists of the following five papers:

• C. Olsson, F. Kahl, M. Oskarsson, Branch and Bound Methods for Euclidean Registration Problems, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.

• K. Åström, O. Enqvist, C. Olsson, F. Kahl, R. Hartley, An L∞ Approach to Structure and Motion Problems in 1D-Vision, Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.

• C. Olsson, A. P. Eriksson, F. Kahl, Efficient Optimization for L∞-problems using Pseudoconvexity, Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.

• C. Olsson, A. P. Eriksson, F. Kahl, Improved Spectral Relaxation Methods for Binary Quadratic Optimization Problems, submitted to Computer Vision and Image Understanding, 2007.

• A. P. Eriksson, C. Olsson, F. Kahl, Normalized Cuts Revisited: A Reformulation for Segmentation with Linear Grouping Constraints, Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.

The first three papers concern problems where it is possible to find the global optimum, while the last two papers deal with approximation techniques for known NP-hard problems.

Most of the material is covered by these papers or described in the introduction. The thesis is also based on the following papers:

• C. Olsson, F. Kahl, M. Oskarsson, Optimal Estimation of Perspective Camera Pose, Proc. International Conference on Pattern Recognition (ICPR), Hong Kong, China, 2006.


• C. Olsson, F. Kahl, M. Oskarsson, The Registration Problem Revisited: Optimal Solutions From Points, Lines and Planes, Proc. Computer Vision and Pattern Recognition (CVPR), New York City, USA, 2006.

• A. P. Eriksson, C. Olsson, F. Kahl, Image Segmentation with Context, Proc. Scandinavian Conference on Image Analysis (SCIA), Aalborg, Denmark, 2007.

• C. Olsson, A. P. Eriksson, F. Kahl, Solving Large Scale Binary Quadratic Problems: Spectral Methods vs. Semidefinite Programming, Proc. Computer Vision and Pattern Recognition (CVPR), Minneapolis, USA, 2007.

• A. P. Eriksson, C. Olsson, F. Kahl, Efficient Solutions to the Fractional Trust Region Problem, Proc. Asian Conference on Computer Vision (ACCV), Tokyo, Japan, 2007.

• I. Dressler, C. Olsson, K. Åström, A. Robertsson, R. Johansson, Automatic Kinematic Calibration of a Robot Using Vision, to be submitted.

Acknowledgements

First of all, I would like to thank my supervisors Fredrik Kahl and Kalle Åström for giving me guidance, support and patience. Furthermore, their careful examination of this manuscript, as well as other manuscripts, has improved the quality considerably. I would also like to thank Fredrik for introducing me to optimization in computer vision and for many discussions on the subject. The time he has devoted to me has been crucial; his help and guidance have led to a number of accepted papers. Thank you, Fredrik.

I have had the privilege of working within the Mathematical Imaging Group at the Centre for Mathematical Sciences. I am indebted to the members of the group as well as other colleagues within the department for interesting discussions, most notably to Anders P. Eriksson for fruitful collaborations on different problems. Furthermore, I would like to acknowledge Olof Barr for not so interesting, but necessary, discussions on the spelling and grammar of the English language, Gunnar Sparr for reading the manuscript, and Anki Ottosson for helping me with all kinds of administrative difficulties.

Finally, I would also like to thank my family for general support in all aspects of real life.


Introduction

1.1 Motivation

Computer Vision has evolved into a wide research area including topics like robot vision, image analysis, pattern recognition and multiple view geometry. Over the past decades there has been a rapid development in understanding and modeling different computer vision applications. While much work has been done on modeling different problems, less work has been spent on deriving algorithms that solve these problems optimally. Typically one is referred to local search methods such as gradient descent search or Newton-based methods.

Consider for instance the two-view structure and motion problem. The goal is to compute the 3D scene geometry (structure) and the positions and relative orientations of the cameras (motion) from corresponding image features. A common approach for solving this problem is to calculate the so-called essential matrix. This is done by using an algebraic solver such as the six-point solver (see [11]). Even though this works fine if the data is well behaved, it often fails in the presence of significant measurement noise. This is because the above method is only able to solve the problem for exactly six point correspondences, and hence any remaining data is disregarded.

Instead we would like to optimize a geometrically meaningful quantity such as the reprojection errors. It is easy to formulate a meaningful optimization criterion for good reconstructions; however, we are referred to local search methods for solving it. The success of these methods is highly dependent on good initialization, which is typically obtained with algebraic solvers.

In contrast to the methods mentioned above, we would like to find a meaningful optimization criterion that allows us to design algorithms that can be shown to converge to the global optimum. The goal of this thesis is to develop methods that are guaranteed to find globally optimal solutions if possible. For certain problems this is not possible to do in reasonable time. In these cases we want to find approximate solutions that are not just locally optimal, but have objective value close to the global optimum. In most cases, good formulations of the problems are readily available. The derivation of models falls outside the scope of this thesis; we are merely interested in how to solve the existing formulations optimally.

Our aim has been to improve the state-of-the-art for a large class of optimization problems in computer vision. Applications include multiview geometry problems, registration problems, image segmentation, partitioning, binary restoration and subgraph matching. Our approach is based on global optimization methods, such as branch and bound, convex and quasiconvex optimization. Some of the problems can be solved exactly with global methods while others have to be approximated to obtain good solutions in reasonable time.

The remainder of this chapter is organized as follows: In Sections 1.2 and 1.3 we give a short introduction to some basic concepts from optimization that are used throughout the thesis. In Section 1.4 we give an overview of the included papers and summarize the main contributions.

1.2 Convex Optimization

In this section we review some basic concepts from optimization that are used throughout this thesis. For a more complete introduction to convex optimization, see [4, 2].

1.2.1 Convex Sets

A set S ⊆ R^n is called convex if the line segment joining any two points in S is contained in S. That is, if x, y ∈ S then λx + (1 − λ)y ∈ S for all λ with 0 ≤ λ ≤ 1. We call a point x of the form

x = ∑_{i=1}^{n} λ_i x_i,   (1.1)

where ∑_{i=1}^{n} λ_i = 1, 0 ≤ λ_i ≤ 1 and x_i ∈ S, a convex combination of the points x_1, ..., x_n. A convex set always contains every convex combination of its points. Furthermore, it can be shown that a set is convex if and only if it contains all convex combinations of its points.

The convex hull of a set S, denoted convhull(S), is the set of all convex combinations of points in S. Since this set contains all its convex combinations it is a convex set. It is also the smallest convex set containing S. Figure 1.1 shows some simple examples of the notions introduced.

Next we state three special cases of convex sets that are used extensively throughout the thesis.

The halfspace. A halfspace is a set of the form

{x ∈ R^n; a^T x ≤ b},   (1.2)

where a ≠ 0, i.e., it is the solution set of a nontrivial affine inequality. The boundary of the halfspace is the hyperplane {x ∈ R^n; a^T x = b}. It is straightforward to verify that these sets are convex.


Figure 1.1: Left: A convex set. Middle: A non-convex set. Right: The convex hull of the middle set.

The second order cone. Let || · || be any norm on R^n. The norm cone in R^{n+1} associated with the norm || · || is the set

{(x, t) ∈ R^{n+1}; ||x|| ≤ t}.   (1.3)

From the general properties of norms it follows that the norm cone is a convex set in R^{n+1}. The second order cone is the norm cone for the Euclidean norm,

{(x, t) ∈ R^{n+1}; ||x||_2 ≤ t}.   (1.4)

The positive semidefinite cone. Let S^n be the set of symmetric n × n matrices. The set S^n can be viewed as a vector space of dimension n(n + 1)/2. By X ⪰ 0 we mean that the matrix X is positive semidefinite. The set of symmetric positive semidefinite matrices,

S^n_+ = {X ∈ R^{n×n}; X = X^T, X ⪰ 0},   (1.5)

is a convex set in S^n. This can be seen by noting that if y^T A y ≥ 0 and y^T B y ≥ 0, then

λ y^T A y + (1 − λ) y^T B y ≥ 0   (1.6)

for 0 ≤ λ ≤ 1.

If f : R^m → R^n is an affine mapping, then the set S′ = {x; f(x) ∈ S} is convex in R^m if S is convex in R^n. That is, convexity is preserved under affine mappings. When applied to the second order cone we get sets of the type

{x; ||Ax + b||_2 ≤ c^T x + d}.   (1.7)

We can define an affine mapping A(x) : R^n → S^m by

A(x) = x_1 A_1 + ... + x_n A_n + B,   (1.8)

where A_1, ..., A_n, B ∈ S^m. When applied to the positive semidefinite cone we get the linear matrix inequalities

x_1 A_1 + ... + x_n A_n + B ⪰ 0.   (1.9)


Convexity is also preserved under intersection. Thus a set S that is given by several of the constraints above (halfspaces, cone constraints and linear matrix inequalities) is a convex set.
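The three set types above are easy to test numerically. The following is a minimal sketch (our own illustration, assuming numpy and made-up data, not part of the thesis) of membership checks for a halfspace, the second order cone and the positive semidefinite cone.

```python
import numpy as np

def in_halfspace(x, a, b):
    # {x : a^T x <= b}, with a != 0
    return a @ x <= b

def in_second_order_cone(x, t):
    # {(x, t) : ||x||_2 <= t}
    return np.linalg.norm(x) <= t

def is_psd(X, tol=1e-9):
    # X symmetric with all eigenvalues >= 0, i.e. X in S^n_+
    return np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol

x = np.array([1.0, 2.0])
print(in_halfspace(x, np.array([1.0, -1.0]), 0.5))   # a^T x = -1 <= 0.5 -> True
print(in_second_order_cone(x, 3.0))                  # ||x|| ~ 2.24 <= 3  -> True
print(is_psd(np.array([[2.0, 1.0], [1.0, 2.0]])))    # eigenvalues 1, 3   -> True
```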

1.2.2 Convex functions

A function f : S → R is called convex if S is a convex set and, for all x, y ∈ S and 0 ≤ λ ≤ 1, we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).   (1.10)

The geometric interpretation of this definition is that the line segment between the points (x, f(x)) and (y, f(y)) should lie above the graph of f. The function f is said to be concave if −f is convex. For differentiable functions convexity can alternatively be defined by the inequality

f(x) ≥ f(y) + ∇f(y)^T (x − y).   (1.11)

If f is differentiable then f is convex if and only if (1.11) holds for all x, y ∈ S. Geometrically this means that f is convex if and only if f lies above its tangent plane at all points. Figure 1.2 shows the geometrical interpretation of the definitions.

Figure 1.2: Graph of a convex function. Left: The line segment joining the points (x, f(x)) and (y, f(y)) lies above the graph. Right: The graph of the function lies above its tangent plane at y.

It is easy to see why convexity is important in optimization in view of (1.11). The inequality states that the tangent plane is a global underestimator of the function. That is, from local information (the gradient) it is possible to obtain global information (an underestimator) for the function. For example, if ∇f(y) = 0 then f(x) ≥ f(y) for all x ∈ S. That is, any stationary point is also a global minimum. When dealing with maximization the same properties apply if convexity is replaced with concavity.


If the function f is twice differentiable, then another useful condition for checking convexity is via the Hessian ∇²f(x). The function f is convex on S if and only if

∇²f(x) ⪰ 0   (1.12)

for all x ∈ S. Next we state some functions used in this thesis that are convex.

Linear functions. A linear function c^T x is both convex and concave, since (1.10) is fulfilled with equality.

Quadratic functions. A quadratic function (1/2) x^T Q x is convex if and only if Q ⪰ 0, since Q is its Hessian.

Quadratic over linear function. The ratio between a quadratic and a linear function, f(x, y) = x²/y, is convex on the domain y > 0, since for y > 0

∇²f(x, y) = \frac{2}{y^3} \begin{bmatrix} y^2 & -xy \\ -xy & x^2 \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0.   (1.13)

Pointwise supremum of convex functions. Let g(x) = sup_y f(x, y), where f is convex in x for each y. Then

g(λx_1 + (1 − λ)x_2) = sup_y f(λx_1 + (1 − λ)x_2, y)   (1.14)
≤ sup_y ( λf(x_1, y) + (1 − λ)f(x_2, y) )   (1.15)
≤ sup_y λf(x_1, y) + sup_y (1 − λ)f(x_2, y)   (1.16)
= λg(x_1) + (1 − λ)g(x_2).   (1.17)

Similarly, a pointwise infimum of concave functions is a concave function.

Maximum eigenvalue of a symmetric matrix. The matrix function f(X) = λ_max(X) is convex on S^n, since f can be expressed as

f(X) = sup_{||y||_2 = 1} y^T X y,   (1.18)

which is a pointwise supremum of a family of functions linear in X. Similarly, the minimum eigenvalue function is concave.

Composition with an affine function. If f is a convex (or concave) function then so is f(Ax + b).
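As a quick sanity check, the convexity inequality (1.10) can be verified numerically for two of the examples above. This is our own illustrative sketch with randomly generated data; the helper names are ours, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def lam_max(X):
    # f(X) = largest eigenvalue of a symmetric matrix (convex in X)
    return np.linalg.eigvalsh(X).max()

def quad_over_lin(v):
    # f(x, y) = x^2 / y, convex for y > 0
    x, y = v
    return x**2 / y

def check_convexity(f, p, q, lam):
    # Verify f(lam*p + (1-lam)*q) <= lam*f(p) + (1-lam)*f(q)
    return f(lam*p + (1 - lam)*q) <= lam*f(p) + (1 - lam)*f(q) + 1e-12

A = rng.standard_normal((4, 4)); A = (A + A.T) / 2   # random symmetric matrices
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
print(all(check_convexity(lam_max, A, B, t) for t in np.linspace(0, 1, 11)))

p = np.array([1.5, 2.0]); q = np.array([-0.7, 0.3])  # points with y > 0
print(all(check_convexity(quad_over_lin, p, q, t) for t in np.linspace(0, 1, 11)))
```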


1.3 Constrained Optimization

A constrained optimization problem is a problem of the following form:

min_{x∈S}  f(x)   (1.19)
s.t.  g_i(x) ≤ 0,  i = 1, ..., m,   (1.20)
      h_j(x) = 0,  j = 1, ..., p.   (1.21)

Here the domain of x is specified by the set S and the constraints g_i(x) ≤ 0, i = 1, ..., m, and h_j(x) = 0, j = 1, ..., p. We call this problem the primal problem. An x that fulfills all the constraints of the problem is called feasible. If the functions f and g_i, i = 1, ..., m, are convex, the h_j, j = 1, ..., p, are affine and S is convex, then we call this a convex optimization problem. Just as in the unconstrained case, convexity is important since it implies that any local minimum is also a global minimum.
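For concreteness, here is a minimal sketch of solving a small convex instance of (1.19)-(1.21) with scipy; the specific objective and constraints are our own illustrative choices: minimize ||x − (1, 2)||² subject to x_1 + x_2 ≤ 1 and x_1 − x_2 = 0.

```python
import numpy as np
from scipy.optimize import minimize

# Convex primal problem: f(x) = ||x - (1, 2)||^2,
# g(x) = x1 + x2 - 1 <= 0,  h(x) = x1 - x2 = 0.
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
constraints = [
    {"type": "ineq", "fun": lambda x: 1.0 - x[0] - x[1]},  # scipy expects fun(x) >= 0
    {"type": "eq",   "fun": lambda x: x[0] - x[1]},
]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=constraints)
print(res.x)   # approximately (0.5, 0.5); by convexity this is the global minimum
```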

1.3.1 Lagrangian Duality

The idea behind Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraints and then solving an unconstrained problem. The Lagrangian of problem (1.19)-(1.21) is

L(x, λ, ν) = f(x) + ∑_{i=1}^{m} λ_i g_i(x) + ∑_{i=1}^{p} ν_i h_i(x).   (1.22)

The variables λ and ν are called the dual variables or the Lagrange multipliers. The Lagrangian dual function g is defined as

g(λ, ν) = inf_{x∈S} L(x, λ, ν).   (1.23)

Since it is the infimum of a set of functions affine in λ and ν, it is a concave function. Let p∗ be the optimal value of the problem (1.19)-(1.21). If λ ≥ 0, then for a feasible point x we have

∑_{i=1}^{m} λ_i g_i(x) + ∑_{i=1}^{p} ν_i h_i(x) ≤ 0,   (1.24)

since the first sum is nonpositive and the second is zero, and therefore

L(x, λ, ν) ≤ f(x).   (1.25)

Since

g(λ, ν) = inf_{x∈S} L(x, λ, ν) ≤ L(x, λ, ν) ≤ f(x)   (1.26)


holds for all feasible x, the dual function g(λ, ν) yields a lower bound on the optimal value p∗ for each (λ, ν). The dual problem is defined as

max  g(λ, ν)   (1.27)
s.t.  λ ≥ 0.   (1.28)

Since g(λ, ν) yields a lower bound for each (λ, ν), we can interpret this problem as finding the best lower bound on p∗. Let d∗ be the optimal value of the dual problem. The difference p∗ − d∗ is called the duality gap. In view of (1.26) it is easy to see that d∗ ≤ p∗ always holds; however, in general the duality gap is not zero. If the problem is convex then the gap is usually zero, but not always.
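As a small illustration (our own toy example, not from the thesis), consider min x² subject to 1 − x ≤ 0, with optimum p∗ = 1. The Lagrangian is L(x, λ) = x² + λ(1 − x) and the dual function is g(λ) = λ − λ²/4; maximizing over λ ≥ 0 gives λ = 2 and d∗ = 1 = p∗, so the gap is zero here. A minimal numeric sketch:

```python
from scipy.optimize import minimize_scalar

# Toy primal problem: min x^2  s.t.  1 - x <= 0   (optimum p* = 1 at x = 1).
f = lambda x: x**2
g = lambda x: 1.0 - x

def dual(lam):
    # g(lambda) = inf_x f(x) + lam * g(x); analytically lam - lam^2/4.
    return minimize_scalar(lambda x: f(x) + lam * g(x)).fun

p_star = 1.0
for lam in [0.0, 1.0, 2.0, 4.0]:
    print(f"lambda = {lam}: dual value {dual(lam):.3f} <= p* = {p_star}")
# The best lower bound over lam >= 0 is attained at lam = 2 with value 1 = p*.
```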

1.3.2 The KKT conditions

Next we will introduce the Karush-Kuhn-Tucker (KKT) conditions. These are necessary conditions for a minimum for optimization problems in general, and for convex problems they are usually also sufficient. Let x∗ be optimal for the primal problem and (λ∗, ν∗) optimal for the dual problem, and assume that there is no duality gap. Then


f(x∗) = g(λ∗, ν∗) ≤ f(x∗) + ∑_{i=1}^{m} λ∗_i g_i(x∗) + ∑_{i=1}^{p} ν∗_i h_i(x∗) ≤ f(x∗),   (1.29)

where the first inequality holds since g(λ∗, ν∗) = inf_{x∈S} L(x, λ∗, ν∗) ≤ L(x∗, λ∗, ν∗).

Since x∗ is feasible we know that each term in the sum ∑_{i=1}^{m} λ∗_i g_i(x∗) is nonpositive, and by (1.29) we must have

λ∗_i g_i(x∗) = 0   (1.30)

for all i. These conditions are known as the complementary slackness conditions. Roughly speaking, this means that a Lagrange multiplier should be zero unless its corresponding constraint is active.

If x∗ minimizes L(x, λ∗, ν∗), it follows that the gradient of L vanishes at x∗; thus we have

∇f(x∗) + ∑_{i=1}^{m} λ∗_i ∇g_i(x∗) + ∑_{i=1}^{p} ν∗_i ∇h_i(x∗) = 0,   (1.31)
λ∗_i g_i(x∗) = 0,   (1.32)
g_i(x∗) ≤ 0,   (1.33)
h_i(x∗) = 0,   (1.34)
λ∗_i ≥ 0.   (1.35)

These are the KKT conditions. They are necessary conditions for a minimum (under suitable regularity assumptions); however, in general they are not sufficient. The KKT conditions play an important role in optimization. Many algorithms can be viewed as methods for solving the KKT conditions.
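Continuing the toy example sketched in Section 1.3 (again our own illustration), one can verify (1.31)-(1.35) at the computed optimum x∗ = (0.5, 0.5): solving the stationarity equation for the multipliers gives λ∗ = 2 ≥ 0 and ν∗ = −1, and the inequality constraint is active, so complementary slackness holds.

```python
import numpy as np

x = np.array([0.5, 0.5])                  # optimum of the example in Section 1.3
grad_f = 2 * (x - np.array([1.0, 2.0]))   # gradient of ||x - (1, 2)||^2
grad_g = np.array([1.0, 1.0])             # gradient of g(x) = x1 + x2 - 1
grad_h = np.array([1.0, -1.0])            # gradient of h(x) = x1 - x2

# Stationarity (1.31): grad_f + lam*grad_g + nu*grad_h = 0, solved for (lam, nu).
lam, nu = np.linalg.solve(np.column_stack([grad_g, grad_h]), -grad_f)
g_val = x[0] + x[1] - 1.0

print(lam >= 0)                                    # dual feasibility (1.35)
print(abs(lam * g_val) < 1e-9)                     # complementary slackness (1.32)
print(abs(x[0] - x[1]) < 1e-9 and g_val <= 1e-9)   # primal feasibility (1.33)-(1.34)
```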


1.4 Summary of the Thesis

As the title suggests, the theme of this thesis is optimization. It can roughly be divided into two parts: problems that can be solved exactly, and problems which have to be approximated in order to find solutions with almost as good objective value as the global optimum. In Papers I, II and III we deal with problems of the first kind, whereas Papers IV and V are concerned with problems of the latter kind.

The connection between these two parts, and also the original motivation for this work, is the problem of registration. In the general registration problem we have a model of an object given in some coordinate system (the model system) and a set of features in another coordinate system (the measurement system). The goal is to find the relationship between these coordinate systems, that is, to find the position and orientation of the object. Usually one wants to find the transformation that minimizes the sum of the squared residual errors, the L2-norm, since this is the statistically optimal choice assuming independent Gaussian noise in the measurement system.

In robotics the measurements are typically point coordinates, and the model system is related to the measurement system by a similarity transformation. For a general surface model of the object the most common approach is to employ the Iterative Closest Point (ICP) algorithm, see [3]. The ICP algorithm is a local iterative method, and its success is highly dependent on the initial guess. Attempts have been made to make it more robust to local minima (e.g. [9, 7, 14, 6]); however, this is still a problem, in particular if the geometry is complex.

Another approach is to constrain the geometry of the models. In Paper I we consider models that consist of points, lines and planes. We assume that the correspondences between the measured points and the model features are known, that is, given a point measurement we know which point, line or plane this measurement comes from. We develop an algorithm, based on the branch and bound method, that finds the globally L2-optimal similarity transformation. A closed form solution for the problem when the model only consists of points was given in [12].

In Paper I we use the same methodology for a similar vision problem, the camera pose estimation problem. In a vision system one is typically interested in the relationship between the camera and the model. In this case the relationship is represented by the camera matrix. Determining the camera matrix from known points and image measurements is known as the resectioning problem. In the case of a calibrated camera, the camera matrix has orthogonality constraints similar to a rotation matrix, and the problem is called the camera pose estimation problem. The predominant method for solving it is to use the DLT algorithm as a linear starting solution for a steepest descent algorithm (see [11, 26]). In the case of uncalibrated cameras, [15] gave an optimal solution for the L2-norm case.

The problem with using the L2-norm is that the obtained optimization problems are in general non-convex, thus requiring more computationally expensive methods. Therefore [10] proposed to use the L∞-norm instead. In this case the objective is to minimize the maximum of the reprojection errors. In [16, 17] it was shown that if the residuals can be written

f_i(x) = \frac{||(a_{i1}^T x + b_{i1},\, a_{i2}^T x + b_{i2})||^2}{(a_{i3}^T x + b_{i3})^2},   (1.36)

then the problem

min_x max_i f_i(x)   (1.37)

is a quasiconvex problem on the set {x; a_{i3}^T x + b_{i3} > 0, ∀i}. It was shown that a group of problems in multiview geometry (including the resectioning problem) are examples of quasiconvex programs and can be solved efficiently by using the so-called bisection algorithm. The list of problems that are quasiconvex has since been extended in [32, 33, 18]. We extend it further in Paper II, where we consider a few problems appearing in 1D-vision.

In Paper III we study how to solve this type of problem more efficiently than with the bisection algorithm. These functions are in fact pseudoconvex, and we use this property to derive necessary and sufficient conditions for a global minimum. Let µ∗ be the minimum value of (1.37). Then x∗ is a global minimizer if and only if there exists λ∗ such that

∑_i λ∗_i ∇f_i(x∗) = 0,   ∑_i λ∗_i = 1,   λ∗_i ≥ 0,   (1.38)

and λ∗_i = 0 if f_i(x∗) < µ∗. That is, the point 0 should be contained in the convex hull of the gradients ∇f_i(x∗) for which the residuals f_i(x∗) = µ∗. Using this result it is easy to show that the KKT conditions are both necessary and sufficient for a global optimum.
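The condition that 0 lies in the convex hull of the active gradients can be checked numerically with a small linear feasibility problem. Below is a minimal sketch of ours using scipy.optimize.linprog on made-up gradient vectors (not data from the papers).

```python
import numpy as np
from scipy.optimize import linprog

def zero_in_convex_hull(gradients):
    """Check whether 0 is a convex combination of the given vectors, i.e. whether
    there exist lambda_i >= 0 with sum lambda_i = 1 and sum lambda_i g_i = 0."""
    G = np.asarray(gradients, dtype=float)      # shape (k, n)
    k, n = G.shape
    A_eq = np.vstack([G.T, np.ones((1, k))])    # sum lambda_i g_i = 0, sum lambda_i = 1
    b_eq = np.append(np.zeros(n), 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.success

# Gradients of the active residuals at a candidate point (illustrative numbers).
print(zero_in_convex_hull([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]))   # True
print(zero_in_convex_hull([[1.0, 0.0], [2.0, 1.0]]))                 # False
```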

Note that it also tells us something about the properties of the L∞-norm. Together with Carathéodory's theorem it implies that there is always a set S of no more than n + 1 of the residuals such that

min_x max_i f_i(x) = min_x max_{i∈S} f_i(x)   (1.39)

if x ∈ R^n.

It also has implications for the registration problem with unknown correspondences when using the L∞-norm. Consider the problem of aligning two point clouds in R² using a similarity transform, without knowing the point correspondences. In this case the residuals are convex functions of 4 parameters. Thus, by estimating the optimal transformation for each combination of 5 residuals we are guaranteed to find the globally optimal transformation.

A different approach to this problem is to regard the model points and the measurement points as nodes in two graphs, the model graph and the measurement graph. The weights of the graphs are distances between the points. The registration problem can now be viewed as finding the subgraph in the model graph that is most similar to the measurement graph. This can be done by considering pairwise matchings. Let {x_i} be the measurement points and {y_i} be the model points. We assign a cost to matching x_i to y_k and x_j to y_l that depends on the difference | |x_i − x_j| − |y_k − y_l| |. Minimizing the sum of these costs amounts to minimizing a quadratic function subject to linear and binary constraints. In general these types of problems are NP-complete and therefore have to be approximated to achieve close to optimal solutions. An exception where it is possible to solve exactly is when the objective function is submodular and there are no linear constraints [5, 21, 20].

In [37] a spectral relaxation was applied to the subgraph matching problem. These relaxations have been used for other problems such as motion segmentation, figure-ground segmentation, clustering and digital matting [13, 24, 31, 22]. The authors of [30] used semidefinite programming to approximate the original problem, and in [19] it was shown how to apply semidefinite programming to general binary optimization problems. Semidefinite programming yields better approximations than the spectral relaxations, but at the cost of slower execution.

In Paper IV we try to combine the merits of spectral relaxations and semidefinite programming to solve the subgraph matching problem more efficiently. We develop an improved spectral relaxation method that in principle can be applied to all of these problems. In [25] a theoretical result was given which states that the improved method gives results just as good as semidefinite programming. The method iteratively improves the spectral relaxation to obtain better bounds. Typically we are required to solve 10-20 spectral relaxations, resulting in faster execution times than semidefinite programs.

In Paper IV we also introduce the so-called trust region problem. It can be viewed as a special case of our relaxation, where we only try to impose the binary constraints on one of the variables. This is a well-studied problem in optimization [28, 29, 27, 35, 23, 34], which can be solved exactly. This relaxation also allows us to impose linear equality constraints exactly.

In Paper V we continue this line of work and show that using similar methods it is possible to add linear equality constraints to the Normalized Cuts framework [31]. Previous attempts to do so have required introducing an extra variable and thus weakening the relaxation. It was shown in [38] that if the linear conditions form a subspace, it is possible to enforce the constraints by projecting the problem onto the subspace. In [8] a normalization parameter was added to their relaxation. The value was chosen, through experimental testing, such that the constraints are often approximately fulfilled.

Paper I: Branch and Bound Methods for Euclidean Registration Problems

In this paper we consider some registration and pose problems. The problem is to find a transformation relating two coordinate systems, the measurement system and the model system. In the measurement system we have points that should map to convex sets in the model system. Typically each set consists of a point, a line or a plane, but in principle they could be any convex sets. We want to determine a transformation such that the measured points come as close to their corresponding convex sets as possible. To measure distance we use the L2-norm. The sets of transformations considered are Euclidean/similarity transforms and Euclidean/similarity transforms followed by a perspective transformation. Let x_i be the measured point corresponding to the convex set C_i, and d(x_i, C_i) be the distance between x_i and C_i. The problem can be written

inf_T ∑_{i=1}^{m} d(T(x_i), C_i),   (1.40)

where T is the sought transformation. If the transformation T depends affinely on some parameters this is an easy convex problem. However, this is not the case when dealing with Euclidean/similarity transformations (in 3D). When the sets C_i are single points a formula for the solution of this problem was given in [12]. In the case of an affine transformation followed by a perspective transformation, a solution was given in [1]. A solution for the same problem when using the L∞-norm was given in [16].

In our method we relax the original objective function to obtain convex underestimators. These underestimators are then used in a branch and bound scheme to find the optimal solution.
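To give a flavour of the branch and bound idea, here is a generic one-dimensional sketch of ours; it is not the algorithm from Paper I, and it uses a simple Lipschitz-style lower bound in place of the paper's convex underestimators.

```python
import numpy as np

def branch_and_bound(f, lower_bound, lo, hi, tol=1e-4):
    """Minimize f on [lo, hi], given lower_bound(a, b) that underestimates
    the minimum of f on [a, b]. Returns (best_value, best_point)."""
    best_x = 0.5 * (lo + hi)
    best = f(best_x)                          # incumbent upper bound
    queue = [(lower_bound(lo, hi), lo, hi)]
    while queue:
        node = min(queue, key=lambda t: t[0]) # interval with the lowest bound
        queue.remove(node)
        lb, a, b = node
        if lb > best - tol:                   # no interval can improve the incumbent
            break
        m = 0.5 * (a + b)
        if f(m) < best:
            best, best_x = f(m), m            # update incumbent
        for a2, b2 in [(a, m), (m, b)]:       # branch: bound both halves
            queue.append((lower_bound(a2, b2), a2, b2))
    return best, best_x

# Non-convex 1D example with several local minima; 7 bounds |f'| on [-3, 3].
f = lambda x: np.sin(3 * x) + 0.1 * x**2
lower_bound = lambda a, b: f(0.5 * (a + b)) - 7.0 * (b - a) / 2
print(branch_and_bound(f, lower_bound, -3.0, 3.0))
```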

Paper II: An L∞ Approach to Structure and Motion Problems in 1D-Vision

Many geometrical problems can be formulated as optimization problems. Consider for instance the n-view triangulation problem with 2D-cameras. Statistically optimal estimates can be found by minimizing some error measure between the image data and the reprojected data. The usual choice of objective function is the L2-norm of the reprojection errors, since this is the statistically optimal choice assuming independent Gaussian noise of equal and isotropic variance. Since closed-form solutions are rarely available for these problems, they are often solved by iterative algorithms. The problem with this approach is that these methods often depend on good initialization to avoid local minima.

To resolve this problem, L∞ optimization was introduced in [10]. The idea is to minimize the maximal reprojection error instead of the L2-norm. In [16, 17] it was shown that the optimization problems obtained for a number of multiview-geometry problems using the L∞-norm are examples of quasiconvex problems. These problems have the nice property that any local minimum is also a global minimum.

In this paper we apply the theory to 1D structure and motion problems. Understanding of one-dimensional cameras is important in several applications, such as vision for planar motion, auto-calibration and navigation for autonomous guided vehicles. In this paper we show that L∞-problems play an important role for 1D-cameras.

First we consider two problems: the resection problem and the intersection problem. The intersection problem is to determine a 2D-point from known image measurements and camera positions by triangulation. The resectioning problem is to determine the camera positions from known 2D-points and their corresponding image measurements. We show that both these problems are in a sense easy to solve, since they can be formulated as quasiconvex problems when using the L∞-norm.

We also consider the general structure and motion problem. In this case both the camera positions and the 2D-points are unknown and should be determined from image measurements. It turns out that even though this is not a quasiconvex problem, quasiconvexity plays an important role here as well. This is because if the relative orientations of the cameras are regarded as fixed, then finding the positions of the cameras and the measured 2D-points can be posed as a quasiconvex problem. Using this fact we present a branch and bound method that branches over the set of relative orientations to find the optimal solution.

Paper III: Efficient Optimization for L∞-problems Using Pseudoconvexity

Since its introduction to the vision community in [10], L∞-optimization has become a subject of intense research. The reason for its success is that many geometric vision problems may have multiple local minima under the L2-norm, but only a single local minimum under the L∞-norm.

In [16, 17] it was shown that with the L∞-norm these problems are quasiconvex. A bisection algorithm based on second order cone programs (SOCPs) for solving this type of problem was also introduced.

Let f_i(x) be quasiconvex functions. The algorithm works by checking whether there is an x satisfying f_i(x) ≤ µ for all i, for a fixed µ. Since the f_i are quasiconvex this is what is known as a convex feasibility problem. For many of the multiview geometry problems this constraint can be reformulated as a second order cone constraint if µ is fixed. A bisection is then performed on the parameter µ. Thus, to solve the original problem we are led to solve a sequence of SOCPs.
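A minimal sketch of the bisection idea follows; this is our own illustration with simple scalar residuals and an interval-based feasibility test standing in for an SOCP solver.

```python
import numpy as np

# Quasiconvex residuals f_i(x) = |a_i * x - b_i| in one variable (illustrative data).
a = np.array([1.0, 2.0, -1.5])
b = np.array([0.5, 3.0, 1.0])

def feasible(mu):
    # Is there an x with |a_i x - b_i| <= mu for all i?  Each constraint is an
    # interval for x; feasibility means the intervals share a common point.
    lo = np.where(a > 0, (b - mu) / a, (b + mu) / a)
    hi = np.where(a > 0, (b + mu) / a, (b - mu) / a)
    return lo.max() <= hi.min()

# Bisection on mu: the optimal value of min_x max_i f_i(x) lies in [lo, hi].
lo, hi = 0.0, np.abs(b).max() + np.abs(a).max()
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
print(hi)   # approximately the optimal max-residual value
```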

In this paper we show that it is not necessary to fix µ during the optimization. By making use of the property of pseudoconvexity we prove that the KKT conditions are not only necessary but also sufficient for these problems. Hence it should be possible to construct algorithms for solving these problems more efficiently.

To test the performance of such an algorithm we used LOQO, an optimization package for general non-convex problems. It works by searching for solutions to the KKT conditions. We report significant improvements, with execution times up to 100 times faster than the bisection algorithm.


Paper IV: Improved Spectral Relaxation Methods for Binary Quadratic Optimization Problems

In this paper we study different ways to find approximate solutions of the following binary quadratic problem:

z = inf y^T A y + b^T y,   y ∈ {−1, 1}^n,   (1.41)

where A is an n × n (possibly indefinite) matrix. This problem can be used to model many problems in computer vision, e.g. image restoration, motion segmentation, partitioning, figure-ground segmentation, clustering and subgraph matching. It is known to be NP-complete (if A is indefinite), and therefore one is forced to approximate the problem in order to develop efficient algorithms if n is large.

The most common approach to solving this problem is to ignore some of the constraints and solve the following relaxed problem:

z_sp = inf_{||x||² = n+1} x^T L x,   (1.42)

where

x = \begin{pmatrix} y \\ y_{n+1} \end{pmatrix},   L = \begin{pmatrix} A & \frac{1}{2}b \\ \frac{1}{2}b^T & 0 \end{pmatrix}.

This amounts to finding the algebraically smallest eigenvalue of the matrix L. Therefore it is referred to as the spectral relaxation of (1.41). This relaxation has been made popular by the success of Normalized Cuts, introduced in [31].

The benefit of using this formulation is that eigenvalue problems of this type are well studied and there exist solvers that are able to efficiently exploit sparsity, resulting in fast execution times. A significant weakness of this formulation is that the constraints y ∈ {−1, 1}^n and y_{n+1} = 1 are relaxed to ||x||² = n + 1, which often results in poor approximations.
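A minimal sketch of the spectral relaxation (1.42) on a small random instance (our own illustration, using a dense eigendecomposition rather than a sparse solver):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # indefinite symmetric matrix
b = rng.standard_normal(n)

# Homogenized matrix L from (1.42).
L = np.block([[A, b[:, None] / 2], [b[None, :] / 2, np.zeros((1, 1))]])

# The minimizer of x^T L x over ||x||^2 = n+1 is the eigenvector of the smallest
# eigenvalue, so z_sp = (n + 1) * lambda_min(L).
w, V = np.linalg.eigh(L)
z_sp = (n + 1) * w[0]

# Rounding the eigenvector to a feasible binary point gives an upper bound on z.
v = V[:, 0]
if v[-1] < 0:
    v = -v                                   # make the homogenization coordinate positive
y = np.where(v[:n] >= 0, 1.0, -1.0)
z_up = y @ A @ y + b @ y
print(z_sp, "<= z <=", z_up)
```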

More recently, semidefinite programming (SDP) relaxations have also been applied to the same type of computer vision problems, e.g., [19, 36, 30]. It can be shown that such relaxations produce better estimates than spectral methods. However, as the number of variables grows, the execution times of the semidefinite programs increase rapidly. In practice, one is limited to a few hundred decision variables.

Instead we propose to improve the accuracy of the spectral relaxation by attempting to solve the dual problem

z_sg = sup_σ f(σ),   (1.43)

where

f(σ) = inf_{||x||² = n+1} x^T (L + diag(σ)) x − e^T σ,   (1.44)


and e is a vector of all ones. Note that for a fixed σ, the spectral relaxation f(σ) is an underestimate of the original problem. Solving this problem can thus be interpreted as finding the best lower bound on the original problem.

The advantage of using this relaxation is that f is concave and can be efficiently evaluated by finding the smallest eigenvalue of L + diag(σ). The downside is that f is not differentiable everywhere. Therefore we use subgradients to determine ascent directions for f.
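A sketch of how such an ascent step could look (our own simplified illustration, not the method as implemented in Paper IV): when the smallest eigenvalue of L + diag(σ) is simple with unit eigenvector v, the vector (n + 1)(v ∘ v) − e is a valid ascent direction for the concave function f.

```python
import numpy as np

def f_and_subgradient(L, sigma):
    """Evaluate f(sigma) from (1.44) and an ascent direction for it.
    Assumes the smallest eigenvalue of L + diag(sigma) is simple."""
    m = L.shape[0]                              # m = n + 1
    w, V = np.linalg.eigh(L + np.diag(sigma))
    v = V[:, 0]
    return m * w[0] - sigma.sum(), m * v * v - np.ones(m)

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
b = rng.standard_normal(n)
L = np.block([[A, b[:, None] / 2], [b[None, :] / 2, np.zeros((1, 1))]])

sigma, best = np.zeros(n + 1), -np.inf
for k in range(200):                            # plain subgradient ascent, diminishing steps
    val, g = f_and_subgradient(L, sigma)
    best = max(best, val)
    sigma = sigma + g / (k + 1)
print(best)   # improved lower bound; at sigma = 0 it equals the plain spectral bound
```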

A particular weakness of the spectral relaxation is that the artificial variable y_{n+1} is usually far away from ±1 when using this relaxation. This disrupts the balance between the linear and the quadratic terms. In a segmentation problem, the linear term is usually the data term and the quadratic term is a smoothing term. A small y_{n+1} results in oversmoothed solutions, whereas for a large y_{n+1} the smoothing term is neglected.

To remedy this problem we propose to use the following relaxation:

z_tr = inf_{||y||² = n} y^T A y + b^T y.   (1.45)

This problem is called the trust region subproblem, and it has been studied extensively in the optimization literature [28, 29, 35, 34, 27]. A remarkable fact is that it is a non-convex problem with no duality gap (see [4]), and it can therefore be solved exactly.
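A sketch of how (1.45) can be solved via its classical characterization as a one-dimensional root-finding problem (our own simplified implementation; it ignores the degenerate "hard case" where b is orthogonal to the eigenspace of the smallest eigenvalue):

```python
import numpy as np
from scipy.optimize import brentq

def trust_region_exact(A, b, r2):
    """Solve min y^T A y + b^T y subject to ||y||^2 = r2 (generic case only).
    Global optimality: (A + nu*I) y = -b/2 with A + nu*I positive semidefinite
    and ||y||^2 = r2; we root-find for nu."""
    lam_min = np.linalg.eigvalsh(A)[0]
    I = np.eye(len(b))
    y_of = lambda nu: np.linalg.solve(A + nu * I, -b / 2)
    phi = lambda nu: y_of(nu) @ y_of(nu) - r2        # decreasing in nu
    lo, step = -lam_min + 1e-8, 1.0
    while phi(lo + step) > 0:                        # expand until the norm drops below r2
        step *= 2
    nu = brentq(phi, lo, lo + step)
    y = y_of(nu)
    return y, y @ A @ y + b @ y

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
b = rng.standard_normal(n)
y, val = trust_region_exact(A, b, float(n))
print(np.isclose(y @ y, n), val)                     # feasibility check and optimal value
```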

Paper V: Normalized Cuts Revisited: A Reformulation for Segmentation with Linear Grouping Constraints

One of the most popular approaches for image segmentation is Normalized Cuts, introduced in [31]. Consider an undirected graph G with nodes V and edges E, where the non-negative weight of each edge is represented by an affinity matrix W with only non-negative entries and of full rank. A min-cut is a non-trivial subset A of V such that the sum of the edge weights between nodes in A and nodes in V \ A is minimized, that is, the minimizer of

cut(A, V \ A) = ∑_{i∈A, j∈V\A} w_ij.   (1.46)

This is perhaps the most commonly used method for splitting graphs and is a well-known problem for which very efficient solvers exist [5]. It has however been observed that this criterion has a tendency to produce unbalanced cuts: smaller partitions are preferred to larger ones.

In an attempt to remedy this problem, [31] introduced a normalization factor to avoid bias towards smaller cuts. The normalized cut is defined as

Ncut = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)},   (1.47)


where A ∪ B = V, A ∩ B = ∅, and the normalizing factor is defined as assoc(A, V) = ∑_{i∈A, j∈V} w_ij. The normalizing factor assoc(A, V) measures the sum of the weights between nodes in A and nodes in the entire graph. Hence for a small set A, assoc(A, V) will also be small, and therefore larger cuts will be preferred.

Although this is no longer a simple problem to solve, [31] proposed to compute continuous underestimators using Rayleigh quotients. Let w_ij be the edge weight for the edge between nodes i and j, and let z ∈ {−1, 1}^n be the class label vector. The underestimator for the minimum normalized cut is

Ncut = inf_{||z||² = n} \frac{z^T L z}{z^T M z},   (1.48)

where the two matrices L and M are derived from the edge weights of the graph. A drawback of this formulation is that one can only include homogeneous linear constraints, that is, constraints of the form Cz = 0. Let N_C be an orthogonal basis for the nullspace of C. Every z fulfilling the constraints can be written z = N_C y for some y. The Rayleigh quotient above then transforms into another quotient in the nullspace of C. Let L_C = N_C^T L N_C, M_C = N_C^T M N_C and ||y||²_C = y^T N_C^T N_C y. Then one obtains

Ncut = inf_{||y||²_C = n} \frac{y^T L_C y}{y^T M_C y}.   (1.49)
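Since the Rayleigh quotient in (1.48) is scale invariant, its infimum is attained at a generalized eigenvector of the pair (L, M). The following is a minimal sketch of ours on a toy affinity matrix; L and M are chosen here as the graph Laplacian D − W and the degree matrix D, one common choice, which may differ from the exact construction used in the paper.

```python
import numpy as np
from scipy.linalg import eigh

# Toy affinity matrix: 6 nodes in two loosely connected groups (made-up weights).
W = np.array([[0, 5, 5, 1, 0, 0],
              [5, 0, 5, 0, 0, 0],
              [5, 5, 0, 0, 1, 0],
              [1, 0, 0, 0, 5, 5],
              [0, 0, 1, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L, M = D - W, D               # one common choice of the matrices in (1.48)

# Generalized eigenproblem L v = mu M v; the first nontrivial eigenvector
# gives the continuous underestimator, and its signs induce a partition.
mu, V = eigh(L, M)
z = np.sign(V[:, 1])          # skip the trivial constant eigenvector (mu = 0)
print(mu[1], z)
```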

When the constraints are not homogeneous, that is, Cz = b, the above method fails because one does not obtain a Rayleigh quotient. In fact, linear terms will appear both in the numerator and the denominator. In this case one is forced to add an artificial variable z_{n+1}. Let z now denote the z-vector extended with z_{n+1}. The constraints can then be written

[C  −b] z = 0.   (1.50)

In a similar way as above one obtains a Rayleigh quotient,

Ncut = inf_{||y||²_C = n, y²_{k+1} = 1} \frac{y^T L_C y}{y^T M_C y},   (1.51)

with the extra constraint y²_{k+1} = 1. Note that we may always choose the basis N_C such that y_{k+1} = z_{n+1}. Here y is the extended vector, as in the case with z. Note that we can allow y_{k+1} = −1 here, since in this case a solution with y_{k+1} = 1 can be obtained by switching the sign of the entire y-vector. The common way to proceed is now to drop one of the two constraints, resulting in poor approximations.

However, in this paper we show that the above relaxation can be solved exactly by considering the dual problem

sup_t inf_{||y||²_C = n} \frac{y^T (L_C + tE) y}{y^T M_C y},   (1.52)


where

E = N_C^T \begin{bmatrix} -I & 0 \\ 0 & 1 \end{bmatrix} N_C.   (1.53)

Similar to the trust region problem, we prove that this problem has no duality gap.


Bibliography

[1] S. Agarwal, M.K. Chandraker, F. Kahl, D.J. Kriegman, and S. Belongie. Practical global optimization for multiview geometry. In Proc. European Conf. on Computer Vision, pages 592–605, Graz, Austria, 2006.

[2] Bazaraa, Sherali, and Shetty. Nonlinear Programming, Theory and Algorithms. Wiley, 1993.

[3] P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):232–256, 1992.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[5] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.

[6] R.J. Campbell and P.J. Flynn. A survey of free form object representation and recognition. Computer Vision and Image Understanding, 81:166–210, 2001.

[7] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In International Conference on Robotics and Automation, volume 3, pages 2724–2729, 1991.

[8] T. Cour and J. Shi. Solving Markov random fields with spectral relaxation. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, volume 11, 2007.

[9] N. Gelfand, L. Ikemoto, S. Rusinkiewicz, and M. Levoy. Geometrically stable sampling for the ICP algorithm. In 3D Digital Imaging and Modeling (3DIM 2003), 2003.

[10] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. In Proc. Conf. Computer Vision and Pattern Recognition, pages 504–509, 2004.

[11] R.I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004. Second edition.


[12] B.K.P. Horn, H.M. Hilden, and S. Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5:1127–1135, 1988.

[13] J. Park, H. Zha, and R. Kasturi. Spectral clustering for robust motion segmentation. In European Conf. Computer Vision, Prague, Czech Republic, 2004.

[14] A.E. Johnson and M. Hebert. Surface registration by matching oriented points. In First International Conference on Recent Advances in 3-D Digital Imaging and Modeling, 1997.

[15] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstruction problems. In Int. Conf. Computer Vision, pages 978–985, Beijing, China, 2005.

[16] F. Kahl. Multiple view geometry and the L∞-norm. In International Conference on Computer Vision, pages 1002–1009, Beijing, China, 2005.

[17] Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruction. In International Conference on Computer Vision, pages 986–993, Beijing, China, 2005.

[18] Q. Ke and T. Kanade. Uncertainty models in quasiconvex optimization for geometric reconstruction. In Proc. Conf. Computer Vision and Pattern Recognition, pages 1199–1205, New York City, USA, 2006.

[19] J. Keuchel, C. Schnörr, C. Schellewald, and D. Cremers. Binary partitioning, perceptual grouping, and restoration with semidefinite programming. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(11):1364–1379, 2003.

[20] P. Kohli, P. Kumar, and P.H.S. Torr. P3 & beyond: Solving energies with higher order cliques. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2007.

[21] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):147–159, 2004.

[22] A. Levin, A. Rav-Acha, and D. Lischinski. Spectral matting. In Proc. Conf. Computer Vision and Pattern Recognition, Minneapolis, USA, 2007.

[23] J.J. Moré and D.C. Sorensen. Computing a trust region step. SIAM J. Sci. Stat. Comput., 4(3):553–572, 1983.

[24] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2002.


[25] S. Poljak, F. Rendl, and H. Wolkowicz. A recipe for semidefinite relaxation for (0,1)-quadratic programming. Journal of Global Optimization, 7:51–73, 1995.

[26] L. Quan and Z. Lan. Linear n ≤ 4-point camera pose determination. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(8):774–780, August 1999.

[27] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Prog., 77(2, Ser. B):273–299, 1997.

[28] M. Rojas, S.A. Santos, and D.C. Sorensen. A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM Journal on Optimization, 11(3):611–646, 2000.

[29] M. Rojas, S.A. Santos, and D.C. Sorensen. LSTRS: MATLAB software for large-scale trust-region subproblems and regularization. Technical Report 2003-4, Department of Mathematics, Wake Forest University, 2003.

[30] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In Proc. Int. Conf. on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 171–186, 2005.

[31] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[32] K. Sim and R. Hartley. Recovering camera motion using the L∞-norm. In Proc. Conf. Computer Vision and Pattern Recognition, pages 1230–1237, New York City, USA, 2006.

[33] K. Sim and R. Hartley. Removing outliers using the L∞-norm. In Proc. Conf. Computer Vision and Pattern Recognition, pages 485–492, New York City, USA, 2006.

[34] D.C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982.

[35] D.C. Sorensen. Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim., 7(1):141–161, 1997.

[36] P.H.S. Torr. Solving Markov random fields using semidefinite programming. In Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

[37] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10(5):695–703, 1988.

[38] S. Yu and J. Shi. Multiclass spectral clustering. In International Conference on Computer Vision, Nice, France, 2003.


PAPER I

Submitted to Pattern Analysis and Machine Intelligence, 2007.


Branch and Bound Methods for Euclidean Registration Problems

Carl Olsson, Fredrik Kahl and Magnus Oskarsson

Abstract

In this paper we propose a practical and efficient method for finding the globally optimal solution to the problem of determining the pose of an object. We present a framework that allows us to use point-to-point, point-to-line and point-to-plane correspondences for solving various types of pose and registration problems involving Euclidean (or similarity) transformations. Traditional methods such as the iterative closest point algorithm, or bundle adjustment methods for camera pose, may get trapped in local minima due to the non-convexity of the corresponding optimization problem.

Our approach to solving the mathematical optimization problems guarantees global optimality. The optimization scheme is based on ideas from global optimization theory, in particular convex underestimators in combination with branch and bound methods. We provide a provably optimal algorithm and demonstrate good performance on both synthetic and real data. We also give examples of where traditional methods fail due to the local minima problem.

1 Introduction

A frequently occurring, and by now classical, problem in computer vision, robotic manipulation and photogrammetry is the registration problem, that is, finding the transformation between two coordinate systems, see [20, 9, 17]. The problem appears in several contexts: relating two stereo reconstructions, solving the hand-eye calibration problem and finding the absolute pose of an object, given 3D measurements. A related problem is the camera pose problem, that is, finding the perspective mapping between an object and its image. In this paper, we will develop algorithms for computing globally optimal solutions to these registration problems within the same framework.

There are a number of proposed solutions to the registration problem and perhaps the most well-known is by Horn et al. [10]. They derive a closed-form solution for the Euclidean (or similarity) transformation that minimizes the sum-of-squares error between the transformed points and the measured points. As pointed out in [11], this is not an unbiased estimator if there are measurement errors on both point sets.

The more general problem of finding the registration between two 3D shapes was considered in [3], where the iterative closest point (ICP) algorithm was proposed to solve the problem. The algorithm is able to cope with different geometric primitives, such as point sets, line segments and different kinds of surface representations. However, the algorithm requires a good initial transformation in order to converge to the globally optimal solution, otherwise only a local optimum is attained. A number of approaches have been devoted to making the algorithm more robust to such difficulties, e.g. [6, 5], but the algorithm is still plagued by local minima problems.

In this paper, we generalize the method of Horn et al [10] by incorporating point, line and plane features in a common framework. Given point-to-point, point-to-line, or point-to-plane correspondences, we demonstrate how the transformation (Euclidean or similarity) relating the two coordinate systems can be computed based on a geometrically meaningful cost function. However, the resulting optimization problem becomes much harder: the cost function is a polynomial of degree four in the unknowns and there may be several local minima. Still, we present an efficient algorithm that guarantees global optimality. A variant of the ICP algorithm with a point-to-plane metric was presented in [5] based on the same idea, but it relies on a local, iterative optimization method.

The camera pose problem, that is, estimating the camera pose given 3D points and corresponding image points, has been studied for a long time. The minimal amount of data required to solve this problem is three point correspondences, and for this case there may be up to four solutions. The result has been shown a number of times, but the earliest solution is to our knowledge due to Grunert, already in 1841, [7]. A good overview of the minimal solvers and their numerical stability can be found in [8]. Given at least six point correspondences, the predominant method to solve the pose estimation problem is to use the DLT algorithm as a linear starting solution for a gradient descent method, see [9, 19, 17]. There are also quasi-linear methods that give a unique solution given at least four point correspondences, see [15, 21]. These methods solve a number of algebraic equations, and do not minimize the reprojection error. Given a Gaussian noise model for the image measurements and assuming i.i.d. noise, the Maximum Likelihood (ML) estimate is computed by minimizing the norm of the reprojection errors. To our knowledge, there are no previous algorithms that have any guarantee of obtaining the globally optimal ML solution.

The algorithms presented in this paper are based on relaxing the original non-convex problems by convex under-estimators and then using branch and bound to focus in on the global solution [12]. The under-estimators are obtained by replacing bilinear terms in the cost function with convex and concave envelopes, see [16] for further details. Branch and bound methods have been used previously in the computer vision literature. For example, in [1] various multiple view geometry problems are considered in a fractional programming framework.

In summary, our main contributions are:

• A generalization of Horn's method for the registration problem using points, lines and planes.

• An algorithm for computing the global optimum of the corresponding quartic polynomial cost function.

• An algorithm for computing the global optimum of the camera pose problem.

• The introduction of convex and concave relaxations of monomials in the computer vision literature. This opens up the possibility of attacking similar problems for which so far only local algorithms exist.

Although we only give experimental results for point, line and plane correspondences, it is shown that our approach is applicable to more general settings. In fact, correspondence features involving any convex sets can be handled. The work in this paper is based on the conference papers [14, 13].

2 A General Problem Formulation

We will now make a general formulation of the problem types we are interested in solving. The stated problem is to find a transformation relating two coordinate systems: the model coordinate system and the measurement coordinate system. In the measurement system we have points that we know should map to some convex sets in the model system. Typically, in a practical situation, the convex sets consist of single points, lines or planes, but in principle they may be any convex sets. We want to determine a transformation such that the mappings of the measurement points come as close to their corresponding convex sets as possible. The transformation is usually a similarity or a Euclidean transformation. We will also consider the transformation consisting of a Euclidean transformation followed by a perspective mapping in the camera pose problem.

Let d be a norm. We then define the distance from a (closed) set C as

$$ d_C(x) = \inf_{y \in C} d(x, y). \qquad (1) $$

Let {x_i}, i = 1, ..., m, be the point measurements and {C_i}, i = 1, ..., m, be the corresponding model sets. The registration (or pose) problem can then be written as

$$ \inf_{T} \sum_{i=1}^{m} d_{C_i}(T(x_i)), \qquad (2) $$


Figure 1: The experimental setup for the tests done in Section 5.3.


where T is in the set of feasible transformations. In the case of a similarity transform, T(x) = sRx + t, with s ∈ R_+, R ∈ SO(3) and t ∈ R^3 (s = 1 corresponds to the Euclidean case). Our goal is to find the global optimum of problem (2). Note that the sets C_i do not have to be distinct. If two points x_i and x_j map to the same set then we take C_i = C_j.

The following theorem shows why it is convenient to only work with convex sets.

Theorem 2.1. d_C is a convex function if and only if C is a convex set.

Proof. Since C can be written as the zero level set { x : d_C(x) ≤ 0 }, C must be convex if d_C is convex.
Next assume that C is convex. Let z = λx + (1 − λ)y for some scalar λ ∈ [0, 1]. Pick x′, y′, z′ ∈ ∂C such that

$$ d_C(x) = \inf_{u \in C} d(x, u) = d(x, x'), \qquad (3) $$
$$ d_C(y) = \inf_{u \in C} d(y, u) = d(y, y'), \qquad (4) $$
$$ d_C(z) = \inf_{u \in C} d(z, u) = d(z, z'). \qquad (5) $$

Since d is a norm, z′ minimizes (5) and λx′ + (1 − λ)y′ ∈ C, we obtain

$$ d_C(\lambda x + (1-\lambda)y) = d(z, z') \le d(z, \lambda x' + (1-\lambda)y') \le \lambda d(x, x') + (1-\lambda)d(y, y') = \lambda d_C(x) + (1-\lambda)d_C(y). \qquad (6) $$

The inequality d(z, z′) ≤ d(z, λx′ + (1 − λ)y′) holds since z′ minimizes d(z, ·).

Theorem 2.1 shows that if T depends affinely on some parameters then problem (2) is a convex optimization problem in these parameters, which can be solved efficiently using standard methods. This is however not the case for Euclidean and similarity transformations. In these cases (2) becomes highly non-convex due to the orthogonality constraints R^T R = I. Instead we will use larger sets of transformations, which can be parameterized as affine combinations of some parameters, and use (2) to obtain lower bounds. These bounds can then be used to find the global optimum in a branch and bound scheme. Note that the convexity of d_C does not depend on the particular choice of d; any choice of norm will do.

3 Global Optimization

In this section we briefly present some notation and concepts of convex optimization and branch and bound algorithms. This is standard material in (global) optimization theory. For a more detailed introduction see [4] and [12].


3.1 Convex Optimization

A convex optimization problem is a problem of the form

$$ \min\ g(x) \quad \text{such that} \quad h_i(x) \le 0, \quad i = 1, \ldots, m. \qquad (7) $$

Here x ∈ R^n and both the cost (or objective) function g : R^n → R and the constraint functions h_i : R^n → R are convex functions. Convex problems have the very useful property that a local minimizer of the problem is also a global minimizer. Therefore they fit naturally into our framework.

The convex envelope of a function h : S → R (denoted h_conv) is a convex function which fulfills:

1. h_conv(x) ≤ h(x), ∀x ∈ S.

2. If u(x) is convex on S and u(x) ≤ h(x), ∀x ∈ S, then h_conv(x) ≥ u(x), ∀x ∈ S.

Here S is a convex domain. The concave envelope is defined analogously. The convex envelope of a function has the nice property that it has the same global minimum as the original function. However, computing the convex envelope usually turns out to be just as difficult as solving the original minimization problem.

3.2 Branch and Bound Algorithms

Branch and bound algorithms are iterative methods for finding global optima of non-convex problems. They work by calculating sequences of provable lower bounds which converge to the global minimum. The result of such an algorithm is an ε-suboptimal solution, i.e. a solution that is at most ε from the global minimum. By setting ε small enough, we can get arbitrarily close to the global optimum.

Consider the following (abstract) problem. We want to minimize a non-convex scalar-valued function f(t) over a set D_0. One may assume that D_0 is closed and compact so that a minimum is attained within the domain. For any closed set D ⊂ D_0, let f_min(D) be the minimum value of f on D and let f_lower(D) be a lower bound for f on D. We also require that the approximation gap f_min(D) − f_lower(D) goes uniformly to zero as max_{x,y∈D} ||x − y|| (denoted |D|) goes to zero. Or, in terms of (ε, δ), we require that

$$ \forall \varepsilon > 0,\ \exists \delta > 0 \ \text{s.t.}\ \forall D \subset D_0:\quad |D| \le \delta \ \Rightarrow\ f_{\min}(D) - f_{\mathrm{lower}}(D) \le \varepsilon. $$

If such a lower-bounding function f_lower can be constructed, then a strategy to obtain an ε-suboptimal solution is to divide the domain into sets D_i with sizes |D_i| ≤ δ and compute f_lower on each set. However, the number of such sets increases quickly with 1/δ and therefore this may not be feasible. To avoid this problem a strategy to create as few sets as possible can be deployed. Assume that we know that f_min(D_0) < k for some scalar k. If f_lower(D) > k for some set D then there is no point in refining D further since the global minimum will not be contained in D. Thus D and all possible refinements D_i ⊆ D can be discarded.

The branch and bound algorithm begins by computing f_lower(D_0) and the actual point q* ∈ D_0 at which this bound is attained. This is our current best estimate of the minimum. If f(q*) − f_lower(D_0) ≤ ε then q* is ε-suboptimal and the algorithm terminates. Otherwise the set D_0 is partitioned into subsets (D_1, ..., D_k), with k ≥ 2, and one computes the lower bounds f_lower(D_i) and the points q_i at which these bounds are attained. The new best estimate of the minimum is then q* := argmin_{q_i, i=1,...,k} f(q_i). If f(q*) − min_{1≤i≤k} f_lower(D_i) ≤ ε, then q* is ε-suboptimal and the algorithm terminates. Otherwise the subsets are refined further; however, the sets for which f_lower(D_i) > f(q*) can be discarded immediately and need not be considered for further computations. This algorithm is guaranteed to find an ε-suboptimal solution for any ε > 0. The drawback is that the worst case complexity is exponential. In practice one can achieve relatively fast convergence if the obtained bounds are tight.

In most cases we do not want to minimize the function over the entire domain, but rather over a subset. Then we also need to check whether the problem is feasible in each set D_i; if it is not, the set can be removed from further consideration.
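The scheme above can be summarized in a few lines. The following is a minimal, best-first sketch in Python (not the authors' Matlab implementation); the functions f, f_lower and split are assumed to be supplied by the user.

```python
import heapq

def branch_and_bound(f, f_lower, split, D0, eps):
    """Return an eps-suboptimal minimizer of f over the initial domain D0.

    f_lower(D) must return (lower_bound, candidate_point) for a subdomain D,
    and split(D) must partition D into smaller subdomains.
    """
    lb0, q0 = f_lower(D0)
    best_q, best_val = q0, f(q0)
    heap = [(lb0, 0, D0)]                 # (lower bound, tie-breaker, domain)
    counter = 1
    while heap:
        lb, _, D = heapq.heappop(heap)
        if best_val - lb <= eps:          # every remaining domain has a bound >= lb
            break
        for Di in split(D):
            lbi, qi = f_lower(Di)
            if f(qi) < best_val:          # update the current best estimate
                best_q, best_val = qi, f(qi)
            if lbi < best_val - eps:      # otherwise Di cannot contain a better point
                heapq.heappush(heap, (lbi, counter, Di))
                counter += 1
    return best_q, best_val
```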

4 Convex Relaxations of SO(3)

In this section we derive relaxations of SO(3) and the group of similarity transformations that allow us to parametrize the transformations affinely. The goal is to obtain easily computable lower bounding functions for cost functions involving Euclidean transformations.

Recall from Section 2 that problem (2) is convex if T depends affinely on its parameters. A common way to parametrize rotations is to use quaternions (see [2]). Let q = (q_1, q_2, q_3, q_4)^T, with ||q|| = 1, be the unit quaternion parameters of the rotation matrix R. If the scale factor s is free to vary, one can equivalently drop the condition ||q|| = 1. We note that for a point x in R^3, we can rewrite the 3-vector sRx as a vector containing quadratic forms in the quaternion parameters in the following way

$$ sRx = \begin{pmatrix} q^T B_1(x)\, q \\ q^T B_2(x)\, q \\ q^T B_3(x)\, q \end{pmatrix} \qquad (8) $$

where B_l(x), l = 1, 2, 3, are 4 × 4 matrices whose entries depend on the elements of the vector x.
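As a concrete illustration (a sketch only; the scalar-first quaternion convention used here is an assumption, since the paper does not spell out the parametrization), the homogeneous quaternion rotation matrix makes the quadratic dependence on q explicit: every component of sRx is a quadratic form in (q_1, ..., q_4), with s = ||q||^2.

```python
import numpy as np

def srx_from_quaternion(q, x):
    """Evaluate sRx for q = (q1, q2, q3, q4) (scalar first) and a 3-vector x.
    Each component is a quadratic form q^T B_l(x) q; the scale is s = ||q||^2."""
    w, a, b, c = q
    R_h = np.array([
        [w*w + a*a - b*b - c*c, 2*(a*b - w*c),         2*(a*c + w*b)],
        [2*(a*b + w*c),         w*w - a*a + b*b - c*c, 2*(b*c - w*a)],
        [2*(a*c - w*b),         2*(b*c + w*a),         w*w - a*a - b*b + c*c]])
    return R_h @ np.asarray(x, dtype=float)

# sanity check: a unit quaternion gives s = 1, so vector lengths are preserved
q = np.random.randn(4); q /= np.linalg.norm(q)
x = np.random.randn(3)
assert np.isclose(np.linalg.norm(srx_from_quaternion(q, x)), np.linalg.norm(x))
```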

In order to be able to use the branch and bound algorithm, it is required that we have bounds on q such that q_i^L ≤ q_i ≤ q_i^U, i = 1, 2, 3, 4. In practice these bounds are usually known since the scale factor cannot be arbitrarily large. The quadratic forms q^T B_l(x) q contain terms of the form b_lij(x) q_i q_j, and therefore we introduce new variables s_ij = q_i q_j, i = 1, ..., 4, j = 1, ..., 4, in order to linearize the problem.

Now consider the constraints s_ij = q_i q_j, or equivalently

$$ s_{ij} \le q_i q_j, \qquad (9) $$
$$ s_{ij} \ge q_i q_j. \qquad (10) $$

In the new variables s_ij the cost function becomes convex, cf. (2), but the domain of optimization given by the above constraint functions is not a convex set.

The constraints (10) are convex if i = j, and (9) is not convex for any i, j. If we replace q_i q_j in (9) with the concave envelope of q_i q_j then (9) becomes a convex condition. We also see that by doing this we expand the domain for s_ij, and thus the minimum of this problem will be lower than or equal to that of the original problem. Similarly we can relax q_i q_j in (10) by its convex envelope and obtain a convex problem which gives a lower bound on the global minimum of the cost function (2).

The convex envelope of q_i q_j, i ≠ j, q_i^L ≤ q_i ≤ q_i^U, q_j^L ≤ q_j ≤ q_j^U, is well known (e.g., [16]) to be

$$ (q_iq_j)_{\mathrm{conv}} = \max\left\{ q_iq_j^U + q_i^Uq_j - q_i^Uq_j^U,\ \ q_iq_j^L + q_i^Lq_j - q_i^Lq_j^L \right\} \le q_iq_j, \qquad (11) $$

and the concave envelope is

$$ (q_iq_j)_{\mathrm{conc}} = \min\left\{ q_iq_j^L + q_i^Uq_j - q_i^Uq_j^L,\ \ q_iq_j^U + q_i^Lq_j - q_i^Lq_j^U \right\} \ge q_iq_j. \qquad (12) $$

Thus the equations (9) and (10) for i ≠ j can be relaxed by the linear constraints

$$ q_iq_j^L + q_i^Uq_j - q_i^Uq_j^L - s_{ij} \ge 0, \qquad (13) $$
$$ q_iq_j^U + q_i^Lq_j - q_i^Lq_j^U - s_{ij} \ge 0, \qquad (14) $$
$$ s_{ij} - (q_iq_j^U + q_i^Uq_j - q_i^Uq_j^U) \ge 0, \qquad (15) $$
$$ s_{ij} - (q_iq_j^L + q_i^Lq_j - q_i^Lq_j^L) \ge 0, \qquad (16) $$

that is, s_ij is bounded above by the concave envelope (12) and below by the convex envelope (11).

If i = j we need to relax q_i^2 in (9) with its concave envelope. However, this is simply a line aq_i + b, where a and b are determined by noting that the values (q_i^L)^2 and (q_i^U)^2 should be attained at the points q_i = q_i^L and q_i = q_i^U, respectively. Figure 2 shows the upper and lower bounds of s_11 when −1 ≤ q_1 ≤ 1. We see that even when the interval has only been divided four times the upper bound is quite close to the lower bound. This gives some indication of how the lower bounds on the problem may converge quite rapidly. Since all the non-convex constraints have been replaced by convex constraints, the relaxed problem is now convex.



Figure 2: Upper and lower bounds of s_11, which relaxes q_1^2 in the interval [−1, 1]. Left: the initial bound. Right: when the interval has been subdivided four times. Note that the lower bound is exact since q_1^2 ≤ s_11 is convex.

Thus we can minimize this relaxed problem to obtain a lower bound on the original problem, and as we subdivide the domain these lower bounds will tend to the global minimum of the original problem.
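The envelopes (11)-(12) and the chord bound used for s_ii are cheap to evaluate. The following NumPy sketch (helper names are ours) computes them and checks numerically that they bracket the exact terms on the box.

```python
import numpy as np

def bilinear_envelopes(qi, qj, qiL, qiU, qjL, qjU):
    """Convex and concave envelopes of q_i*q_j on a box, cf. (11)-(12)."""
    conv = np.maximum(qi*qjU + qiU*qj - qiU*qjU,
                      qi*qjL + qiL*qj - qiL*qjL)
    conc = np.minimum(qi*qjL + qiU*qj - qiU*qjL,
                      qi*qjU + qiL*qj - qiL*qjU)
    return conv, conc

def square_bounds(qi, qiL, qiU):
    """Bounds used for s_ii: q_i^2 from below, the chord through the endpoints from above."""
    return qi**2, (qiL + qiU)*qi - qiL*qiU

# numerical sanity check on a grid over the box
qiL, qiU, qjL, qjU = -1.0, 1.0, -0.5, 1.0
qi, qj = np.meshgrid(np.linspace(qiL, qiU, 101), np.linspace(qjL, qjU, 101))
conv, conc = bilinear_envelopes(qi, qj, qiL, qiU, qjL, qjU)
assert np.all(conv <= qi*qj + 1e-12) and np.all(qi*qj <= conc + 1e-12)
lo, hi = square_bounds(qi, qiL, qiU)
assert np.all(lo <= qi**2 + 1e-12) and np.all(qi**2 <= hi + 1e-12)
```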

To summarize, we now state the relaxed problem which will serve as a bounding function in our branch and bound algorithm. For now we restrict T to the set of Euclidean (or similarity) transformations, that is, x ↦ Rx + t where R denotes a rotation matrix and t a translation vector. In a later section we will generalize this to include perspective mappings in order to solve the camera pose problem. Let q be a 14 × 1 vector containing the parameters (q_1, ..., q_4, s_11, s_12, ..., s_44). The quadratic forms q^T B_l(x_i) q in (8) can then be relaxed by the terms b_l(x_i)^T q, where b_l(x_i) is a vector of the same size as q whose entries depend on x_i. Note that the entries of b_l(x_i) are obtained directly from the entries of B_l(x_i). The vector sRx_i can now be written B(x_i) q, where B(x_i) is a 3 × 14 matrix with rows b_l(x_i)^T. All the linear constraints in (13)-(16), as well as the bounds on the quaternion parameters q_i, can be written as c_j − a_j^T q ≥ 0, where c_j is a constant and a_j is a vector of the same size as q. These constraints can be collected into a matrix inequality c − A^T q ≥ 0, where c is a vector whose entries are the c_j and A is a matrix whose columns are the a_j. To improve the bound further we may also add the constraint that the symmetric matrix S with entry s_ij in position (i, j) should be positive semidefinite (note that s_ij = s_ji). Then the relaxed problem can be written, cf. (2),

$$ \begin{array}{rl} \inf_{q,t} & \sum_{i=1}^{m} d_{C_i}(B(x_i)q + t) \\ \text{subject to} & s_{jj} - q_j^2 \ge 0, \quad j = 1, \ldots, 4, \\ & c - A^Tq \ge 0, \\ & S \succeq 0. \end{array} \qquad (17) $$

In the case of finding a Euclidean transformation instead of a similarity transformation we can simply add the extra linear constraint

$$ s_{11} + s_{22} + s_{33} + s_{44} - 1 = 0. \qquad (18) $$

Thus, the original non-convex problem has been convexified by expanding the domain of optimization, and hence the optimal solution to the relaxed problem will give a lower bound on the global minimum of the original problem.
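To make the construction concrete, here is a minimal sketch of the relaxed problem (17) for point-to-point correspondences only, written with the CVXPY modelling library rather than the Matlab/SeDuMi implementation used by the authors. The scalar-first quaternion convention, the explicit mapping from S to the linearized sRx, and the use of the sum of (unsquared) distances as in (2) are assumptions of this sketch; qL and qU are length-4 arrays of box bounds.

```python
import cvxpy as cp

def relaxed_lower_bound(X, Y, qL, qU, euclidean=False):
    """Lower bound on min sum_i ||sRx_i + t - y_i|| for q in the box [qL, qU]."""
    q = cp.Variable(4)                       # quaternion parameters q1..q4
    S = cp.Variable((4, 4), symmetric=True)  # relaxation of the outer product q q^T
    t = cp.Variable(3)

    cons = [S >> 0, q >= qL, q <= qU]
    if euclidean:
        cons.append(cp.trace(S) == 1)        # constraint (18)
    for i in range(4):
        # s_ii >= q_i^2 and s_ii below the chord through the interval endpoints
        cons += [S[i, i] >= cp.square(q[i]),
                 S[i, i] <= (qL[i] + qU[i]) * q[i] - qL[i] * qU[i]]
        for j in range(i + 1, 4):
            # envelopes (13)-(16) of the bilinear terms s_ij = q_i q_j
            cons += [S[i, j] >= q[i]*qU[j] + qU[i]*q[j] - qU[i]*qU[j],
                     S[i, j] >= q[i]*qL[j] + qL[i]*q[j] - qL[i]*qL[j],
                     S[i, j] <= q[i]*qL[j] + qU[i]*q[j] - qU[i]*qL[j],
                     S[i, j] <= q[i]*qU[j] + qL[i]*q[j] - qL[i]*qU[j]]

    def lin_srx(x):
        # sRx written linearly in the entries of S (homogeneous quaternion rotation)
        X1, X2, X3 = x
        return cp.hstack([
            X1*(S[0,0]+S[1,1]-S[2,2]-S[3,3]) + 2*X2*(S[1,2]-S[0,3]) + 2*X3*(S[1,3]+S[0,2]),
            2*X1*(S[1,2]+S[0,3]) + X2*(S[0,0]-S[1,1]+S[2,2]-S[3,3]) + 2*X3*(S[2,3]-S[0,1]),
            2*X1*(S[1,3]-S[0,2]) + 2*X2*(S[2,3]+S[0,1]) + X3*(S[0,0]-S[1,1]-S[2,2]+S[3,3])])

    cost = sum(cp.norm(lin_srx(x) + t - y) for x, y in zip(X, Y))
    prob = cp.Problem(cp.Minimize(cost), cons)
    prob.solve()
    return prob.value, q.value, t.value
```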

Implementation

The implementation of the algorithm was done in Matlab. We basically used the algorithm described in Section 3.2 and the relaxations from Section 4. At each iteration, problem (17) is solved for all rectangles D_i to obtain a lower bound on the function on D_i. To speed up convergence we use the minimizer q_i* of problem (17) as a starting guess for a local minimization of the original non-convex problem. We then compare the function values at the local optimizer q_i^loc and at q_i*. If any of these values are lower than the current best minimum, the lowest one is deemed the new best minimum. Even though q_i* is the optimal point of the relaxed problem (17), it is often the case that the function value is larger at q_i* than at q_i^loc. Therefore we reach lower values faster if we use the local optimizer, and thus intervals can be thrown away faster. If an interval is not thrown away then we subdivide it into two. We divide the intervals along the dimension which has the longest side. In this way the worst case would be that the number of intervals doubles at each iteration. However, we shall see later that in practice this is not the case. As a termination criterion we use the total 4-dimensional volume of the rectangles. One could argue that it would be sufficient to terminate when the approximation gap (see Section 3.2) is small enough; however, this does not necessarily mean that the ε-suboptimal solution is close to the real minimizer.

To solve the relaxed problem we used SeDuMi (see [18]). SeDuMi is a free add-on for Matlab that can be used to solve problems with linear, quadratic and semidefiniteness constraints. For the local minimization we used the built-in Matlab function fmincon.


5 Applications I: Registration Using Points, Lines and/or Planes

We will now review the methods of Horn et al as presented in [10]. Given two corresponding point sets we want to find the best transformation that maps one set onto the other. The best is here taken to mean the transformation that minimizes the sum of squared distances between the points, i.e.

$$ \sum_{i=1}^{m} \| T(x_{p_i}) - y_{p_i} \|_2^2, \qquad (19) $$

where x_{p_i} and y_{p_i}, i = 1, ..., m, are the 3D points in the respective coordinate systems. Here we assume T to be either a Euclidean or a similarity transformation,

$$ T(x) = sRx + t, $$

with s ∈ R_+, R ∈ SO(3) and t ∈ R^3 (s = 1 corresponds to the Euclidean case).

Following Horn [10], it is easily shown that the optimal translation t is given by

$$ t = \frac{1}{m}\sum_{i} y_{p_i} - R\,\frac{1}{m}\sum_{i} x_{p_i} = \bar{y}_p - R\bar{x}_p. \qquad (20) $$

This will turn equation (19) for the Euclidean case into

$$ \sum_{i=1}^{m} \| R\,\delta x_i - \delta y_i \|_2^2 = \sum_{i=1}^{m} (\delta x_i)^T\delta x_i + (\delta y_i)^T\delta y_i - 2(\delta y_i)^T R\,\delta x_i, \qquad (21)\text{-}(22) $$

with δx_i = x_{p_i} − x̄_p and δy_i = y_{p_i} − ȳ_p. Due to the orthogonality of R this expression becomes linear in R. Now R can be determined from the singular value decomposition of a matrix constructed from the δx_i and δy_i. The details can be found in [10].

In this application we will consider not only point-to-point correspondences but also point-to-line and point-to-plane correspondences. In the following sections it will be shown why the extension to these types of correspondences results in more difficult optimization problems. In Figure 1, a measurement device is shown which generates 3D point coordinates and which will be used for validation purposes. Table 1 summarizes the methods available for global optimization. Note that if the transformation considered is affine, i.e. T(x) = Ax + t, then the problem is simply a linear least squares problem.
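For comparison, the affine case reduces to ordinary linear least squares, as in this short sketch (our own helper, not part of the paper):

```python
import numpy as np

def fit_affine(X, Y):
    """Least-squares fit of T(x) = Ax + b mapping the rows of X onto the rows of Y (m x 3 each)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    Xh = np.hstack([X, np.ones((len(X), 1))])     # homogeneous coordinates
    M, *_ = np.linalg.lstsq(Xh, Y, rcond=None)    # solves Xh @ M ~= Y
    return M[:3].T, M[3]                          # A = M[:3].T, b = M[3]
```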

Corresp. type    Euclidean/Similarity    Affine
Point-Point      Horn [10]               Linear Least Squares
Point-Plane      Our algorithm           Linear Least Squares
Point-Line       Our algorithm           Linear Least Squares
Combination      Our algorithm           Linear Least Squares

Table 1: Methods available for estimating the registration for different types of correspondences and transformations.

5.1 Point-to-Plane Correspondences

We will now consider the point-to-plane problem. Suppose we have a number of planes π_i in one coordinate system and points x_{π_i}, i = 1, ..., m_π, in another, and we assume that point x_{π_i} lies on plane π_i. Let d_π(x) be the minimum distance between a point x and a plane π. The problem is now to find s ∈ R_+, R ∈ SO(3) and t ∈ R^3 that minimizes

$$ f_\pi(s, R, t) = \sum_{i=1}^{m_\pi} \big( d_{\pi_i}(sRx_{\pi_i} + t) \big)^2. \qquad (23) $$

From elementary linear algebra we know that this can be written as

$$ f_\pi(s, R, t) = \sum_{i=1}^{m_\pi} \big( (sRx_{\pi_i} + t - y_{\pi_i}) \cdot n_i \big)^2, \qquad (24) $$

where y_{π_i} is any point on the plane π_i, n_i is a unit normal of the plane π_i and · is the inner product in R^3. Thus we want to solve the problem

$$ \min_{s \in \mathbb{R}_+,\, R \in SO(3),\, t \in \mathbb{R}^3} \ \sum_{i=1}^{m_\pi} \big( n_i^T (sRx_{\pi_i} + t - y_{\pi_i}) \big)^2. \qquad (25) $$

In order to reduce the dimensionality of this problem we now derive an expression for the translation t. This is similar to the approach by Horn et al. (see [10]) in which all measurements are referred to the centroids.

If we let R be any 3 × 3 matrix we can consider the problem (25) as minimizing (24) with the constraints g_ij(s, R, t) = r_i^T r_j − δ_ij, i, j = 1, 2, 3, where the r_i are the columns of R and δ_ij is the Kronecker delta. From the method of Lagrange multipliers we know that for a local minimum (s*, R*, t*) (and hence a global one) there must be numbers λ_ij such that

$$ \nabla f_\pi(s^*, R^*, t^*) + \sum_{i=1}^{3}\sum_{j=1}^{3} \lambda_{ij}\, \nabla g_{ij}(s^*, R^*, t^*) = 0. \qquad (26) $$

Here the gradient is taken with respect to all parameters. We see that the constraints are independent of the translation t, so they will disappear if we apply the gradient with respect to t. Moreover, we see that we will get a linear expression in t and thus we are able to solve for t. It follows that

$$ t = N^{-1} \sum_{i=1}^{m_\pi} n_i n_i^T (y_{\pi_i} - sRx_{\pi_i}), \qquad (27) $$

where N = \sum_{j=1}^{m_\pi} n_j n_j^T.

Note that if N is not invertible then there are several solutions for t. However, if t and t̃ are two such solutions, their difference Δt = t − t̃ lies in the null space of N, so ∑_j n_j n_j^T Δt = 0. One can then easily show, by inserting t̃ into the cost function (24), that f_π(s, R, t) = f_π(s, R, t̃). This means that in this case there are infinitely many solutions and thus the problem is not well-posed.

Next we state the relaxed problem in a form similar to (17). In this case the (squared) distance residual is d_{π_i}(x) = (n_i^T(x − y_{π_i}))^2. If we parametrize the problem using quaternions as in (17) and introduce the relaxations in terms of the q parameters we get

$$ \sum_{i=1}^{m_\pi} d_{\pi_i}(x) = \sum_{i=1}^{m_\pi} (n_i^T B_i q)^2 = \sum_{i=1}^{m_\pi} (\bar{B}_i q)^2, \qquad (28) $$

where B̄_i = n_i^T B_i is a 1 × 14 vector. Together with the constraints from (17) this gives us our relaxed lower bounding problem.

5.2 Point-to-Line and Point-to-Point Correspondences

The case of point-to-line correspondences can be treated in a similar way as the case of point-to-plane correspondences. Let x_{l_i}, i = 1, ..., m_l, be the measured 3D points and let l_i, i = 1, ..., m_l, be the corresponding lines. In this case the distance functions can be written as

$$ d_{l_i}(x) = \| (I - v_i v_i^T)(x - y_{l_i}) \|_2, \qquad (29) $$

where v_i is a unit direction vector for the line l_i and y_{l_i} is any point on the line l_i. Note that the three components of (I − v_i v_i^T)(x − y_{l_i}) are linearly dependent since (I − v_i v_i^T) is a rank 2 matrix. However, it would not make sense to remove any of them since we would then no longer be optimizing the geometrical distance. In the same way as in the point-to-plane case we can eliminate the t parameters and write our cost function in the form (28).

The case of point-to-point correspondences is the easiest one. Let x_{p_i} be the measured points and y_{p_i} the corresponding points, i = 1, ..., m_p. The distance function is simply

$$ d_{y_{p_i}}(x) = \| x - y_{p_i} \|_2. \qquad (30) $$

Again our cost function can be put in the same form as (28).
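The three distance functions used above are straightforward to evaluate; a small NumPy sketch (helper names are ours):

```python
import numpy as np

def dist_to_point(x, y):
    """Point-to-point distance, cf. (30)."""
    return np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))

def dist_to_line(x, v, y_l):
    """Point-to-line distance, cf. (29); v is a unit direction, y_l any point on the line."""
    P = np.eye(3) - np.outer(v, v)          # projection onto the plane orthogonal to the line
    return np.linalg.norm(P @ (np.asarray(x, float) - np.asarray(y_l, float)))

def dist_to_plane(x, n, y_pi):
    """Point-to-plane distance used in (24); n is a unit normal, y_pi a point on the plane."""
    return abs(np.dot(n, np.asarray(x, float) - np.asarray(y_pi, float)))
```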


When we have combinations of different correspondences we proceed in exactly the same way. Note, however, that we have to add the cost functions before eliminating the t-variables since they will depend on all the correspondences. When doing this one gets the expression for t as

$$ t = M^{-1}\Big( \sum_{i=1}^{m_p}(y_{p_i} - Rx_{p_i}) + \sum_{i=1}^{m_l}(I - v_iv_i^T)(y_{l_i} - Rx_{l_i}) + \sum_{i=1}^{m_\pi} n_in_i^T(y_{\pi_i} - Rx_{\pi_i}) \Big), \qquad (31) $$

where M = m_p I + \sum_{i=1}^{m_l}(I - v_iv_i^T) + \sum_{j=1}^{m_\pi} n_jn_j^T. Substituting this into the cost function we can now find an expression of the type (28).

5.3 Experiments and Results

Local Minima

Non-convex problems usually exhibit local minima. To show some typical behavior of these kinds of functions we generated a problem with 8 plane-to-point correspondences. This was done in the following way. We randomly generated 8 planes π_i and then picked a point y_{π_i} from each plane. Then the points x_{π_i} = R^T(y_{π_i} − t) were calculated for known R and t. We used the Matlab built-in function fmincon to search for the minima from some different starting points. Table 2 shows the results. Note that the third point

local min point                       cost function value
(0, -0.0781, -1.0000, -0.4857)        2.1365
(0, 0.1189, -0.3349, -0.9316)         2.2161
(0.5917, 0.0416, 0.6995, 0.4088)      3.7123e-04
(0.6551, 0.2226, 0.7166, 0.2306)      0.0018
(0, 0, 0, 0)                          65.1556

Table 2: Local minima found by the Matlab function fmincon.

is the global minimum. To get an idea of the shape of the function, Figure 3 plots the function values along a line from one local minimum to the global minimum. To the left are the values on the line from the first point in Table 2, and to the right from the second point.

Iterations

Figure 4 shows the performance of the algorithm in two cases. In both cases the data has been synthetically generated. The first problem is the case of ten point-to-plane, four point-to-line and four point-to-point correspondences. The solid line to the left in Figure 4 shows the number of feasible rectangles for this problem at each iteration. The



Figure 3: The function values on two lines between a local minimum and the global minimum.

solid line to the right shows the fourth root of the total volume of the rectangles. Recall that this is a 4-dimensional problem and therefore the fourth root gives an estimate of the total length of the sides of the rectangles. This case exhibits the typical behavior of this algorithm. The second case is the minimal case of 7 point-to-plane correspondences. This seems to be the case where our algorithm has the most difficulties. The dashed lines show the performance for this case. Note that this case could probably be solved much more efficiently with a minimal case solver; it is merely included to show the worst case behavior of the algorithm. For comparison, the dotted line to the left shows what the number of rectangles would be if no rectangles were thrown away. In the first case the algorithm terminated after 38 iterations, and in the second after 39, which is a typical behaviour of the algorithm.

Experiments on Real Data

Next we present two experiments made with real data. The experimental setup can be viewed in Figure 1. We used a MicroScribe-3DLX 3D scanner to measure the 3D coordinates of some points on two different objects. The 3D scanner consists of a pointing arm with five degrees of freedom and a foot pedal. It can be connected to the serial port of a PC. To measure a 3D coordinate one simply moves the pointing arm to the point and presses the pedal. The accuracy of the device is not very high. If one tries to measure the same point but varies the pose of the pointer one can obtain results that differ by approximately half a millimeter. The test objects are the ones that are visible in



Figure 4: Left, the number of feasible rectangles at each iteration for two experiments (solid and dashed lines, see text for details). For comparison the dotted line shows the theoretical worst case performance where no intervals are discarded. Right, the fourth root of the total volume of the rectangles.

figure 1, namely the Rubik's cube and the toy model. By request of the designer we will refer to the toy model as the space station.

Rubik's Cube Experiment. The first experiment was done by measuring on a Rubik's cube. The Rubik's cube contains lines, planes and points and therefore suits our purposes. We modeled three of the sides of the cube and we measured nine point-to-plane, two point-to-line and three point-to-point correspondences on these sides. Figure 5 shows the model of the Rubik's cube and the points obtained when applying the estimated transformation to the measured data points. The points marked with crosses are points measured on the planes, the points marked with rings are measured on lines and the points marked with stars are measured on corners. It is difficult to see from this picture how well the points fit the model; however, to the left in Figure 6 we have plotted the residual errors of all the points. Recall that the residuals are the squared distances. The first nine are the point-to-plane, the next two are the point-to-line and the last four are the point-to-point correspondences. We see that the point-to-point residuals are somewhat larger than the rest of the residuals. There may be several reasons for this; one is that using the 3D scanner it is much harder to measure corners than to measure planes or lines. This is because a corner is relatively sharp and thus the surface to apply the pointer to is quite small, making it easy to slip. To the right in Figure 6 is the result from a leave-one-out test. Each bar represents an experiment where we leave one data point



Figure 5: The model of the Rubik’s cube.

out, calculate the optimal transformation and measure the residual of the left-out point. Again the first nine are the point-to-plane, the next two are the point-to-line and the last four are the point-to-point correspondences.

For comparison we also implemented the algorithm by Horn et al. [10] and an algorithm based on linear least squares. The latter algorithm first finds an optimal affine transformation y = Ax + b and then finds sR by minimizing ||A − sR||_F, where || · ||_F is the Frobenius norm. The translation is then calculated from equation (31). Note that in order to calculate the affine transformation one needs at least 12 equations. Table 3 shows the results obtained from the three methods when solving the Rubik's cube experiment. The residuals stated are the sums of the different types of correspondence residuals. Note that this experiment is somewhat unfair to the algorithm by Horn et al. since it only optimizes the point-to-point correspondences. However, due to the lack of other alternatives we still use it for comparison. As one would expect, the solution obtained by Horn's algorithm has a lower residual sum for the point-to-point correspondences, since this is what it optimizes. Our algorithm has a lower total residual sum since this is what we optimize.

Space Station Experiment. The next experiment was done by measuring on the space station. It is slightly more complicated than the Rubik's cube and it contains more planes to measure from. We measured 27 point-to-plane, 12 point-to-line and 10 point-to-point correspondences on the space station. Figure 7 shows the model of the space station and the points obtained when applying the estimated transformation to the measured data points.



Figure 6: Left, residual errors for all correspondences. Right, leave-one-out residuals (see text for details).

Residuals:      Our Alg.    Horn       Lin. least sq.
point-point     0.0023      1.3e-04    0.0051
point-line      4.7e-04     0.0016     0.0027
point-plane     0.0018      0.0095     0.0049
Total           0.0045      0.0113     0.0127

Table 3: Residuals of the cube problem.


To the left in Figure 8 we have plotted the residual errors of all the points, and to the right are the results of the leave-one-out test.

Table 4 shows the results obtained from the three methods when solving the space station experiment. Note that in this case the algorithm by Horn et al. [10] performs better since we have included more point-to-point correspondences than in the Rubik's cube experiment. Again our algorithm has the lowest total residual sum, while Horn has the lowest point-to-point residual.

Experiments on Synthetic Data: A Comparison with Horn

For completeness we also include an experiment where the Euclidean version of the algorithm is tested against the Euclidean versions of Horn and the linear least squares.



Figure 7: The model of the Space Station.

Residuals:      Our Alg.    Horn      Lin. least sq.
point-point     0.0083      0.0063    0.0221
point-line      0.0018      0.0036    0.0015
point-plane     0.0046      0.0098    0.0046
Total           0.0147      0.0197    0.0282

Table 4: Residuals of the space station problem.

We artificially generated six point-to-point, three point-to-line and seven point-to-plane correspondences. The results can be seen in Table 5. The residuals for the point-point correspondences are lower for Horn's method, as expected, while the total error is larger than for our algorithm. By approximating the Euclidean transformation by an affine one, the problem becomes a linear least squares problem. A Euclidean transformation can be obtained a posteriori from the affine estimate by taking the closest Euclidean transformation. As can be seen in the table, the linear algorithm does reasonably well, but the optimum is not obtained.

6 Applications II: Camera Pose Using Points

One of the basic problems both in computer vision and in photogrammetry is the camera pose estimation problem. Given a number of correspondences between points in 3D space and the images of these points, the pose estimation problem consists of estimating the rotation and position of the camera. Here the camera is assumed to be calibrated. A typical application is shown in Figure 9, where the object is to relate a camera to a scanned 3D model of a chair.



Figure 8: Left, residual errors for all correspondences. Right, leave-one-out residuals (see text for details).

Residuals:      Our Alg.      Horn      Lin. Least Sq.
point-point     0.0286        0.0220    0.0316
point-line      6.0630e-04    0.4986    0.0072
point-plane     0.0118        0.0404    0.0111
Total           0.0410        0.5610    0.0499

Table 5: Residuals of the simulated data for the Euclidean algorithm.


Given m world points X_i (represented by 3-vectors) and corresponding image points x_i = (x_i^1, x_i^2)^T, we want to find the camera translation t and rotation R that minimize f:

$$ f(R,t) = \sum_{i=1}^{m} d(x_i, \pi(X_i))^2 = \sum_{i=1}^{m}\left( \Big(x_i^1 - \frac{r_1^TX_i + t_1}{r_3^TX_i + t_3}\Big)^2 + \Big(x_i^2 - \frac{r_2^TX_i + t_2}{r_3^TX_i + t_3}\Big)^2 \right), \qquad (32) $$

where π(·) is the perspective projection with R^T = [r_1 r_2 r_3] and t^T = [t_1 t_2 t_3].
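Evaluating (32) for a calibrated camera is straightforward; a NumPy sketch (our own helper, assuming all points have positive depth):

```python
import numpy as np

def reprojection_cost(R, t, Xw, x_img):
    """Cost (32): Xw are 3D points (m x 3), x_img the corresponding image points (m x 2)."""
    cam = np.asarray(Xw, float) @ np.asarray(R, float).T + np.asarray(t, float)
    proj = cam[:, :2] / cam[:, 2:3]              # perspective projection pi(X_i)
    return float(np.sum((np.asarray(x_img, float) - proj) ** 2))
```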

We will in the next section show how to derive a branch and bound algorithm that finds the global optimum of (32).

Figure 9: Chair experiment with 3D-scanner.

6.1 Parametrization

Recall that the problem is to find a rotation R and a translation t such that (32) is minimized, which is similar to the registration problem in Section 5. Using unit quaternions for parametrizing the rotation matrices, the cost function (32) can be rewritten as

$$ f(q, t) = \sum_{i=1}^{2m} \left( \frac{q^T A_i q + a_i^T t}{q^T B_i q + b_i^T t} \right)^2, \qquad (33) $$

where A_i and B_i are 4 × 4 matrices, and a_i and b_i are 3 × 1 vectors, determined by the data points x_i and X_i. This is a non-convex rational function in seven variables.

To obtain lower bounds on this function we proceed by formulating a convex optimization problem whose solution gives a lower bound. The quadratic forms q^T A_i q + a_i^T t and q^T B_i q + b_i^T t contain terms of the form q_i q_j. We therefore introduce the new variables s_ij = q_i q_j, analogously to (9) and (10). In turn, these non-convex constraints can be relaxed (convexified) using convex and concave envelopes. The terms (q^T A_i q + a_i^T t)^2 and (q^T B_i q + b_i^T t)^2 in the cost function (33) can now be rewritten as (A_i q)^2 and (B_i q)^2 respectively. Here q is a 13 × 1 vector containing the parameters (t_1, ..., t_3, s_11, s_12, ..., s_44) and A_i, B_i are 1 × 13 matrices.

It is well known that a rational function of the form x^2/y, with a quadratic numerator and a linear denominator, is convex for y > 0. Therefore, by replacing the denominator with the affine function c_i(B_i q) + d_i (that is, the concave envelope of (B_i q)^2), a convex

under-estimator of the original cost function is obtained. The constants c_i and d_i are determined from the bounds on q_i and t_i. The full convex function is then

$$ f_{\mathrm{lower}}(q) = \sum_{i=1}^{2m} \frac{(A_i q)^2}{c_i(B_i q) + d_i}. \qquad (34) $$

Minimizing this function subject to the relaxation constraints on s_ij yields a lower bound on the cost function (32). At the same time, one can compute the actual value of the cost function f. If the difference is small (less than ε), one can stop. As the intervals on q_i, i = 1, ..., 4, are divided into smaller ones, one can show that the lower bound f_lower converges uniformly to the function (33), cf. [16].

To simplify the problem further, we make the following modifications. Without loss of generality, one can choose the world coordinate system such that X_i = [0 0 0]^T for some i, since the cost function is independent of the world coordinate frame. Further, as the cost function is a rational function of homogeneous quantities, one can dehomogenize by setting t_3 = 1. Thus, for point i we obtain

$$ d(x_i, \pi(X_i))^2 = (x_i^1 - t_1)^2 + (x_i^2 - t_2)^2. \qquad (35) $$

Note that this is a convex function. Further, we can restrict the search space by enforcing a maximum error bound on the reprojection error for this point, say γ_max pixels. This results in bounds on t_j such that x_i^j − γ_max < t_j < x_i^j + γ_max for j = 1, 2.

Since t_3 can be geometrically interpreted as the distance from the camera centre to the point X_i (along the optical axis), we have effectively normalized the depth to one. This affects the bounds on q_i as well. Suppose a lower bound on t_3 for the original homogeneous camera is t_3^low; then the new bounds for the dehomogenized quaternions become −1/t_3^low ≤ q_i ≤ 1/t_3^low for i = 1, ..., 4. A conservative lower bound on t_3 can easily be obtained by examining the distances between the given world points.

6.2 Experiments and Results

Local Minima

Even though the cost function is highly non-convex, one might ask if local minima actually occur in realistic scenarios and, if so, how often. Therefore, we first generated random 3D points within the unit cube [−1, 1]^3 and a random camera (with principal point at the origin, unit focal length and zero skew) having a viewing direction toward the origin at a distance of two units. Then, the projected image coordinates were perturbed with independent Gaussian noise at different noise levels. To the left of Figure 10, a histogram of the number of local minima that occur for four points is plotted. The local minima have been computed with random initializations (in total, 100 tries) and the experiment has been repeated 1000 times. Note that all local minima have positive



Figure 10: Local minima plots. Left: Histogram of local minima with four points. Right: The percentage of times the global optimum is attained for six points.

depths (that is, world points are in front of the camera). There are typically 4-6 additional local minima with negative depths. To the right of Figure 10, the percentage of times the correct global optimum is reached for six points is shown. For our algorithm (optimal), the global optimum is of course always obtained. The traditional way is to apply a linear algorithm (DLT), which requires at least six points, and then do local refinements (bundle adjustment) [9]. As one can see, one might get trapped in a local minimum even for small noise levels.

Experiments on Real Data

The following two subsections present experiments made with real data, the chair and the dinosaur experiment.

Chair Experiment. The setup for the chair experiment can be viewed in Figure 9. We used a MicroScribe-3DLX 3D scanner to measure the 3D coordinates of the black points on the chair. For the first experiment we took three images of the chair and used the images of the 14 scanned 3D points to calculate the rotation and translation using our method. The intrinsic camera calibration was computed with standard techniques [9]. The reconstructed cameras are shown in Figure 11. Using the images and the reconstructed cameras, it is easy to get a textured 3D model from the scanned model, see Figure 11. Table 6 shows the resulting reprojection error measured in pixels. For this particular camera the resolution was 1360 × 2048 and the focal length was 1782. For comparison we also tried to solve the problem with a linear method with and without bundle adjustment. The linear method first calculates a projective 3 × 4 camera matrix P,



Figure 11: (a) Reconstructed cameras and (b) resulting VRML model.

Residuals:   Our Alg.   Lin. Method   Lin.+Bundle
camera 1     1.351      10.76         1.351
camera 2     0.939      44.60         10.01
camera 3     0.950      4.741         0.950

Table 6: The RMS reprojection error measured in pixels, obtained when using all 14 points.

and then uses SVD factorization to find the closest camera matrix such that P = [R | t], where R is a scaled rotation matrix. As expected the linear method without bundle adjustment performs poorly. Also note that for the second chair image the bundle adjustment yields a local minimum. In Figure 12 we illustrate the difference between the solutions obtained when not using all points. Here we regard the solution obtained when using all 14 points as the true solution. In Figure 12 we have plotted the angles between the principal axis of the resulting camera matrix and the principal axis of the true solution. The red bars are the results when using 4 to 13 points in the first image, the black the same when using the second image, and the yellow is from the third image. Note that the computed solution is very similar already from 4 points to that of using all 14 points.

To illustrate the convergence of the algorithm, Figure 13 shows the performance of the algorithm for the first two images of the chair. To the left we have plotted the number of feasible rectangles at each iteration for the first chair image using 6 (red), 9 (green) and 12 (blue) points. To the right is the same plot for the second chair image.



Figure 12: Angles in degrees between the principal axis of the optimal solution with 14 points and the solutions with 4-13 points, for the three chair experiments.

Dinosaur Experiment. To further demonstrate the robustness of the algorithm, we have tested the algorithm on the publicly available turntable sequence of a dinosaur; see Figure 14 for one of the 36 images. The full reconstruction of 3D points and camera motion is also available, obtained by standard structure and motion algorithms [9]. For each of the 36 views, we have taken 4 randomly chosen points visible in that image and then estimated the camera pose. The resulting camera trajectory, including viewing direction, is compared to the original camera motion in Figure 14 (right). Note that even though only four points have been used, the camera motion (blue curve) is very close to that of full bundle adjustment using all points (red curve).

7 Conclusions and Future Work

Optimization over the manifold of Euclidean transformations is a common problem in many applications. The resulting mathematical optimization problems are typically non-convex and hard to solve in general. In this paper, we have shown how global optimization techniques can be applied to yield ε-suboptimal solutions where ε can be arbitrarily small.

Future work includes investigating degenerate cases and the use of robust norms to improve the general applicability of the approach. In addition, the performance of the algorithm should be tested on a wider range of experiments. Another natural path for further investigation is to incorporate the methodology in the ICP algorithm for general shapes in order to improve the robustness with respect to local minima.



Figure 13: Convergence of the branch and bound algorithm. The number of feasible rectangles at each iteration for two of the chair images using 6 (red), 9 (green) and 12 (blue) points.

Figure 14: The recovered camera motion for the dino experiment. Camera motion from full bundle adjustment (red curve) and using only four points (blue curve).


Bibliography

[1] S. Agarwal, M.K. Chandraker, F. Kahl, D.J. Kriegman, and S. Belongie. Practical global optimization for multiview geometry. In Proc. European Conf. on Computer Vision, pages 592–605, Graz, Austria, 2006.

[2] S. Altmann. Rotations, Quaternions and Double Groups. Clarendon Press, 1986.

[3] P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):232–256, 1992.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,2004.

[5] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In International Conference on Robotics and Automation, volume 3, pages 2724–2729, 1991.

[6] N. Gelfand, L. Ikemoto, S. Rusinkiewicz, and M. Levoy. Geometrically stable sampling for the ICP algorithm. In 3D Digital Imaging and Modeling (3DIM 2003), 2003.

[7] J. A. Grunert. Das pothenot'sche Problem in erweiterter Gestalt; nebst Bemerkungen über seine Anwendung in der Geodäsie. Grunert Archiv der Mathematik und Physik, 1(3):238–248, 1841.

[8] R. M. Haralick, C. N. Lee, K. Ottenberg, and M. Nolle. Review and analysis of solutions of the 3-point perspective pose estimation problem. Int. Journal of Computer Vision, 13(3):331–356, December 1994.

[9] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004. Second edition.

[10] Berthold K.P. Horn, Hugh M. Hilden, and Shahriar Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5, 1988.

[11] K. Kanatani. Unbiased estimation and statistical analysis of 3-D rigid motion from two views. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(1):37–50, 1993.


[12] B. Kolman and R. E. Beck. Elementary Linear Programming with Applications. Academic Press, 1995.

[13] C. Olsson, F. Kahl, and M. Oskarsson. Optimal estimation of perspective camera pose. In Int. Conf. Pattern Recognition, pages 5–8, Hong Kong, China, 2006.

[14] C. Olsson, F. Kahl, and M. Oskarsson. The registration problem revisited: Optimal solutions from points, lines and planes. In Proc. Conf. Computer Vision and Pattern Recognition, volume I, pages 1206–1213, New York City, USA, 2006.

[15] L. Quan and Z. Lan. Linear n ≤ 4-point camera pose determination. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(8):774–780, August 1999.

[16] Hong Seo Ryoo and Nikolaos V. Sahinidis. Analysis of bounds for multilinear functions. Journal of Global Optimization, 19:403–424, 2001.

[17] C.C. Slama, editor. Manual of Photogrammetry. American Society of Photogrammetry, Falls Church, VA, 4th edition, 1984.

[18] Jos F. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. 1998.

[19] I. E. Sutherland. Sketchpad: A man-machine graphical communications system. Technical Report 296, MIT Lincoln Laboratories, 1963.

[20] E. H. Thompson. An exact linear solution of the problem of absolute orientation. Photogrammetria, 15(4):163–179, 1958.

[21] B. Triggs. Camera pose and calibration from 4 or 5 known 3D points. In Proc. 8th Int. Conf. on Computer Vision, Vancouver, Canada, pages 278–284, 1999.


PAPER II

In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil 2007.


An L∞ Approach to Structure and Motion Problems in 1D-Vision

Kalle Åström, Olof Enqvist, Carl Olsson, Fredrik Kahl, Richard Hartley

Abstract

The structure and motion problem of multiple one-dimensional projections of a two-dimensional environment is studied. One-dimensional cameras have proven useful in several different applications, most prominently for autonomous guided vehicles, but also in ordinary vision for analysing planar motion and the projection of lines. Previous results on one-dimensional vision are limited to classifying and solving minimal cases, bundle adjustment for finding local minima to the structure and motion problem, and linear algorithms based on algebraic cost functions.

In this paper, we present a method for finding the global minimum to the structure and motion problem using the max norm of reprojection errors. We show how the optimal solution can be computed efficiently using simple linear programming techniques. The algorithms have been tested on a variety of different scenarios, both real and synthetic, with good performance. In addition, we show how to solve the multiview triangulation problem, the camera pose problem and how to dualize the algorithm in the Carlsson duality sense, all within the same framework.

1 Introduction

Understanding of one-dimensional cameras is important in several applications. In [14] it was shown that the structure and motion problem using line features in the special case of affine cameras can be reduced to the structure and motion problem for points in one dimension less, i.e. one-dimensional cameras.

Another area of application is vision for planar motion. It has been shown that ordinary vision (two-dimensional cameras) can be reduced to that of one-dimensional cameras if the motion is planar, i.e. if the camera is rotating and translating in one specific plane only, cf. [5]. In another paper the planar motion is used for auto-calibration [1]. A typical example is the case where a camera is mounted on a vehicle that moves on a flat plane or a flat road, or a fixed camera viewing an object moving on a plane, e.g. in traffic scenarios.


Figure 1: Left: A laser guided vehicle. Right: A laser scanner or angle meter.

A third motivation is that of autonomous guided vehicles, which are important com-ponents for factory automation. The navigation system uses strips of reflector tape, whichare put on walls or objects along the route of the vehicle, cf. [7]. The laser scanner mea-sures the direction from the vehicle to the beacons, but not the distance. This is theinformation used to calculate the position of the vehicle.

One of the key problems here is the structure and motion problem, also called simul-taneous localisation and mapping (SLAM). This is the procedure of obtaining a map ofthe unknown positions of the beacons using images at unknown positions and orienta-tions. This is usually done off-line, when the system is installed and then occasionally ifthere are changes in the environment. High accuracy is needed, since the precision of themap is critical for the performance of the navigation routines. In this article we present amethod to find the globally optimal solution to this structure and motion problem.

Previous results concerning 1D projections of 2D include solving minimal cases with-out [13, 14, 2, 4] and with [12] missing data, autocalibration [5] critical configurations[3] and structure and motion systems in general [11].

The paper is organized as follows. In Section 2 we review the method L∞ optimiza-tion. In Section 3 a brief introduction to the geometry of the problem is given. Section 4discusses the problems of resection and intersection showing that they can be solved ef-ficiently. An optimization method for the structure and motion problem is presentedin Section 5 along with the required theoretic results. Finally, Section 6 presents someexperiments illustrating the performance of the optimization method.

58

Page 61: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

2. L∞ OPTIMIZATION

2 L∞ optimization

Many geometrical problems can be formulated as optimization problems. Consider forinstance the n-view triangulation problem with 2D-cameras. Statistically optimal esti-mates can be found by minimizing some error measure between the image data and thereprojected data. The usual choice of objective function is the L2-norm of the reprojec-tion errors, since this is the statistically optimal choice assuming independent Gaussiannoise of equal and isotropic variance. Since closed form solutions are rarely available forthese problems, they are often solved by iterative algorithms. The problem with this ap-proach is that these methods often depend on good initialization to avoid local minima.

To resolve this problem L∞ optimization was introduced in [6]. The idea is to mini-mize the maximal reprojection error instead of the L2-norm. In [8], [9] it was shown thatthe optimization problems obtained for a number of multiview-geometry problems usingthe L∞-norm are examples of quasiconvex problems. A function f is called quasiconvexif its sublevel sets Sµ(f) = {x; f(x) ≤ µ} are convex. The reason for using the L∞-norm when dealing with quasiconvex functions is that quasiconvexity is preserved underthe max operation. That is, if fi are quasiconvex functions then f(x) = maxi fi(x)is also a quasiconvex function. It was shown in [8], [9] that for a number of multiviewgeometry problems the (squared) reprojection errors are quasiconvex, and therefore theproblem of minimizing the maximum reprojection error is a quasiconvex problem.

A useful property of quasiconvex functions is that checking whether there is an xsuch that f(x) ≤ µ is a convex feasibility problem and can usually be solved efficiently.This gives a natural algorithm for minimizing a quasiconvex function. Suppose we havebounds µh and µl such that µl ≤ minx f(x) ≤ µh then a bisection algorithm forsolving minx f(x) is

1. µ = µh+µl

2 .

2. If there exists x fulfilling f(x) ≤ µ then µh = µ.Otherwise µl = µ.

3. If µh − µl > ǫ return to 1. Otherwise quit.

Although the L∞-norm is not statistically optimal it has been shown to give almost asgood solutions as the L2-norm ([8], [9]). The only real weakness is that it is sensitive tooutliers, but solutions to this problem has been presented in [15, 10].

3 Preliminaries

In one-dimensional vision only the bearing, α, of the beam from the camera to the ob-ject can be observed. In case of one-dimensional cameras this is measured using a laserscanner, and if the one-dimensional problem comes from ordinary vision, the bearingis calculated from higher-dimensional data. Only the bearing relative a fixed direction

59

Page 62: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

x

y

(Px, Py)

(Ux, Uy)

α(P, U)

Figure 2: The figure illustrates the measured angle α as a function of scanner position(Px, Py), scanner orientation Pθ and beacon position (Ux, Uy).

of the camera is measured so if the orientation of the camera is unknown, it has to beestimated as well.

We introduce an object coordinate system which will be held fixed with respect to thescene. The measured bearing of an object defined above, depends on the position of theobject point (Ux, Uy) and the position (Px, Py) and orientation Pθ of the camera.

α(P, U) = arg(Ux − Px + i(Uy − Py)) − Pθ , (1)

where arg is the complex argument (the angle of the vector (Ux −Px, Uy −Py) relativeto the positive x-axis). The vector (Px, Py, Pθ) is called the camera state.

The same equation can be rephrased as

λ

[cos(α)sin(α)

]

︸ ︷︷ ︸

u

=

[a b c−b a d

]

︸ ︷︷ ︸

P

Ux

Uy

1

︸ ︷︷ ︸

U

, (2)

where there is a one-to-one mapping between camera matrices P with variables (a, b, c, d)and camera states P = (Px, Py, Pθ). We will freely switch between these two represen-tations and will use non boldface α, P = (Px, Py, Pθ) and U = (Ux, Uy) for firstrepresentation and variables and boldface u, P and U to denote image and object pointsin homogeneous coordinates and camera matrices.

Note that all bearings should be considered modulo 2π. Addition and subtractionare defined in the normal ways. When calculating absolute values the bearings should berepresented with angles between −π and π. Also note that two solutions (UJ , PI) and(UJ , PI) to a problem are considered equal if they are related by a similarity transforma-tion.

60

Page 63: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

4. INTERSECTION AND RESECTION

In this paper we are mostly interested in overdetermined problems, i.e. problems inwhich we have more measurements α than degrees of freedom. For such problems theequations α(PI , UJ) = α cannot be satisfied exactly. Instead we are forced to solve aminimization problem. Motivated by the previous section we choose to minimize theL∞ norm of the error.

4 Intersection and resection

Before moving on to the structure and motion problem, which is the main subject of thispaper, we consider the simpler problems of intersection and resection.

Consider a number of cameras seeing the same object. If the position and orientationof the cameras is known, the object is to determine the position of the object. This iscalled the intersection problem.

Problem 4.1. Given bearings α1, . . . , αm from m different positions P1, . . . , Pm the L∞

intersection problem is to find reconstructed point U such that

f∞

i (U) = maxI

|α(PI , U) − αI | (3)

is minimal.

If instead, the positions of a number of objects are known, the goal is to determine theposition and orientation of the camera seeing those objects. This is the resection problem.

Problem 4.2. Given n bearings α1, . . . , αn and points U1, . . . , Un the L∞ resectionproblem is to find the camera state P such that

f∞

r (P ) = maxJ

|α(P, UJ ) − αJ | (4)

is minimal.

These two problems are in a sense easy to solve. We shall see that both of them canbe formulated as quasiconvex problems.

Lemma 4.1. The function

f∞

i (U) = maxI

|α(PI , U) − αI |, (5)

is quasiconvex on the set {U;uI · PIU > 0, ∀I}, and the function

f∞

r (P ) = maxJ

|α(P, UJ ) − αJ |. (6)

is quasiconvex on the set {P;uJ · PUJ > 0, ∀J}.

61

Page 64: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

Proof. For given U,P and corresponding u we have

u × PU

u · PU=

|u||(PU)| sin(α − α)

|u||(PU)| cos(α − α)= tan(α − α). (7)

Here a × b denotes the scalar a1b2 − a2b1. Since u · PU > 0, checking whether|α − α| ≤ ∆ is equivalent to

|u× PU| ≤ tan(∆)(u · PU) (8)

In the intersection case u and P are known and in the resection case u and U areknown. Therefore in both cases (8) constitute two linear equations. Hence the sublevelsets {U; f∞(U) ≤ ∆} and {P; f∞(P ) ≤ ∆} are polyhedrons and thereby convex.

Note that if we use the bisection algorithm, this result also tells us that the feasibilityproblems can be stated as linear programs.

5 Structure and motion

In the next problem we will assume that neither the positions of the objects or the po-sitions and orientations of the cameras are known. However, we will still assume thatthe correspondence problem is solved, i.e. that it is known which measured bearingscorrespond to the same object. If the problem is is deduced from ordinary vision thiscorrespondence can be decided using features in the two-dimensional image. In caseof one-dimensional cameras the correspondence can be estimated with a RANSAC-typealgorithm.

Problem 5.1. Given n bearings from m different positions αI,J , I = 1, . . . , m, J =1, . . . , n the L∞ structure and motion problem is to find the solutionz = (P1, . . . , Pm, U1, . . . , Un) containing the the camera matrices P1, . . . , Pm and thereconstructed points U1, . . . , Un such that

f∞(z) = max(I,J)

|α(PI , UJ) − αI,J |, (9)

is minimal.

Unfortunately this problem does not have the same nice properties as the intersectionand resection problems of the previous section. The reason is that when both P and Uare unknown (8) are in general not convex conditions. Nonetheless, quasiconvexity willplay an important role for this problem as well.

The basic idea of our optimization scheme is to first consider optimization with fixedcamera orientations, and then use branch and bound over the space of possible orienta-tions. A problem here is that, especially with many cameras, the set of possible orienta-tions is large. A method to reduce this set, using linear conditions on the orientations ispresented in Section 5.3.

62

Page 65: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

5. STRUCTURE AND MOTION

5.1 Optimization with fixed orientations

It is useful to divide the parameter space z = (P, U) into one part that correspond to theposition of the cameras and points zp and one part that correspond to the orientation ofthe cameras zθ = (θ1, . . . , θm).

Definition 5.1. We define a function d(zθ) as

d(zθ) = minzp

f∞(zθ, zp). (10)

Lemma 5.1. The problem of verifying if

d(zθ) = minzp

f∞(z) ≤ ∆ (11)

for ∆ < π/2 is a linear programming feasibility problem (and thus, the minimization overzp is a quasiconvex problem).

Proof. If the orientations are fixed we can without loss of generality assume that orienta-tions have been corrected for. Let

u =

[cos(α)sin(α)

]

be the measured angle represented as a normal vector u and let Up =[Ux Uy

]Tand

Pp =[Px Py

]T. Now

u × (Up − Pp)

u · (Up − Pp)=

|u||(Up − Pp)| sin(α − α)

|u||(Up − Pp)| cos(α − α)= tan(α − α).

The constraint that|α − α| ≤ ∆

is equivalent to|u× (Up − Pp)| ≤ tan(∆)(u · (Up − Pp)),

which constitutes two linear constraints in the unknowns.

This means that we can use linear programming to determine if the minimal maxnorm is less than some certain bound ∆ and using bisection, we can get a good estimationof the minimal max norm.

To get better convergence we modify the normal bisection algorithm slightly. Theidea is to seek a solution to the problem in Lemma 5.1, that lies in the interior of thefeasible space. For such a solution the max norm of the reprojection errors might besmaller than the current ∆, say ∆⋆. Then one knows that the minimal max norm, d(zθ)

63

Page 66: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

must be smaller than this ∆⋆. To find such an interior solution we introduce a newvariable k and try to maximize k under the constraints

|u× (Up − Pp)| + k ≤ tan(∆)(u · (Up − Pp)).

We can now present an algorithm for finding the minimal max norm for fixed orien-tations zθ.

1. Check if there is a feasible solution with all reprojected errors less than π/2. Thiscorresponds to tan(∆) = ∞ in the equations above. This can be solved by asimpler linear programming feasibility test. Use only (u · (Up −Pp)) > 0. If thisis feasible continue, otherwise return dmin > π/2.

2. Let µl = 0 and µh = π/2 be lower and upper bounds on the minimal max errornorm.

3. Set µ = (µh + µl)/2. Test if d(zθ) ≤ µ. If this is feasible calculate µ⋆ = f∞(z⋆)for the feasible solution z⋆ and set µh = µ⋆. Otherwise set µl = µ.

4. Iterate step 3 until µh − µl is below a predefined threshold.

An example of how this function d(zθ) may look like is shown in Figure 4.

5.2 Branch and bound over orientations

To get further, we need an idea of how the minimal maximum norm d(zθ) depends onthe camera orientations in zθ. This is given by the following lemma.

Lemma 5.2. The function d(zθ) satisfies

|d(zθ) − d(zθ)| ≤ |zθ − zθ|∞, (12)

which implies that it is Lipschitz continuous.

Proof. Recalling (1), we note that

αI,J(z) = βI,J(zp) − θI . (13)

We let z∗p be the optimal camera and point positions corresponding to zθ, so that d(zθ) =f∞(zθ, z

p). Similarly, we define z∗p . Then

f∞(zθ, z∗

p) − f∞(zθ, z∗

p) =

= max(I,J)

|αI,J(zθ, z∗

p) − αI,J | − max(I,J)

|αI,J(zθ, z∗

p) − αI,J | ≤

≤ max(I,J)

|αI,J(zθ, z∗

p) − αI,J(zθ, z∗

p)| ≤

64

Page 67: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

5. STRUCTURE AND MOTION

≤ maxI

|θI − θI | = |zθ − zθ|∞.

Butf∞(zθ, z

p) = minzp

f∞(zθ, zp) ≤ f∞(zθ, z∗

p)

sof∞(zθ, z

p) − f∞(zθ, z∗

p) ≤ |zθ − zθ|∞.

After letting z and w switch places and repeating the argument, we can conclude

|d(zθ) − d(zθ)| = |f∞(zθ, z∗

p) − f∞(zθ, z∗

p)| ≤ |zθ − zθ|∞.

Using the fact that the function d(zθ) can be calculated and that it is never steeperthan one, we will show how to solve globally for structure and motion. For the three viewproblem, this is done as follows.

First a candidate zθ for the global minima is found with error dopt. Then a quad-treesearch is performed for the parameter space zθ ∈ [0, 2π]2. At each level a square withcenter zmid and width w is studied. The square cannot contain any points z with lowererror function if d(zmid) > dopt + w, because of Lemma 5.2. This can be tested with asingle linear programming feasibility test.

In fact it is sufficient to study the feasibility problem of the errors between the mea-sured angles and the reprojected angles less than dopt + w for view 2 and 3 and less thandopt for view 1.

Note that the algorithms work equally well for problems with missing data.

5.3 Linear conditions on zθ

A problem with the branch and bound approach presented in the previous section is thatwhen many cameras are used the set of possible orientations is large. To reduce this set,we consider the pairwise intersection of beams. This gives us linear constraints on thecamera orientations.

Consider two cameras and a point which is visible in both cameras. Let βj be thebearing of the beam from camera j to the point (in a global coordinate frame). For thetwo beams to intersect at the point we get a linear condition on the relative positions ofthe cameras (see Figure 3). We formulate this in the following lemma.

Lemma 5.3. The bearings β1 and β2 of one point in two different cameras gives a conditionon the bearing γ1,2 of the beam from camera 1 to camera 2.

γ1,2 ∈ [β2 − π, β1] if (β2 − β1) ∈ [0, π]

γ1,2 ∈ [β1, β2 − π] if (β2 − β1) ∈ [−π, 0]

65

Page 68: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

f

6

f���

Figure 3: For the beams from the two cameras (circles) to intersect, the right camera hasto lie between the dashed lines.

For each triplet of two cameras and one point, this lemma gives us a condition of thetype

γj,k ∈ [αj,m + θj − π, αk,m + θm]

If we have N image points, all visible in all cameras, this gives us N linear constraints oneach angle γj,k. For the problem to have a solution with zero error the intersection ofthese intervals must be non-empty. This implicitly defines a condition on the camera ori-entations. Since all equations are linear, finding the set of possible camera orientations isa linear problem that can be solved analytically. For the orientations where intersection isnot possible, the same calculations give us the minimal error such that all beams intersectpairwise. Also note that since each inequality only involves two cameras, the complexityof the calculations increases only linearly with the number of cameras. In Figure 8 anillustration of the conditions on the camera orientations is given.

6 Experiments

Illustration of a typical three view problem

Study the problem of 3 views of 7 points with measured angles (in radians)

α =

3.1 −1.9 −0.1 −1.9 −1.3 1.7 −0.4−2.2 −1.3 −0.2 −1.3 −1 1.9 −0.42.6 −2.9 −1.4 −2.9 −2.6 1.4 −1.7

,

where rows denot different views I and columns denote different points J .Initial estimates of the minimal solution shows that dopt ≤ 0.05. The first step of the

quadtree (with a centre point at (π, π) and width 2π) is feasible for the bound 0.05 + π.In the next two steps of the algorithm there are four and 16 regions respectively. Noneof these can be outruled. At the next level 60 out of 64 squares of width π/4 can beoutruled.

66

Page 69: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

6. EXPERIMENTS

We summarize the first 10 steps of the algorithm by describing (i) the number nsq offeasible squares there are left at each level and (ii) how much area A out of the total areaAtot = (2π)2 do these squares represent.

step nsq log(A/Atot)1 4 02 16 03 4 -1.20414 8 -1.50515 20 -1.70936 28 -2.16527 40 -2.61248 68 -2.9849 104 -3.4015

10 92 -4.0568

After 10 steps of the algorithm the optimal solution is bounded by 5.94 ≤ P2,θ ≤5.98 and 0.74 ≤ P3,θ ≤ 0.86.

In Figure 4 a plot of d(zθ) for this problem is shown. An illustration of the conver-gence of the optimization is given in Figure 5.

0

1

2

3

4

5

6

7

0

1

2

3

4

5

6

7

0

0.5

1

1.5

2

θ2θ

3

d(z θ)

Figure 4: In the figure is shown the function dz(θ) as a function of (θ2, θ3)-parameterspace while keeping θ1 fixed, for the data in the example with 3 views of seven points.Notice that the function is periodic so for this particular example there is only one localminimum.

67

Page 70: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

θ 2 − r

adia

ns

θ3 − radians

0 1 2 3 4 5 6

0

1

2

3

4

5

6

Figure 5: The quadtree map of the goal function in the (θ2, θ3)-parameter space wherewhite corresponds to regions that are discarded early and darker areas correspond to re-gions that are kept longer in the quadtree branch and bound algorithm.

Performance on synthetic data

To illustrate the convergence of the optimization method, a number of synthetic exam-ples were examined. Figure 6 shows how the feasible area decreases with each step of thealgorithm and Figure 7 shows illustrations of typical, random, synthetic three-view ex-amples with varying number of points. In certain cases there may be several local optimaand even in underconstrained cases (meaning less equations than unknowns) one can lo-cate the global minimum to a small region of parameter space due to all positive depthconstraints.

0 2 4 6 8 100

5

10

15

20

25

30

steps

n sq

0 2 4 6 8 10−6

−5

−4

−3

−2

−1

0

steps

log(

A/A

tot)

Figure 6: To illustrate the convergence of the algorithm, 100 random examples with 3cameras and 20 points were constructed. The plots show the median and the first andninth decile of the data. In the left plot the number of squares left after each step is shownand the right plot shows how much of the total area these squares constitutes.

68

Page 71: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

6. EXPERIMENTS

Another underdetermined problem is showed in Figure 8. It shows clearly how thesolution curve is cut off by the linear conditions of Section 5.3.

θ 2 − r

adia

ns

θ3 − radians

0 1 2 3 4 5 6

0

1

2

3

4

5

6

θ 2 − r

adia

ns

θ3 − radians

0 1 2 3 4 5 6

0

1

2

3

4

5

6

θ 2 − r

adia

ns

θ3 − radians

0 1 2 3 4 5 6

0

1

2

3

4

5

6

Figure 7: Illustrations of the evolution of the quadtree map. See caption of Figure 5for explanation. Top: 4 points, 3 cameras (underconstrained case). Middle: 5 points,3 cameras with two local optima. Bottom: 6 points, 3 cameras (overconstrained case).Note that even though the solution is underconstrained with 4 points, one can locate theglobal optimum in a small region of parameters space.

Hockey rink data with Cremona dual

It is possible to convert every structure and motion problem with m images of n pointsinto a dual problem of n−1 images of m+1 points. We illustrate this with a subset of thedata from a real set of measurements performed at a ice hockey rink. The set contains 70images of 14 points. Here we studied a subset of 37 images of 4 points. Its dual consists of3 images of 38 points. The global optimum to the L∞ structure and motion problem iscalculated for this set. The solution is shown in Figure 9. By forming the primal solutionfrom this solution we get the solution for the original problem of 37 views of 4 points,also shown in Figure 9.

69

Page 72: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

θ3 − radians

θ 2 − r

adia

ns

0 1 2 3 4 5 6

0

1

2

3

4

5

6

Figure 8: Illustration of the linear conditions of Section 5.3. Each pair of dashed linesshows a condition on the orientations of the cameras. Note how the solution curve is cutoff by the linear conditions. (3 cameras and 4 points.)

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8−1.6

−1.4

−1.2

−1

−0.8

−0.6

−0.4

−0.2

0

−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.50

0.5

1

1.5

2

2.5

3

3.5

Figure 9: The global optimum to the structure and motion problem for the dual problem(top) and the primal problem (bottom)

Hockey rink data

By combining optimal structure and motion with optimal resection and intersection it ispossible to solve for many cameras and views. We illustrate this with the data from a realset of measurements performed at a ice hockey rink in 1991. The set contains 70 imagesof 14 points. The result is shown in Figure 10.

70

Page 73: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

7. CONCLUSIONS

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−0.5

0

0.5

1

1.5

2

Figure 10: Calculated structure and motion for the icehockey experiment.

7 Conclusions

In this paper we have studied the problem of finding global minima to the structureand motion problem (SLAM, surveying) for one-dimensional retina cameras using themax-norm on reprojected angular errors. We have shown how the problem for knownorientation can be reduced to a series of linear programming feasibility tests. We have alsoshown that the objective function as a function of orientation variables has slope less thanone. This important observation gives us a way to search orientation space for the optimalsolution, resulting in a globally optimal algorithm with good empirical performance.

Acknowledgments

This work has been supported by, the European Commission’s Sixth Framework Pro-gramme under grant no. 011838 as part of the Integrated Project SMErobotTM , SwedishFoundation for Strategic Research (SSF) throught the programme Vision in CognitiveSystems II (VISCOS II), Swedish Research Council through grants no. 2004-4579’Image-Based Localisation and Recognition of Scenes’ and no. 2005-3230 ’Geometry ofmulti-camera systems’. Richard Hartley is with ANU and NICTA. NICTA is a researchcentre funded by the Australian Government’s Department of Communications, Infor-mation Technology and the Arts and the Australian Research Council, through BackingAustralia’s Ability and the ICT Research Centre of Excellence programs

71

Page 74: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER II

72

Page 75: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

Bibliography

[1] M. Armstrong, A. Zisserman, and R. Hartley. Self-calibration from image triplets.In Proc. 4th European Conf. on Computer Vision, Cambridge, UK, pages 3–16.Springer-Verlag, 1996.

[2] K. Åström, A. Heyden, F. Kahl, and M. Oskarsson. Structure and motion fromlines under affine projections. In Proc. 7th Int. Conf. on Computer Vision, Kerkyra,Greece, pages 285–292, 1999.

[3] K. Åström and F. Kahl. Ambiguous configurations for the 1d structure and motionproblem. Journal of Mathematical Imaging and Vision, 18(2):191–203, 2003.

[4] K. Åström and M. Oskarsson. Solutions and ambiguities of the structure and mo-tion problem for 1d retinal vision. Journal of Mathematical Imaging and Vision,12(2):121–135, 2000.

[5] O. D. Faugeras, L. Quan, and P. Sturm. Self-calibration of a 1d projective cameraand its application to the self-calibration of a 2d projective camera. In Proc. 5th Eu-ropean Conf. on Computer Vision, Freiburg, Germany, pages 36–52. Springer-Verlag,1998.

[6] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction prob-lems. In Proc. Conf. Computer Vision and Pattern Recognition, Washington DC, pages504–509, Washington DC, USA, 2004.

[7] K. Hyyppä. Optical navigation system using passive identical beacons. In Louis O.Hertzberger and Frans C. A. Groen, editors, Intelligent Autonomous Systems, An In-ternational Conference, Amsterdam, The Netherlands, 8-11 December 1986, pages737–741. North-Holland, 1987.

[8] Fredrik Kahl. Multiple view geometry and the L∞-norm. In International Confer-ence on Computer Vision, pages 1002–1009, Beijing, China, 2005.

[9] Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruc-tion. In International Conference on Computer Vision, pages 986 – 993, Beijing,China, 2005.

[10] H. Li. A practical algorithm for l∞ triangulation with outliers. In Proc. Conf.Computer Vision and Pattern Recognition, Minneapolis, USA, 2007.

73

Page 76: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

[11] M. Oskarsson and K. Åström. Automatic geometric reasoning in structure andmotion estimation. Pattern Recognition Letters, 21(13-14):1105–1113, 2000.

[12] Magnus Oskarsson, Kalle Åström, and Niels Chr. Overgaard. The minimal struc-ture and motion problems with missing data for 1d retina vision. Journal of Mathe-matical Imaging and Vision, 26(3):327–343, 2006.

[13] L. Quan. Uncalibrated 1d projective camera and 3D affine reconstruction of lines.In Proc. CVPR, pages 60 – 65, 1997.

[14] L. Quan and T. Kanade. Affine structure from line correspondences with un-calibrated affine cameras. IEEE Trans. Pattern Analysis and Machine Intelligence,19(8):834–845, August 1997.

[15] K. Sim and R. Hartley. Removing outliers using the L∞-norm. In Proc. Conf.Computer Vision and Pattern Recognition, pages 485–492, New York City, USA,2006.

74

Page 77: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

In Proc. International Conference on Computer Vision (ICCV), Rio deJaneiro, Brazil 2007.

75

Page 78: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

76

Page 79: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

Efficient Optimization forL∞-problems using Pseudoconvexity

Carl Olsson, Anders P. Eriksson and Fredrik Kahl

Abstract

In this paper we consider the problem of solving geometric reconstruction problems withthe L∞-norm. Previous work has shown that globally optimal solutions can be computedreliably for a series of such problems. The methods for computing the solutions have re-lied on the property of quasiconvexity. For quasiconvex problems, checking if there existsa solution below a certain objective value can be posed as a convex feasibility problem.To solve the L∞-problem one typically employs a bisection algorithm, generating a se-quence of convex problems. In this paper we present more efficient ways of computingthe solutions.

We derive necessary and sufficient conditions for a global optimum. A key propertyis that of pseudoconvexity, which is a stronger condition than quasiconvexity. The resultsopen up the possibility of using local optimization methods for more efficient computa-tions. We present two such algorithms. The first one is an interior point method that usesthe KKT conditions and the second one is similar to the bisection method in the senseit solves a sequence of SOCP problems. Results are presented and compared to the stan-dard bisection algorithm on real data for various problems and scenarios with improvedperformance.

1 Introduction

Many geometrical computer vision problems may be formulated as optimization prob-lems. For example, solving a multiview triangulation problem can be done by minimizingthe L2-reprojection error. In general this is a hard non-convex problem if the numberof views is more than two [5]. Since closed form solutions are only avaliable in spe-cial cases, optimization problems are often solved using iterative methods such as bundleadjustment. The success of this type of local methods rely on good initialization meth-ods. However, the initialization techniques frequently used optimize some algebraic cost

77

Page 80: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

Geometric L∞-problem References

− Multiview triangulation [4, 6, 7, 3]− Camera resectioning [6, 7]− Homography estimation [6, 7]− Structure and motion recovery withknown camera orientation

[4, 6, 7]

− Reconstruction by using a reference plane [6]− Camera motion recovery [10]− Outlier detection [11, 9]− Reconstruction with covariance-based un-certainty

[10, 8]

Table 1: List of different geometric reconstruction problems that can be solved globally withthe L∞-norm.

function which, on one hand, simplifies the problem, but, on the other hand, has nogeometrical or statistical meaning. When significant measurement noise is present suchestimates may be far from the global optimum.

To remedy this problem, the use of L∞-optimization was introduced in [4]. It wasshown that many geometric vision problems have a single local optimum if the L2-normis replaced by the L∞-norm. This work has been extended in several directions andthere is now a large class of problems that can be solved globally using L∞-optimization,see Table 1. They have all been shown to have a single local optimum when using theL∞-norm.

In [6, 7] it was shown that these problems are examples of quasiconvex optimizationproblems. A bisection-algorithm based on second order cone programs (SOCP) for solv-ing this type of problems was also introduced. Let fi(x) be quasiconvex functions. Thealgorithm works by checking if there is an x satisfying fi(x) ≤ µ for all i for a fixed µ.A bisection is then performed on the parameter µ. Thus to solve the original problem weare led to solve a sequence of SOCPs. A particular problem when running the bisectionalgorithm is that it is not possible to specify a starting point for the SOCP. Even thougha good solution might be available from a previously solved SOCP, this solution will ingeneral not lie on the so called central-path. This is however required for for good con-vergence of the interior-point-method used to solve the SOCP. Hence much would begained if it was possible to let µ vary, and not have to restart the optimization each timeµ is changed.

In [4] the µ was allowed to vary during the optimization, hence the problem wassolved using a single program. Although this program is not convex, it was observedthat this worked well for moderate scale problems. Still convergence to the global min-imum was not proven. In [3], an alternative optimization technique based on intervalanalysis was proposed for solving the multiview triangulation problem. However, the op-

78

Page 81: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

2. THEORETICAL BACKGROUND

timization technique does not exploit the convexity properties of the problem and it maybe inefficient. For the multiview triangulation problems reported in [3], the executiontimes are in the order of several seconds which is considerably slower than the bisectionalgorithm [6, 7].

In this paper we show that we are not limited to keeping µ fixed. We show that thefunctions involved in L∞-problems are not just quasiconvex but actually pseudoconvexwhich is a stronger condition. This allows us to derive necessary and sufficient conditionsfor a global optimum, which opens up the possibility of using local optimization algo-rithms as the ones used in [4]. We show that these algorithms are more efficient than thebisection algorithm in terms of execution times. For large scale algorithms we propose analgorithm that is similar to the bisection algorithm in that it solves a sequence of SOCPs.However rather than fixing µ we will approximate the original program using a SOCP.

2 Theoretical background

In this section we formulate the L∞-problem and briefly review some concepts fromoptimization which will be needed in Sections 3 and 4.

2.1 Problem formulation

The problems in geometric computer vision that we will be considering in this paper maybe written in the following minimax form:

minx

maxi

|| [ aT

i1x+b1,aT

i2x+b2 ] ||2

aT

i3x+b3

(1)

s.t. aTi3x + b3 > 0, i = 1, . . . , m (2)

where aij ∈ Rn and bj ∈ R for j = 1, 2, 3. The dimension of the problem depends

on the particular application, starting from n = 3 for the (basic) multiview triangulationproblem.

If we consider the individual (residual) functions|| [ aT

i1x+b1,aT

i2x+b2 ] ||2

aT

i3x+b3

, i = 1, . . . , m,

as the components of an m-vector, then the problem may be thought of as minimizingthe L∞-norm of this (residual) vector.

2.2 Various types of convexity

Next we recapitulate on some of the different types of convexity and their properties.This is well-known in the optimization literature, cf. [1]. When we discuss propertiesof different function classes we need to distinguish between the following three types ofpoints.

Definition 2.1. x is a stationary point if ∇f(x) = 0.

79

Page 82: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

Definition 2.2. x is a local minimum if there exists ǫ > 0 such that f(x) ≥ f(x) for allx with ||x − x|| ≤ ǫ.

Definition 2.3. x is a strict local minimum if there exists ǫ > 0 such that f(x) > f(x)for all x with ||x − x|| ≤ ǫ.

For a differentiable function we always have that strict local minimum ⇒ local mini-mum ⇒ ∇f(x) = 0. The reversed implications are, however, not true in general.

Definition 2.4. A function f is called quasiconvex if its sublevel sets Sµ(f) = { x; f(x) ≤µ } are convex.

In [6, 7] it was shown that the objective functions in a variety of structure and motionproblems are quasiconvex. Quasiconvex functions have the property that any strict localminimum is also a global minimum. They may however have several local minima andstationary points. Thus a local decent algorithm may not converge to the desired globaloptimum. Instead it is natural to use the property of convex sublevel sets as a basis for abisection algorithm. For a fixed µ, finding a solution x such that f(x) ≤ µ can be turnedinto a convex feasibility problem. See [6, 7] for further details.

In [11] the notion of strict quasiconvexity was introduced.

Definition 2.5. A function f is called strictly quasiconvex if f is continuous quasiconvexand its sublevel sets Sµ(f) = { x; f(x) ≤ µ } fulfills the additional property

µ<µ

Sµ(f) = int Sµ(f). (3)

Strictly quasiconvex functions have the property that any local minimum is a globalminimum. They may however still have additional stationary points. As an exampleconsider the function f(x) = x3 on the set −1 ≤ x ≤ 1 (see Figure 1). The gradientvanishes at x = 0 but the global optimum is clearly in x = −1.

Note that there is also a class of functions that are referred to as strongly quasicon-vex in the literature. To further complicate things the notions of strongly and strictlyquasiconvex are sometimes interchanged.

One of the goals is to show that for all of the problems considered in Table 1, weare able to use local search algorithms rather than the bisection algorithm. In these typesof algorithms, improving or descent directions are often determined using gradients andpossibly Hessians. Therefore it is crucial that the gradients do not vanish anywhere exceptin the global optimum, that is, there can be no stationary points except for the globalminimum. As we have seen this is not necessary true for quasiconvex (or even strictlyquasiconvex) functions. Therefore we are led to study the following class of functions.

Definition 2.6. f is called pseudoconvex if f is differentiable and whenever ∇f(x)(x−x) ≥ 0 we also have that f(x) ≥ f(x).

80

Page 83: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

2. THEORETICAL BACKGROUND

−1 −0.5 0 0.5 1−1

−0.5

0

0.5

1

Figure 1: The function f(x) = x3,−1 ≤ x ≤ 1, is strictly quasiconvex but still has astationary point in x = 0.

A pseudoconvex function is always quasiconvex (see [1]), but not necessarily the otherway around. Pseudoconvex functions also have the following nice property:

Lemma 2.1. Suppose f is pseudoconvex, then ∇f(x) = 0 if and only if f(x) ≥ f(x) forall x.

Proof. If ∇f(x) = 0 then ∇f(x)(x − x) = 0 for all x and by definition f(x) ≥ f(x).If f(x) ≥ f(x) for all x then x is a global (and hence local) minimum and therefore∇f(x) = 0. ⊓⊔

Thus for a pseudoconvex function any stationary point is a global minimum. Thisis a useful property since it ensures that the gradient does not vanish anywhere except inthe optimum, making it possible to solve using, for instance, a steepest decent algorithm.For further details on various convexity issues, see e.g. [1, 2].

2.3 Constrained optimization

When minimizing an unconstrained differentiable function f , one is interested in solvingthe equations ∇f(x) = 0. In the constrained case the corresponding equations are theKKT conditions (see [2, 1]). The KKT conditions play an important role in optimization.Many algorithms, such as interior point methods or sequential quadratic programming,are conceived as methods for solving the KKT conditions. Consider the constrainedoptimization problem

min f(x) (4)

s.t. fi(x) ≤ 0, i = 1, ..., m. (5)

81

Page 84: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

The KKT conditions for this problem are

∇f(x) +m

j=1

λj∇fj(x) = 0 (6)

λifi(x) = 0 (7)

fi(x) ≤ 0 (8)

λi ≥ 0, i = 1, . . . , m. (9)

In the general nonconvex case there may be many solutions to these equations. Howeverin Section 3 we will show that only the global optimum solves them for the class ofproblems in Table 1.

3 Theoretical results

In this section we will derive our main results. We will first show that the objective func-tion considered in (1) is in fact pseudoconvex. Then we proceed to derive necessary andsufficient conditions for global optima of a function that is a maximum of pseudoconvexfunctions.

Note that for a minimax problem of the form (1), one may equivalently consider thesquared residual functions. Therefore, let

f(x) =(aT

1 x + b1)2 + (aT

2 x + b2)2

(aT3 x + b3)2

, (10)

where aj ∈ Rn and bj ∈ R. Here f(x) can be written as a quotient w(x)

v(x) between the

convex function

w(x) =(aT

1 x + b1)2 + (aT

2 x + b2)2

aT3 x + b3

(11)

and the linear (and hence concave) function

v(x) = aT3 x + b3. (12)

on the set { x | v(x) > 0 }. In the next lemma we show that any such function ispseudoconvex on this domain.

Lemma 3.1. If w : S 7→ R is convex and v : S 7→ R is concave then f(x) = w(x)v(x) is

pseudoconvex on S = { x | v(x) > 0}.

Proof. Since w is convex and w concave we have

w(x) − w(x) ≥ ∇w(x)(x − x) (13)

v(x) − v(x) ≤ ∇v(x)(x − x). (14)

82

Page 85: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

3. THEORETICAL RESULTS

The gradient of f is given by

∇f(x) =1

v(x)

(

∇w(x) −w(x)

v(x)∇v(x)

)

. (15)

Setting

∇f(x)(x − x) ≥ 0 (16)

and since v(x) > 0 we have(

∇w(x) −w(x)

v(x)∇v(x)

)

(x − x) ≥ 0. (17)

Inserting (13) and (14) yields

0 ≤ w(x) − w(x) −w(x)

v(x)(v(x) − v(x)) (18)

⇔w(x)

v(x)≤

w(x)

v(x)⇔ f(x) ≤ f(x). (19)

⊓⊔

Thus, from Definition 2.6, f(x) is pseudoconvex on the set { x | aT3 x + b3 > 0 }.

Now recall that we wish to minimize f(x) = maxi fi(x), where each fi is pseudoconvex.It does not make sense to say that pseudoconvexity is preserved under the max-operation,since the resulting function is in general not differentiable everywhere. However we areable to use pseudoconvexity to derive optimality conditions for the max-function.

Theorem 3.2. x∗ solves µ∗ = infx∈S f(x), where S = { x; vi(x) > 0 ∀i }, if and onlyif there exists λ∗

i such thatm

j=1

λ∗j∇fj(x

∗) = 0 (20)

where λ∗i ≥ 0 if fi(x

∗) = µ∗ and λ∗i = 0 if fi(x

∗) < µ∗ for i = 1, . . . , m and∑

j λ∗j = 1.

Proof. If fi(x∗) < µ∗ then there is a neighborhood such that fi(x) < µ∗ since fi is

continuous. Hence we may disregard the functions where fi(x∗) < µ∗ and assume that

all fi(x∗) = µ∗.

First we show that if x∗ is a local minimizer then (20) is fulfilled. If x∗ is a localminimizer, then for all directions d there is an i such that∇fi(x

∗)T d ≥ 0, or equivalentlythe system ∇fi(x

∗)T d < 0 for all i has no solution. Let A be the matrix with rows∇fi(x

∗)T . Then the system ∇fi(x∗)T d < 0 for all i can be written

Ad < 0 (21)

83

Page 86: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

and the system (20) can be written

AT λ∗ = 0, λ∗ ≥ 0,∑

j

λ∗j = 1. (22)

By Gordan’s theorem (which is a Farkas type theorem, see [1]) precisely one of thesesystems has a solution, and therefore (20) has a solution.

Next assume that there exists an x such that f(x∗) > f(x). We will show that thesystem

j

λ∗j∇fj(x

∗) = 0

j

λ∗j = 1

λ∗i ≥ 0 for i = 1, . . . , m (23)

has no solution. Since f is quasiconvex (fi are pseudoconvex and thereby quasiconvex)the direction d = x− x∗ is a decent direction. Therefore ∇fi(x

∗)T d ≤ 0 for all i. Nowassume that ∇fi(x

∗)T d = 0 for some i. Then we have

f(x) ≥ fi(x) ≥ fi(x∗) = µ∗, (24)

since fi is pseudoconvex, which contradicts f(x∗) > f(x). Therefore we must have that∇fi(x

∗)T d < 0 for all i. Now since all of the λ∗i are nonnegative and sum to one, we

havedT

i

λi∇fi(x∗) < 0 (25)

and therefore the system (23) has no solution. ⊓⊔

Note that pseudoconvexity is used in the second part of the theorem. In fact, forgeneral functions these conditions are necessary but not sufficient for a global minimum,for the sufficiency part we require pseudoconvexity.

The interpretation of the optimality conditions is that if none of the gradients vanish,then in each direction d there is an i such that ∇fi(x)T d ≥ 0, that is in each directionat least one of the functions fi does not decrease. This theorem shows that a steepestdescent algorithm that follows the gradient (or subgradient where the function is notdifferentiable) would find the global minimum, since the gradients does not vanish any-where except for the optimum. Such a method only uses first order derivatives, we wouldhowever like to employ higher order methods like interior point methods since these arein general more effective. Therefore we rewrite the problem as follows:

P1 : min µ

s.t. fi(x) − µ2 ≤ 0

x ∈ S, µ ≥ 0, (26)

84

Page 87: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

3. THEORETICAL RESULTS

This gives us a constrained problem where all functions are twice differentiable. TheKKT conditions for this problem are

(

01

)

+∑

j

λj

(

∇fj(x∗)

−2µ∗

)

= 0 (27)

fi(x∗) − (µ∗)2 ≤ 0 (28)

λi ≥ 0 (29)

λi(fi(x∗) − (µ∗)2) = 0. (30)

Corollary 3.3. The KKT conditions (27)-(30) are both necessary and sufficient for a globaloptimum in problem (26).

Proof. The KKT conditions are always necessary and sufficient for stationary points. Bycondition (30) we know that λi is zero if fi(x

∗) < µ∗. By Theorem 3.2 the system(23) has a solution if and only if x∗ is a global optimum of minx∈S f(x). We see thatif µ∗ > 0 then (27) and (29) have a solution if and only if (23) has a solution. Sinceminx∈S f(x) is equivalent to problem (26), it follows that (x∗, µ∗) is a global optimumfor this problem. If µ = 0 then the result is trivial since then ∇fi(x) = 0 for all i. ⊓⊔

To avoid working with functions containing quotients we rewrite our problem again.Let

gi(x) = (aTi1x + bi1)

2 + (aTi2x + bi1)

2 (31)

hi(x) = (aTi3x + bi3)

2 (32)

hi(x, µ) = µ2hi(x). (33)

An equivalent problem is

P2 : min µ

s.t. gi(x) − hi(x, µ) ≤ 0

x ∈ S, µ ≥ 0. (34)

Note that for a fixed µ this is the SOCP used in the bisection algorithm where we havesquared the cone conditions in order to be able to take derivatives. The KKT conditionsfor this problem are

(

01

)

+∑

j

γj

(

∇gj(x∗) −∇xhj(x

∗, µ∗)−2µ∗hj(x

∗)

)

= 0 (35)

gi(x∗) − hi(x

∗, µ∗) ≤ 0 (36)

γi ≥ 0 (37)

γi(gi(x∗) − hi(x

∗, µ∗)) = 0. (38)

85

Page 88: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

Corollary 3.4. The KKT conditions (35)-(38) are both necessary and sufficient for a globaloptimum in problem (34).

Proof. We have

∇fi(x) =1

hi(x)

(

∇gi(x) −gi(x)

hi(x)∇hi(x)

)

. (39)

Let γi = λi

hi(x) . We see that since hi(x) is positive for x ∈ S, it follows that (36)

- (38) is equivalent to (28) - (30). For equation (35) we see that if γi is nonzero then

µ2 = gi(x)hi(x) and therefore the two KKT systems are equivalent. ⊓⊔

4 Algorithms

We present two algorithms for solving our L∞-problems. They are both based oninterior-point methods which solve the KKT conditions of the system. The first oneis a standard solver for nonconvex problems. Using this solver we may formulate theprogram such that µ is allowed two vary and thus solves the problem with one program.This is in contrast with the bisection method which solves a sequence of programs forfixed µ. A particular problem with the bisection algorithm is it is not possible to specifya starting point when using the existing interior-point-methods. This is because it maynot lie on the so called central-path, which is resulting in slow convergence. Hence muchcould be gained by, letting µ vary, and not have toe restart the optimization procedureeach time mu is changed.

In the second algorithm we use SeDuMi to solve a sequence of SOCP that approxi-mates the original problem.

4.1 A Primal dual interior point algorithm for the pseudoconvex sys-tem.

In this section we briefly review the LOQO-algorithm [13] which is a state-of-the-art op-timization algorithm that we will use for solving moderate scale geometric reconstructionproblems. In fact, in [4] this algorithm was also tested. It was observed to work well,however convergence was not proved. Since quasiconvexity is not enough to prove con-vergence, to our knowledge this has not been pursued further. LOQO is an interior pointalgorithm for general nonconvex problems. As most interior point methods it searchesfor a solution to the KKT-conditions. For a general nonconvex problem the solution isnot necessary the global minima since there may be more than one KKT-point, however

86

Page 89: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

4. ALGORITHMS

in our case we know that there is only one. Recall that our problem is

P2 : min µ

s.t. gi(x) − hi(x, µ) ≤ 0

x ∈ S, µ ≥ 0. (40)

LOQO starts by adding slack variables wi ≥ 0 such that the inequality constraints arereplaced by the equality constraints gi(x) − hi(x, µ) − w = 0. The constraints w ≥ 0are eliminated by adding a logarithmic penalty term to the objective function.

PLOQO : min µ + ν∑m

j=1 log(wj)

s.t. gi(x) − hi(x, µ) − wi = 0. (41)

The first order conditions for this problem are(

01

)

+∑

j

γj

(

∇gj(x∗) −∇xhj(x

∗, µ∗)−2µ∗hj(x

∗)

)

= 0 (42)

ν + wiγi = 0 (43)

gi(x) − hi(x, µ) − wi = 0. (44)

LOQO uses Newton’s method to solve these equations. It can be shown that as ν tendsto zero this gives a solution to the KKT conditions for our problem. And by the theoryin section 3 we know that this gives us the global optimum.

4.2 Solving the KKT condition via a sequence of SOCP

Although the algorithm presented in the previous section is much more efficient than thebisection algorithm of [6] for moderate scale problems (see Section 5) we have found thatfor large problems it converges very slowly. Therefore we also present an algorithm thatsolves the KKT conditions via a sequence of SOCPs.

It resembles the bisection algorithm, in that it solves a sequence of SOCPs. Thedifference is that instead of fixing µ to get a cone program we will make a simple ap-proximation of the condition gi(x) − hi(x, µ) ≤ 0 with a cone condition such that theKKT-conditions of the resulting program approximates the KKT-conditions of the orig-inal program. Recall that in the bisection algorithm we solve feasibility problems of thetype

Pµ : find x

s.t. gi(x) − hi(x, µ) ≤ 0

x ∈ S, µ ≥ 0 (45)

for a sequence of fixed µ = {µl}. Here µ is fixed since we want hi(x, µ) to be an affinefunction squared. Recall that hi(x, µ) = (µ(aT

i3x + bi3))2. However instead of fixing µ

87

Page 90: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

PAPER III

we may choose to approximate µ(aTi3x + bi3) with its 1st order Taylor expansion around

a point (xl, µl). The Taylor expansion can be written

µ(aTi3x + bi3) ≈ µl(a

Ti3x + bi3) + ∆µ(aT

i3xl + bi3), (46)

where ∆µ = µ − µl. Note that if the second term is disregarded we get Pµ. Let

hil(x, µ) = (µl(aTi3x + bi3) + ∆µ(aT

i3xl + bi3))2 (47)

and consider the program

Pµ : min µ

s.t. gi(x) − hil(x, µ) ≤ 0

x ∈ S, µ ≥ 0. (48)

The first order conditions of this program are

(

01

)

+∑

j

λj

(

∇gj(x∗) −∇xhjl(x

∗, µ∗)−∂µhjl(x

∗, µ∗)

)

= 0 (49)

gi(x∗) − hil(x

∗, µ∗) ≤ 0 (50)

λi ≥ 0 (51)

λi(gi(x∗) − hil(x

∗, µ∗)) = 0. (52)

It is reasonable to assume that this program approximates problem P2 well in a neighbor-hood around (xl, µl). Therefore we define the sequence {xl, µl} as follows. For a given(xl, µl) we let xl+1 be the solution of the program (48). To ensure that (xl+1, µl+1) isfeasible in P2 we put µl+1 = maxifi(xl+1). Note that {µl} is a descending sequencewith µl ≥ 0 for all l.

We will see that if µl+1 = µl then xl+1, µl also solves problem (34). We have that

hil(x, µl) = hi(x, µl) (53)

∇xhil(x, µl) = ∇xhi(x, µl). (54)

Since both xl and xl+1 are feasible we have ∂µhil(xl+1, µl) > 0. By rescaling the dualvariables it is now easy to see that since the system (49)-(52) is solvable with (xl+1, µl)then so is (35)-(38).

5 Experimental results

In this section we compare the proposed algorithms with the state of the art, which is thebisection algorithm. For the moderate scale problems we tested the algorithms on ran-domly generated instances of the triangulation, resection and homography problems of

88

Page 91: GLOBAL OPTIMIZATION AND APPROXIMATION ALGORITHMS IN ... · 1.2 Convex Optimization In this section we will review some basic concepts used in optimization, that are used throughout

5. EXPERIMENTAL RESULTS

different sizes. The reported execution times are the total time spent in the optimizationroutines, that is, we do not include the time spent setting up the problem.

All the experiments have been carried out on a standard PC P4 3.0 GHz machine.For solving SOCP problems, we use the publicly available SeDuMi [12]. Both SeDuMiand LOQO are considered to be state-of-the-art optimization software and they are bothoptimized for efficiency. Still, we are aware of that the reported results dependent on theactual implementations, but it gives an indication of which approach is most efficient.Another measure of time complexity is the number of Newton iterations each methodhas to solve. Even though the Newton iterations are not equivalent for the differentapproaches, it gives, again, an indication of which scheme is preferred. This measure ismostly relevant for large-scale problems.

To achieve approximately the same accuracy for the different algorithms we chosethe following termination criteria. For the bisection algorithm we used the differencebetween the upper and lower bounds. When the difference is less than 10−4 the algorithmterminates. LOQO uses a threshold on the duality gap as termination criteria. Thethreshold was set to 10−4. For the SOCP algorithm we used ∆µ ≤ 10−4 as terminationcriteria. For each size we measured the average execution time for solving 100 instancesof the problem. The results are shown in Table 2.

bisection SOCP-approx. LOQO

Triangulation:5 cameras 1.23 .195 .00281

10 cameras 1.38 .207 .0035820 cameras 1.29 .223 .0064530 cameras 1.36 .234 .00969

Homography:10 points 1.05 .363 .0081620 points 1.17 .373 .012830 points 1.22 .377 .0193

Resectioning:10 points .823 .327 .012820 points .994 .345 .028730 points 1.04 .349 .0418

Table 2: Average execution times (s) for 100 random instances of each problem.

For the large scale test we used the known rotation problem. Here we assume that theorientations of the cameras are known. The objective is to determine the 3D structure(in terms of 3D points) and the positions of the cameras. We have tested the SOCP-algorithm on two sequences. The first one is a sequence of 15 images of a flower anda chair (see Figure 3), and the second one is the well known dinosaur sequence (see


Figure 4). For large scale problems, we have noticed that the LOQO algorithm sometimes stops prematurely, without converging to the global optimum. We believe that this is due to bad numerical conditioning. Therefore no experiments are reported with LOQO for these test scenarios.

The obtained reconstructions are shown in Figures 3 and 4. Figure 2 shows the convergence of the SOCP-algorithm. For comparison we have also plotted the upper (dashed line) and lower bound (dotted line) of the bisection algorithm at each iteration. For both sequences the SOCP algorithm converges after 4 iterations. In contrast, the bisection algorithm takes 15 iterations to achieve an accuracy of $10^{-4}$. However, comparing the number of SOCPs solved is not completely fair since each SOCP solved by the bisection algorithm is slightly simpler. Therefore we also calculated the total number of Newton steps taken during the optimization. Table 3 shows the measured execution times and the total number of Newton steps.

                        bisection   SOCP-approx.
Flower sequence:
  Execution times (s)     47.4        16.6
  Newton iterations       261         87
Dinosaur sequence:
  Execution times (s)     34.0        15.9
  Newton iterations       215         70

Table 3: Measured execution times (s) and total number of Newton iterations for computing the structure and motion of the flower and dinosaur sequences.

[Figure 2 plots: panels "Flower problem" and "Dinosaur problem", objective value versus iteration; legend: bisection upper bound, bisection lower bound, SOCP-approx.]

Figure 2: Convergence of the bisection and the SOCP algorithms for the flower sequence (top) and the dinosaur sequence (bottom).


Figure 3: Three images from the flower sequence, and the reconstructed structure and motion.


Figure 4: Three images from the dinosaur sequence, and the reconstructed structure and motion.


6 Conclusions

We have analyzed a class of L∞-problems for which reliable global solutions can be computed. We have shown several important convexity properties for these problems, including pseudoconvexity and necessary and sufficient conditions for a global minimum. Based on these results, it should be possible to design more efficient algorithms for multiview geometry problems. We have presented two algorithms for efficient optimization for this class of problems. Comparison to the state-of-the-art bisection algorithm shows considerable improvements both in terms of execution times and the total number of Newton iterations required.

Acknowledgments

This work has been supported by the European Commission's Sixth Framework Programme under grant no. 011838 as part of the Integrated Project SMErobot™, the Swedish Foundation for Strategic Research (SSF) through the programme Vision in Cognitive Systems II (VISCOS II), and the Swedish Research Council through grants no. 2004-4579 'Image-Based Localisation and Recognition of Scenes' and no. 2005-3230 'Geometry of multi-camera systems'.


Bibliography

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. Wiley, 2006.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[3] M. Farenzena, A. Fusiello, and A. Dovier. Reconstruction with interval constraints propagation. In Proc. Conf. Computer Vision and Pattern Recognition, pages 1185–1190, New York City, USA, 2006.

[4] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. In Proc. Conf. Computer Vision and Pattern Recognition, pages 504–509, 2004.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004. Second Edition.

[6] F. Kahl. Multiple view geometry and the L∞-norm. In International Conference on Computer Vision, pages 1002–1009, Beijing, China, 2005.

[7] Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruction. In International Conference on Computer Vision, pages 986–993, Beijing, China, 2005.

[8] Q. Ke and T. Kanade. Uncertainty models in quasiconvex optimization for geometric reconstruction. In Proc. Conf. Computer Vision and Pattern Recognition, pages 1199–1205, New York City, USA, 2006.

[9] H. Li. A practical algorithm for L∞ triangulation with outliers. In Proc. Conf. Computer Vision and Pattern Recognition, Minneapolis, USA, 2007.

[10] K. Sim and R. Hartley. Recovering camera motion using the L∞-norm. In Proc. Conf. Computer Vision and Pattern Recognition, pages 1230–1237, New York City, USA, 2006.

[11] K. Sim and R. Hartley. Removing outliers using the L∞-norm. In Proc. Conf. Computer Vision and Pattern Recognition, pages 485–492, New York City, USA, 2006.


[12] J. F. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. 1998.

[13] R. J. Vanderbei and D. F. Shanno. An interior point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications, 13:231–252, 1999.


PAPER IV

Submitted to Computer Vision and Image Understanding, 2007.


Improved Spectral Relaxation Methods for Binary Quadratic Optimization Problems

Carl Olsson, Anders P. Eriksson and Fredrik Kahl

Abstract

In this paper we introduce two new methods for solving binary quadratic problems. While spectral relaxation methods have been the workhorse subroutine for a wide variety of computer vision problems (segmentation, clustering and subgraph matching, to name a few), they have recently been challenged by semidefinite programming (SDP) relaxations. In fact, it can be shown that SDP relaxations produce better lower bounds than spectral relaxations on binary problems with a quadratic objective function. On the other hand, the computational complexity for SDP increases rapidly as the number of decision variables grows, making them inapplicable to large scale problems.

Our methods combine the merits of both spectral and SDP relaxations: better lower bounds than traditional spectral methods and considerably faster execution times than SDP. The first method is based on spectral subgradients and can be applied to large scale SDPs with binary decision variables, and the second one is based on the trust region problem. Both algorithms have been applied to several large scale vision problems with good performance.

1 Introduction

Spectral relaxation methods can be applied to a wide variety of problems in computer vision. They have been developed to provide solutions to, e.g., motion segmentation, figure-ground segmentation, clustering, subgraph matching and digital matting [9, 14, 21, 26, 12]. In particular, large scale problems that can be formulated with a binary quadratic objective function are handled efficiently with several thousands of decision variables.

More recently, semidefinite programming (SDP) relaxations have also been applied to the same type of computer vision problems, e.g., [10, 25, 20]. It can be shown that


such relaxations produce better estimates than spectral methods. However, as the number of variables grows, the execution times of the semidefinite programs increase rapidly. In practice, one is limited to a few hundred decision variables.

Spectral and SDP relaxation methods can be regarded as two points on an axis of increasing relaxation performance. We introduce two alternative methods that lie somewhere in between these two relaxations. Unlike standard SDP solvers that suffer from poor time complexity, they can still handle large scale problems. The two methods are based on a subgradient optimization scheme. We show good performance on a number of problems. Experimental results are given on the following problems: segmentation with prior information, binary restoration, partitioning and subgraph matching. Our main contributions are:

• An efficient algorithm for solving binary SDP problems with a quadratic objective function, based on subgradient optimization, is developed. In addition, we show how to incorporate linear constraints in the same program.

• The trust region subproblem is introduced and we modify it in order to make it applicable to binary quadratic problems with a linear term in the objective function.

Many of the application problems mentioned above are known to be NP-hard, so in practice they cannot be solved optimally. Thus one is forced to rely on approximate methods which result in sub-optimal solutions. Certain energy (or objective) functionals may be minimized in polynomial time, for example, submodular functionals using graph cuts [11], but this is not the topic of the present paper.

In [8], an alternative (and independent) method is derived which is also based on subgradients, called the spectral bundle method. Our subgradient method differs from [8] in that it is simpler (we just look for an ascent direction) and we have found empirically on the experimental problems (see Section 5) that our method performs equally well (or better). An in-depth comparison of the two alternatives is, however, beyond the scope of this paper.

1.1 Outline

The outline of this paper is as follows: In the next section we present the problem and some existing techniques for obtaining approximate solutions.

In Section 3 we present our algorithm. We develop theory for improving the spectral relaxation by using the notion of subgradients. Subgradients are a generalization of gradients used when a function is not differentiable. We show that for our problem the subgradients can be calculated analytically, to determine ascent directions to be used in an ascent scheme.

In Section 4 we study the trust region subproblem, an interesting special case in which we only try to enforce the binary constraint on one of the variables. This problem


has been extensively studied in the optimization literature, and we show that it can always be solved exactly.

Finally we test our algorithms and compare with existing methods on the following problems: segmentation with prior information, binary restoration, partitioning and subgraph matching. Preliminary results of this work were presented in [15] and [5].

2 Background

In this paper we study different ways to find approximate solutions of the following binary quadratic problem:

$$z = \inf \; y^T A y + b^T y, \quad y \in \{-1,1\}^n \qquad (1)$$

where $A$ is an $n \times n$ (possibly indefinite) matrix. A common approach for approximating this highly nonconvex problem is to solve the relaxed problem

$$z_{sp} = \inf_{\|x\|^2 = n+1} x^T L x \qquad (2)$$

where

$$x = \begin{pmatrix} y \\ y_{n+1} \end{pmatrix}, \qquad L = \begin{pmatrix} A & \tfrac{1}{2}b \\ \tfrac{1}{2}b^T & 0 \end{pmatrix}.$$

Solving (2) amounts to finding the eigenvector corresponding to the algebraically smallest eigenvalue of $L$. Therefore we will refer to this problem as the spectral relaxation of (1). The benefit of this formulation is that eigenvalue problems of this type are well studied and there exist solvers that are able to efficiently exploit sparsity in the matrix $L$, resulting in fast execution times. A significant weakness of this formulation is that the constraints $y \in \{-1,1\}^n$ and $y_{n+1} = 1$ are relaxed to $\|x\|^2 = n+1$, which often results in poor approximations.
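As an illustration (our own sketch, not part of the original Matlab implementation), the spectral relaxation (2) can be computed with a sparse eigensolver; here $A$ and $b$ are assumed to be given as NumPy/SciPy arrays and $A$ is assumed symmetric:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_relaxation(A, b):
    """Solve (2): minimize x^T L x over ||x||^2 = n+1 via the smallest eigenvector of L."""
    n = A.shape[0]
    # Homogenized matrix L = [[A, b/2], [b^T/2, 0]].
    L = sp.bmat([[sp.csr_matrix(A), sp.csr_matrix(b.reshape(-1, 1)) / 2.0],
                 [sp.csr_matrix(b.reshape(1, -1)) / 2.0, None]], format='csr')
    lam, v = eigsh(L, k=1, which='SA')   # algebraically smallest eigenvalue
    x = v[:, 0] * np.sqrt(n + 1)         # rescale so that ||x||^2 = n + 1
    if x[-1] < 0:                        # fix the sign of the homogenization variable
        x = -x
    z_sp = (n + 1) * lam[0]              # relaxed objective value
    y = np.sign(x[:n])                   # simplest possible binary rounding
    return z_sp, y, x
```

The sign-based rounding above is only the most naive heuristic; the weakness of the relaxation just described is precisely what the following sections try to remedy.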

Now let us turn our attention to bounds obtained through semidefinite programming. Using Lagrange multipliers $\sigma = [\sigma_1, \dots, \sigma_{n+1}]^T$ for each binary constraint $x_i^2 - 1 = 0$, one obtains the following relaxation of (1):

$$\sup_{\sigma} \inf_{x} \; x^T (L + \mathrm{diag}(\sigma)) x - e^T \sigma. \qquad (3)$$

Here $e$ is an $(n+1)$-vector of ones. The inner minimization is finite valued if and only if $(L + \mathrm{diag}(\sigma))$ is positive semidefinite, that is, $L + \mathrm{diag}(\sigma) \succeq 0$. This gives the following equivalent relaxation:

$$z_d = \inf_{\sigma} e^T \sigma, \qquad L + \mathrm{diag}(\sigma) \succeq 0. \qquad (4)$$

We will denote this problem the dual semidefinite problem since it is dual to the following problem (see [3, 10]):

$$z_p = \inf_{X \succeq 0} \mathrm{tr}(LX), \qquad \mathrm{diag}(X) = I, \qquad (5)$$


where $X$ denotes an $(n+1) \times (n+1)$ matrix. Consequently we will call this problem the primal semidefinite program. Since the dual pair (4) and (5) are convex problems, there is in general no duality gap. In [10], the proposed method is to solve (5) and use randomized hyperplanes (see [7]) to determine an approximate solution to (1). This method has a number of advantages. Most significantly, using a result from [7] one can derive bounds on the expected value of the relaxed solution. It is demonstrated that the approach works well on a number of computer vision problems. On the other hand, solving this relaxation is computationally expensive. Note that the number of variables is $O(n^2)$ for the primal problem (5) while the original problem (1) only has $n$ variables.

3 A Spectral Subgradient Method

In this section we present a new method for solving the binary quadratic problem (1). Instead of using semidefinite programming we propose to solve the (relaxed) problem

$$z_{sg} = \sup_{\sigma} \inf_{\|x\|^2 = n+1} x^T (L + \mathrm{diag}(\sigma)) x - e^T \sigma, \qquad (6)$$

with steepest ascent. At first glance it looks as though the optimal value of this problem is greater than that of (3) since we have restricted the set of feasible $x$. However, it is shown in [16] that (3), (5) and (6) are in fact all equivalent. The reason for adding the norm condition to (6) is that for a fixed $\sigma$ we can solve the inner minimization by finding the smallest eigenvalue.

3.1 Differentiating the objective function

Let

$$\mathcal{L}(x, \sigma) = x^T (L + \mathrm{diag}(\sigma)) x - e^T \sigma \qquad (7)$$

$$f(\sigma) = \inf_{\|x\|^2 = n+1} \mathcal{L}(x, \sigma). \qquad (8)$$

Since $f$ is a pointwise infimum of functions linear in $\sigma$ it is easy to see that $f$ is a concave function. Hence our problem is a concave maximization problem. Equivalently, $f$ can be written as

$$f(\sigma) = (n+1)\,\lambda_{\min}(L + \mathrm{diag}(\sigma)) - e^T \sigma. \qquad (9)$$

Here $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of its matrix argument. It is well known that the eigenvalues are analytic (and thereby differentiable) functions everywhere as long as they are distinct. To be able to use a steepest ascent method we need to consider subgradients, as eigenvalues will cross during the optimization. Recall the definition of a subgradient [1].


Definition 3.1. If $f : \mathbb{R}^{n+1} \mapsto \mathbb{R}$ is concave, then $\xi \in \mathbb{R}^{n+1}$ is a subgradient to $f$ at $\sigma_0$ if

$$f(\sigma) \le f(\sigma_0) + \xi^T(\sigma - \sigma_0), \quad \forall \sigma \in \mathbb{R}^{n+1}. \qquad (10)$$

Figure 1 shows a geometrical interpretation of (10). Note that if $f$ is differentiable at $\sigma_0$, then letting $\xi$ be the gradient of $f$ turns the right hand side of (10) into the tangent plane. One can show that if a function is differentiable then the gradient is the only vector satisfying (10). If $f$ is not differentiable at $\sigma_0$ then there are several subgradients satisfying (10).

We will denote the set of all subgradients at a point $\sigma_0$ by $\partial f(\sigma_0)$. From (10) it is easy to see that this set is convex and that if $0 \in \partial f(\sigma_0)$ then $\sigma_0$ is a global maximum.


Figure 1: Geometric interpretation of the definition of subgradients. Left: When the function is differentiable at $\sigma_0$ the only possible right hand side in (10) is the tangent plane. Right: When the function is not differentiable there are several planes fulfilling (10), each one giving rise to a subgradient.

Next we show how to calculate the subgradients of our problem. Let $x^2$ be the vector containing the entries of $x$ squared. Then we have:

Lemma 3.1. If $x$ is an eigenvector corresponding to the minimal eigenvalue of $L + \mathrm{diag}(\sigma)$ with norm $\|x\|^2 = n+1$, then $\xi = x^2 - e$ is a subgradient of $f$ at $\sigma$.

Proof. If $x$ is an eigenvector corresponding to the minimal eigenvalue of $L + \mathrm{diag}(\sigma)$ then $x$ solves

$$\inf_{\|x\|^2 = n+1} \mathcal{L}(x, \sigma). \qquad (11)$$

Assume that $\bar{x}$ solves

$$\inf_{\|x\|^2 = n+1} \mathcal{L}(x, \bar{\sigma}) \qquad (12)$$

then

$$\begin{aligned} f(\bar{\sigma}) &= \bar{x}^T (L + \mathrm{diag}(\bar{\sigma})) \bar{x} - e^T \bar{\sigma} \\ &\le x^T (L + \mathrm{diag}(\bar{\sigma})) x - e^T \bar{\sigma} \\ &= f(\sigma) + x^T \mathrm{diag}(\bar{\sigma} - \sigma) x - e^T(\bar{\sigma} - \sigma) \\ &= f(\sigma) + \sum_i (\bar{\sigma}_i - \sigma_i)(x_i^2 - 1) \\ &= f(\sigma) + \xi^T(\bar{\sigma} - \sigma). \end{aligned}$$

The inequality comes from the fact that $\bar{x}$ solves (12). $\square$
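As a concrete illustration of (9) and Lemma 3.1 (a sketch of our own, not taken from the paper's implementation, and assuming $L$ is a SciPy sparse matrix), $f(\sigma)$ and one subgradient can be obtained from a single sparse eigenvalue call:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def f_and_subgradient(L, sigma):
    """Evaluate f(sigma) = (n+1) * lambda_min(L + diag(sigma)) - e^T sigma  (eq. 9)
    together with the subgradient xi = x^2 - e from Lemma 3.1."""
    n1 = L.shape[0]                       # n1 = n + 1
    M = L + sp.diags(sigma)
    lam, v = eigsh(M, k=1, which='SA')    # smallest eigenpair
    x = v[:, 0] * np.sqrt(n1)             # normalize so that ||x||^2 = n + 1
    f_val = n1 * lam[0] - sigma.sum()
    xi = x ** 2 - np.ones(n1)             # a subgradient of the concave f at sigma
    return f_val, xi
```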

The result above is actually a special case of a more general result given in [1] (Theorem 6.3.4). Next we state three corollaries obtained from [1] (Theorems 6.3.7, 6.3.6 and 6.3.11). The first one gives a characterization of all subgradients.

Corollary 3.2. Let $E(\sigma)$ be the set of all eigenvectors with norm $\sqrt{n+1}$ corresponding to the minimal eigenvalue of $L + \mathrm{diag}(\sigma)$. Then the set of all subgradients of $f$ at $\sigma$ is given by

$$\partial f(\sigma) = \mathrm{convhull}(\{x^2 - e;\; x \in E(\sigma)\}). \qquad (13)$$

We do not give the proof here but note that the inclusion $\partial f(\sigma) \supseteq \mathrm{convhull}(\{x^2 - e;\; x \in E(\sigma)\})$ is obvious by Lemma 3.1 and the fact that $\partial f(\sigma)$ is a convex set.

Corollary 3.3. Let $E(\sigma)$ be the set of all eigenvectors with norm $\sqrt{n+1}$ corresponding to the minimal eigenvalue of $L + \mathrm{diag}(\sigma)$. Then

$$f'(\sigma, d) = \inf_{\xi \in \partial f(\sigma)} d^T \xi = \inf_{x \in E(\sigma)} d^T (x^2 - e). \qquad (14)$$

Here $f'(\sigma, d)$ is the directional derivative in the direction $d$, or formally

$$f'(\sigma, d) = \lim_{t \to 0^+} \frac{f(\sigma + td) - f(\sigma)}{t}. \qquad (15)$$

The first equality is proven in [1]. The second equality follows from Corollary 3.2 and the fact that the objective function $d^T \xi$ is linear in $\xi$; for a linear (concave) function the optimum is always attained in an extreme point. From [1] we also obtain the following.

Corollary 3.4. The direction $d$ of steepest ascent at $\sigma_0$ is given by

$$d = \begin{cases} 0 & \text{if } \xi = 0 \\ \dfrac{\xi}{\|\xi\|} & \text{if } \xi \neq 0 \end{cases} \qquad (16)$$

where $\xi \in \partial f(\sigma_0)$ is the subgradient with smallest norm.


We will use subgradients in a similar way as gradients are used in a steepest ascent algorithm. Even though there may be many subgradients to choose between, Corollary 3.4 finds the locally best one. Figure 2 shows the level sets of a function and its subgradients at two points. To the left the function is differentiable at $\sigma_0$ and hence the only subgradient is the gradient, which points in the direction of steepest ascent. To the right there are several subgradients and the one with the smallest norm points in the direction of steepest ascent.


Figure 2: The level sets of a function and its subgradients at two points. Left: $f$ is differentiable at $\sigma_0$ and hence the gradient points in the direction of steepest ascent. Right: $f$ is non-differentiable at $\sigma_0$ and the direction of steepest ascent is given by the subgradient with the smallest norm.

3.2 Implementation

The basic idea is to find an ascending direction and then to solve an approximation of $f(\sigma)$ along this direction. This process is then repeated until a good solution is found.

3.2.1 Finding ascent directions

The first step is to find an ascending direction. We use Corollary 3.2 to find a good direction. A vector $x \in E(\sigma)$ can be written

$$x = \sum_i \lambda_i x_i, \qquad \sum_i \lambda_i^2 = 1, \qquad (17)$$

where $\{x_i\}$ is an orthogonal basis of the eigenspace corresponding to the smallest eigenvalue (with $\|x_i\|^2 = n+1$). For the full subgradient set we need to calculate $x^2 - e$ for all possible values of $\lambda$ in (17). In practice, we are led to an approximation, and empirically we have found that it is sufficient to pick the vectors $x_i^2 - e$ and use the convex envelope of these vectors as our approximation. Let $S$ be our approximating set. To determine the best direction, the vector of minimum norm in $S$ needs to be found. The search can be


written as

$$\inf_{\xi \in S} \|\xi\|^2 = \inf \Bigl\| \sum_k \mu_k x_k^2 - e \Bigr\|^2, \qquad \sum_k \mu_k = 1, \; \mu_k \ge 0, \qquad (18)$$

which is a convex quadratic program in $\mu_k$ that can be solved efficiently. To test whether an ascending direction $d$ is actually obtained, we use Corollary 3.3 to calculate the directional derivative. In fact we can solve the optimization problem (14) efficiently by using the parameterization (17), which results in

$$\inf \; d^T \Bigl( \bigl( \sum_i \lambda_i x_i \bigr)^2 - e \Bigr), \qquad \sum_i \lambda_i^2 = 1. \qquad (19)$$

This is a quadratic function in $\lambda$ with a norm constraint, which can be solved by calculating eigenvalues. If $d$ is not an ascent direction then we add more vectors to the set $S$ to improve the approximation. In this way we either find an ascending direction or we find that zero is a subgradient, meaning that we have reached the global maximum.
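The small convex program (18) can be handled by any quadratic programming routine; below is a minimal sketch of our own, using a generic SLSQP call rather than a specialized solver. The rows of the hypothetical input `subgrads` are the candidate vectors $x_k^2 - e$:

```python
import numpy as np
from scipy.optimize import minimize

def steepest_ascent_direction(subgrads):
    """Solve (18) over the simplex. The rows of `subgrads` are x_k^2 - e; since the
    weights sum to one, sum_k mu_k (x_k^2 - e) equals sum_k mu_k x_k^2 - e."""
    m = subgrads.shape[0]
    objective = lambda mu: np.sum((mu @ subgrads) ** 2)
    constraints = ({'type': 'eq', 'fun': lambda mu: mu.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * m
    res = minimize(objective, np.full(m, 1.0 / m), method='SLSQP',
                   bounds=bounds, constraints=constraints)
    xi = res.x @ subgrads                                  # minimum-norm element of S
    norm = np.linalg.norm(xi)
    d = np.zeros_like(xi) if norm < 1e-9 else xi / norm    # cf. Corollary 3.4
    return d
```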

3.2.2 Approximating f along a direction

The next step is to find an approximation $\hat{f}$ of the objective function along a given direction. We do this by restricting the set of feasible $x$ to a set $X$ consisting of a few of the eigenvectors corresponding to the lowest eigenvalues of $L + \mathrm{diag}(\sigma)$. The intuition behind this choice for $X$ is that if the eigenvalue $\lambda_i$ is distinct then $x_i^2 - e$ is in fact the gradient of the function

$$(n+1)\,\lambda_i(L + \mathrm{diag}(\sigma)) - e^T \sigma, \qquad (20)$$

where $\lambda_i(\cdot)$ is the $i$th smallest eigenvalue as a function of a matrix. The expression

$$f_i(t) = x_i^T (L + \mathrm{diag}(\sigma + td)) x_i - e^T (\sigma + td) \qquad (21)$$

is then a Taylor expansion around $\sigma$ in the direction $d$. The function $f_1$ approximates $f$ well in a neighborhood around $t = 0$ if the smallest eigenvalue does not cross any other eigenvalue. If it does, then one can expect that there is some $i$ such that $\inf(f_1, f_i)$ is a good approximation.

This gives us a function $\hat{f}$ of the type

$$\hat{f}(\sigma + td) = \inf_{x_i \in X} x_i^T (L + \mathrm{diag}(\sigma + td)) x_i - e^T (\sigma + td). \qquad (22)$$

To optimize this function we can solve the linear program

$$\max_{t, \hat{f}} \; \hat{f} \qquad \text{s.t. } \hat{f} \le x_i^T (L + \mathrm{diag}(\sigma + td)) x_i - e^T (\sigma + td) \;\; \forall x_i \in X, \quad t \le t_{\max}. \qquad (23)$$


The parameter $t_{\max}$ is used to express the interval for which the approximation is valid. The program gives a value for $t$ and thereby a new $\bar{\sigma} = \sigma + td$. In general, $f(\bar{\sigma})$ is greater than $f(\sigma)$, but if the approximation is not good enough, one needs to improve the approximating function. This can be accomplished by making a new Taylor expansion around the point $\bar{\sigma}$, incorporating these terms into our approximation and repeating the process. Figure 3 shows two examples of the objective function $f$ and its approximating function $\hat{f}$.
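The line-search LP (23) has only two unknowns and $|X| + 1$ constraints, so it can be set up directly. A minimal sketch of our own, under the assumption that the step $t$ is restricted to $[0, t_{\max}]$ along the ascent direction:

```python
import numpy as np
from scipy.optimize import linprog

def line_search_lp(L, sigma, d, X, t_max):
    """Solve (23): maximize fhat subject to
    fhat <= x_i^T (L + diag(sigma + t d)) x_i - e^T (sigma + t d) for all x_i in X,
    and t <= t_max. Decision variables are (t, fhat)."""
    e = np.ones(L.shape[0])
    A_ub, b_ub = [], []
    for x in X:                                           # eigenvectors with ||x||^2 = n+1
        value_at_0 = x @ (L @ x) + (sigma * x ** 2).sum() - e @ sigma
        slope = (d * x ** 2).sum() - e @ d                # derivative of the model w.r.t. t
        A_ub.append([-slope, 1.0]); b_ub.append(value_at_0)   # fhat - slope*t <= value_at_0
    A_ub.append([1.0, 0.0]); b_ub.append(t_max)               # t <= t_max
    res = linprog(c=[0.0, -1.0],                              # maximize fhat
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, None), (None, None)])
    t_opt, fhat_opt = res.x
    return t_opt, fhat_opt
```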


Figure 3: Two approximations of the objective function $f(\sigma + td)$ along an ascent direction $d$. The dashed line is the true objective function $f$ and the solid line is the approximation $\hat{f}$.

4 The Trust Region Problem

Another interesting relaxation of our original problem is obtained if we add the additional constraint $y_{n+1} = 1$ to (2). We then obtain the following relaxation:

$$z_{tr} = \inf_{\|y\|^2 = n} y^T A y + b^T y. \qquad (24)$$

We propose to use this relaxation instead of the spectral relaxation (2). Since the objective function is the same as for the spectral relaxation with $y_{n+1} = 1$, it is obvious that

$$z_{sp} \le z_{tr} \qquad (25)$$

holds. Equality will only occur if the solution to $z_{sp}$ happens to have $\pm 1$ as its last component. This is generally not the case. In fact, empirically we have found that the last component is often farther away from $\pm 1$ than the rest of the components. So enforcing the constraint, that is, solving (24), often yields much better solutions.

Next we will show that it is possible to solve (24) exactly. A problem closely related to (24) is

$$\inf_{\|y\|^2 \le n} y^T A y + b^T y. \qquad (26)$$


This problem is usually referred to as the trust region subproblem. Solving it is one step in a general optimization scheme for descent minimization known as the trust region method [6]. Instead of minimizing a general function, one approximates it with a second order polynomial $y^T A y + b^T y + c$. A constraint of the type $\|y\|^2 \le m$ then specifies the set in which the approximation is believed to be good (the trust region). The trust region subproblem has been studied extensively in the optimization literature ([18, 19, 23, 22, 17]). A remarkable property of this problem is that, even though it is nonconvex, there is no duality gap (see [3]). In fact, this is always the case when we have a quadratic objective function and only one quadratic constraint. The dual problem of (26) is

$$\sup_{\lambda \le 0} \inf_y \; y^T A y + b^T y + \lambda(n - y^T y). \qquad (27)$$

In [22] it is shown that $y^*$ is the global optimum of (26) if and only if $(y^*, \lambda^*)$ is feasible in (27) and fulfills the following system of equations:

$$(A - \lambda^* I) y^* = -\tfrac{1}{2} b \qquad (28)$$

$$\lambda^* (n - y^{*T} y^*) = 0 \qquad (29)$$

$$A - \lambda^* I \succeq 0. \qquad (30)$$

The first two equations are the KKT conditions for a local minimum, while the third determines the global minimum. From equation (30) it is easy to see that if $A$ is not positive semidefinite, then $\lambda^*$ will not be zero. Equation (29) then tells us that $\|y^*\|^2 = n$. This shows that for an $A$ that is not positive semidefinite, problems (24) and (26) are equivalent. Note that we may always assume that $A$ is not positive semidefinite in (24). This is because we may always subtract $mI$ from $A$, since we have the constant norm condition. Thus replacing $A$ with $A - mI$ for sufficiently large $m$ gives us an equivalent problem in which the matrix is not positive semidefinite.

A number of methods for solving this problem have been proposed. In [17] semidefinite programming is used to optimize the function $nk(\lambda_{\min}(H(t)) - t)$, where

$$H(t) = \begin{pmatrix} A & \tfrac{1}{2}b \\ \tfrac{1}{2}b^T & t \end{pmatrix}, \qquad (31)$$

and $\lambda_{\min}$ is the algebraically smallest eigenvalue. In [13] the authors solve $\tfrac{1}{\psi(\lambda)} - \tfrac{1}{\sqrt{n}} = 0$, where $\psi(\lambda) = \|(A - \lambda I)^{-1} \tfrac{1}{2} b\|$. This is a rational function with poles at the eigenvalues

of $A$. To ensure that $A - \lambda I$ is positive semidefinite a Cholesky factorization is computed. If one can afford this, Cholesky factorization is the preferred choice of method. However, the LSTRS-algorithm developed in [18] and [19] is more efficient for large scale problems. LSTRS works by solving a parameterized eigenvalue problem. It searches for a $t$ such that the eigenvalue problem

$$\begin{pmatrix} A & \tfrac{1}{2}b \\ \tfrac{1}{2}b^T & t \end{pmatrix} \begin{pmatrix} y \\ 1 \end{pmatrix} = \lambda_{\min} \begin{pmatrix} y \\ 1 \end{pmatrix} \qquad (32)$$


or equivalently

$$(A - \lambda_{\min} I) y = -\tfrac{1}{2} b, \qquad t - \lambda_{\min} = -\tfrac{1}{2} b^T y \qquad (33)$$

has a solution. Finding this $t$ is done by determining a $\lambda$ such that $\phi'(\lambda) = n$, where $\phi$ is defined by

$$\phi(\lambda) = \tfrac{1}{4} b^T (A - \lambda I)^{\dagger} b = -\tfrac{1}{2} b^T y. \qquad (34)$$

It can be shown that $\lambda$ gives a solution to (33). Since $\phi$ is a rational function with poles at the eigenvalues of $A$, it can be expensive to compute. Instead rational interpolation is used to efficiently determine $\lambda$. For further details see [18] and [19].
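For small dense instances, the approach of [13] referred to above can be sketched directly: find $\lambda$ below the smallest eigenvalue of $A$ such that $\psi(\lambda) = \sqrt{n}$, and recover $y$ from (28). The sketch below is our own and assumes the easy (non-degenerate) case where $b$ is not orthogonal to the eigenspace of the smallest eigenvalue; for large sparse problems LSTRS [18, 19] is the appropriate tool:

```python
import numpy as np
from scipy.optimize import brentq

def trust_region_exact(A, b, n):
    """Solve min y^T A y + b^T y subject to ||y||^2 = n (eq. 24) for small dense A,
    by solving 1/psi(lambda) - 1/sqrt(n) = 0 with psi(lambda) = ||(A - lambda I)^{-1} b/2||."""
    dim = A.shape[0]
    lam_min = np.linalg.eigvalsh(A)[0]
    def y_of(lam):
        return np.linalg.solve(A - lam * np.eye(dim), -0.5 * b)   # eq. (28)
    g = lambda lam: 1.0 / np.linalg.norm(y_of(lam)) - 1.0 / np.sqrt(n)
    # psi blows up as lambda approaches lambda_min from below (easy case)
    # and vanishes as lambda -> -infinity, so g changes sign on this bracket.
    lam_star = brentq(g, lam_min - 1e6, lam_min - 1e-8)
    return y_of(lam_star), lam_star
```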

5 Applications

In this section we evaluate the performance of our methods for a few different applications that can be solved as binary quadratic problems. The algorithms are compared with spectral relaxations using Matlab's sparse eigenvalue solver, SDP relaxations using SeDuMi [24] and the spectral bundle algorithm developed by Helmberg [8]. Our spectral subgradient algorithm is implemented in Matlab and the trust region algorithm is based on LSTRS [18] (also Matlab). Note that our implementations consist of simple Matlab scripts while the other software has implementations in C (often highly optimized for speed).

5.1 Segmentation with Prior Information

In our first example we will compare the trust region method to the spectral relaxation. We will see that the spectral relaxation can result in poor segmentations when the extra variable is not $\pm 1$. To evaluate the two methods we consider a simple multiclass segmentation problem with prior information.

5.1.1 Graph Representations of Images

The general approach of constructing an undirected graph from an image is shown in Figure 4. Basically, each pixel in the image is viewed as a node in a graph. Edges are formed between nodes with weights corresponding to how alike two pixels are, given some measure of similarity, as well as the distance between them. In an attempt to reduce the number of edges in the graph, only pixels within a small, predetermined neighborhood N of each other are considered. Cuts made in such a graph will then correspond to a segmentation of the underlying image.


Figure 4: Graph representation of a 3 × 3 image.

5.1.2 Including Prior Information

To be able to include prior information into the visual grouping process we modify the construction of the graphs in the following way. To the graph G we add k artificial nodes. These nodes do not correspond to any pixels in the image; instead they are meant to represent the k different classes the image is to be partitioned into. The contextual information that we wish to incorporate is modeled by a simple statistical model. Edges between the class nodes and the image nodes are added, with weights proportional to how likely a particular pixel is to belong to a certain class. With the labeling of the k class nodes fixed, a minimal cut on such a graph should group together pixels according to their class likelihood while still preserving the spatial structure, see Figure 5.

[Figure panels: (a) Original image, (b) Corresponding graph, (c) Multiclass min-cut, (d) Resulting segmentation]

Figure 5: A graph representation of an image and an example three-class segmentation. Unnumbered nodes correspond to pixels and numbered ones to the artificial class nodes.

5.1.3 Combinatorial Optimization

Next we show how to approximate this problem using the spectral method and the trust region method. Let $Z = [\,z_1, \dots, z_k\,] \in \{-1,1\}^{n \times k}$ denote the $n \times k$ assignment matrix for all the $n$ nodes. A 1 in row $i$ of column $j$ signifies that pixel $i$ of the image belongs to class $j$, and of course $-1$ in the same position signifies the opposite. If we let $W$ contain the inter-pixel affinities, the min-cut (without pixel class probabilities) can


then be written

$$C_{\min} = \inf_Z \sum_{i=1}^{k} \sum_{u \in A_i} \sum_{v \notin A_i} w_{uv} = \inf_Z \sum_{i,j,l} w_{jl} (z_{ij} - z_{il})^2 = \inf_Z \sum_{i=1}^{k} z_i^T (D - W) z_i. \qquad (35)$$

Here $D$ denotes $\mathrm{diag}(W1)$. The assignment matrix $Z$ must satisfy $Z1 = (2-k)1$. In addition, if the pixel/class-node affinities $P = [\,p_1, \dots, p_k\,]$ (that is, the probabilities of a single pixel belonging to a certain class) are included and also the labels of the class-nodes are fixed, we get

$$\begin{aligned} C_{\min} &= \inf_{\substack{Z \in \{-1,1\}^{n \times k} \\ Z1 = (2-k)1}} \sum_{i=1}^{k} z_i^T \underbrace{(D - W)}_{L} z_i - 2 p_i^T z_i = \inf_{\substack{Z \in \{-1,1\}^{n \times k} \\ Z1 = (2-k)1}} \mathrm{tr}\bigl(Z^T L Z\bigr) + 2 \underbrace{[\,-p_1^T, \dots, -p_k^T\,]}_{b^T} \underbrace{\begin{pmatrix} z_1 \\ \vdots \\ z_k \end{pmatrix}}_{z} \\ &= \inf_{\substack{z \in \{-1,1\}^{nk} \\ Z1 = (2-k)1}} z^T \underbrace{\begin{pmatrix} L & 0 & \dots & 0 \\ 0 & L & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \dots & 0 & L \end{pmatrix}}_{A} z + 2 b^T z. \end{aligned} \qquad (36)$$

As $z \in \{-1,1\}^{nk} \Leftrightarrow z_i^2 = 1$, we can write

$$\begin{aligned} \mu = \inf_z \; & z^T A z + 2 b^T z && (37) \\ \text{s.t. } & z_i^2 = 1 && (38) \\ & Z1 = (2-k)1. && (39) \end{aligned}$$

The linear subspace of solutions to $Z1 = (2-k)1$ can be parametrized as $z = Qy + v$, where $Q$ and $v$ can be chosen so that $Q^T Q = I$ and $Q^T v = 0$, and $y \in \mathbb{R}^{n(k-1)}$. With this change of coordinates, and by replacing the discrete constraints $z_i^2 = 1$ with $z^T z = nk$, we arrive at the following relaxed quadratically constrained quadratic program:

$$\begin{aligned} \mu = \inf_y \; & (Qy + v)^T A (Qy + v) + 2 b^T (Qy + v) \\ \text{s.t. } & z^T z = (Qy + v)^T (Qy + v) = nk. && (40) \end{aligned}$$

For efficiently solving this problem we here turn our attention to two relaxations that are tractable from a computational perspective. Simplifying (40), we obtain an equivalent trust region problem of the form

$$\mu_{tr} = \inf_{\|y\|^2 = nk - v^T v} y^T \tilde{A} y + 2 \tilde{b}^T y. \qquad (41)$$

By adding an extra variable $y_{n(k-1)+1}$ as in (2) we obtain the spectral relaxation.
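The parametrization $z = Qy + v$ used above can be constructed explicitly for moderate problem sizes. A minimal sketch of our own (dense, for clarity), based on the observation that the constraint $Z1 = (2-k)1$ reads $Cz = (2-k)1$ with $C = [\,I \;\; I \;\cdots\; I\,]$:

```python
import numpy as np
from scipy.linalg import null_space

def constraint_parametrization(n, k):
    """Return Q and v with Q^T Q = I and Q^T v = 0 such that
    {z : Z 1 = (2-k) 1} = {Q y + v : y in R^{n(k-1)}},
    where z stacks the columns z_1, ..., z_k of Z."""
    C = np.hstack([np.eye(n)] * k)              # n x nk, so that C z = sum_i z_i
    Q = null_space(C)                           # orthonormal basis of ker(C), nk x n(k-1)
    rhs = (2.0 - k) * np.ones(n)
    v = C.T @ np.linalg.solve(C @ C.T, rhs)     # minimum-norm particular solution => Q^T v = 0
    return Q, v
```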


5.1.4 Experimental Results

As mentioned in the previous section, prior knowledge is incorporated into the graph cut framework through the $k$ artificial nodes. For this purpose we need a way to describe each pixel as well as model the probability of that pixel belonging to a certain class.

The image descriptor in the current implementation is based on color alone. Each pixel is simply represented by its three RGB color channels. The probability distribution for these descriptors is modeled using a Gaussian Mixture Model (GMM):

$$p(v \mid \Sigma, \mu) = \sum_{i=1}^{k} \frac{1}{\sqrt{2\pi |\Sigma_i|}} \, e^{-\frac{1}{2}(v - \mu_i)^T \Sigma_i^{-1} (v - \mu_i)}. \qquad (42)$$

From a number of manually annotated training images the GMM parameters are then fitted through Expectation Maximization [4]. This fitting is only carried out once and can be viewed as the learning phase of our proposed method.

The edge weights between pixels $i$ and $j$ and the weights between pixel $i$ and the different class-nodes are given by

$$w_{ij} = e^{-\frac{r(i,j)}{\sigma_R}} \, e^{-\frac{\|s(i) - s(j)\|^2}{\sigma_W}} \qquad (43)$$

$$p_i^k = \alpha \, \frac{p(w(i) \mid i \in k)}{\sum_j p(w(i) \mid i \in j)}. \qquad (44)$$

Here $\|\cdot\|$ denotes the Euclidean norm and $r(i,j)$ the distance between pixels $i$ and $j$. The tuning parameters $\lambda$, $\sigma_R$ and $\sigma_W$ weight the importance of the different features. Hence, $w_{ij}$ contains the inter-pixel similarity, which encourages the segmentation to be coherent, $p_i^k$ describes how likely pixel $i$ is to belong to class $k$, and $\alpha$ is a parameter weighting the importance of spatial structure vs. class probability.
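As an illustration of (43)-(44), the weights can be computed as in the following sketch (our own; the function and argument names are hypothetical, with the pixel descriptors and spatial distances assumed given):

```python
import numpy as np

def pixel_affinity(s_i, s_j, r_ij, sigma_R=1.0, sigma_W=1.0):
    """Inter-pixel weight (43): s_i, s_j are the pixel descriptors (RGB values here)
    and r_ij is the spatial distance r(i, j)."""
    return np.exp(-r_ij / sigma_R) * np.exp(-np.sum((s_i - s_j) ** 2) / sigma_W)

def class_affinities(likelihoods, alpha=10.0):
    """Pixel-to-class-node weights (44): `likelihoods` holds the class likelihoods,
    one row per pixel and one column per class; rows are normalized and scaled by alpha."""
    return alpha * likelihoods / likelihoods.sum(axis=1, keepdims=True)
```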

Preliminary tests of the suggested approach were carried out on a limited number of images. We chose to segment the images into four simple classes: sky, grass, brick and background. Gaussian mixture models for each of these classes were first acquired from a handful of training images manually chosen as being representative of such image regions, see Figure 6. For an unseen image the pixel affinity matrix $W$ and class probabilities were computed according to (43) and (44). The resulting optimization program was then solved using both the spectral relaxation and the trust region subproblem method. The outcome can be seen in Figure 7. Parameters used in these experiments were $\sigma_R = 1$, $\sigma_W = 1$, $\alpha = 10$ and N a $9 \times 9$ neighborhood structure.

Both relaxations produce visually relevant segmentations; based on very limited training data, our proposed approach does appear to use the prior information in a meaningful way. Taking a closer look at the solutions supplied by the trust region method and the spectral relaxation for these two examples does, however, reveal one substantial difference. The spectral relaxation was reached by ignoring the constraint on the homogenized coordinate $y_{n(k-1)+1} = 1$. The solutions to the examples in Figure 7 produce a homogenized


Figure 6: Sample training images.

coordinate value of $y_{n(k-1)+1} \approx 120$ in both cases. As the class probabilities of the pixels are represented by the linear part of (37), the spectral relaxation, in these two cases, thus yields an image partition that weights prior information much higher than spatial coherence. Any spatial structure of an image will thus not be preserved; the spectral relaxation is basically just a maximum-likelihood classification of each pixel individually.

5.2 Binary Restoration

As a test problem (which can be solved exactly by other means), we first consider the problem of separating a signal from noise. The signal $\{x_i\}$, $i = 1, \dots, n$, is assumed to take the values $\pm 1$. Normally distributed noise with mean 0 and variance 0.6 is then added to obtain a noisy signal $\{s_i\}$, $i = 1, \dots, n$. Figure 8 (a) and (b) graph the original signal and the noisy signal, respectively, for $n = 400$. A strategy to recover the original signal is to minimize the following objective function:

$$\sum_i (x_i - s_i)^2 + \mu \sum_i \sum_{j \in N(i)} (x_i - x_j)^2, \qquad x_i \in \{-1, 1\}. \qquad (45)$$

Here $N(i)$ means a neighborhood of $i$, in this case $\{i-1, i+1\}$. By adding the (homogenization) variable $x_{n+1}$, the problem can be transformed to the same form as in (6). Table 1 shows the execution times and Table 2 displays the obtained estimates for different $n$. For the subgradient method, 10 iterations were run and in each iteration the 15 smallest eigenvectors were computed for the approximation set $S$ in (18). Note in particular the growth rate of the execution times for the SDP. Figure 8 (b)-(d) shows the computed signals for the different methods when $n = 400$. The results for other values of $n$ have a similar appearance. The spectral relaxations behave (reasonably) well for this


Original images.

(TSP) Resulting class labelling.

(SR) Resulting class labelling.

Figure 7: Example segmentation/classification of an image using both the Trust Region Subproblem (TSP) formulation and Spectral Relaxation (SR).


  n      Spectral   Trust region   Subgradient   SDP
  100      0.33        0.60           4.21        3.81
  200      0.30        0.62           6.25        13.4
  400      0.32        0.68           6.70        180
  600      0.33        0.80           10.7        637
  800      0.49        1.40           10.1        2365
  1000     0.37        1.85           15.2        4830

Table 1: Execution times in seconds for the signal problem.

  n      Spectral   Trust region   Subgradient   SDP
  100      24.3        31.6           40.6        53.1
  200      27.4        40.5           53.5        76.1
  400      74.9        88.4           139         174
  600      134         164            240         309
  800      169         207            282         373
  1000     178         229            322         439

Table 2: Objective values of the relaxations. A higher value means a better lower bound for the (unknown) optimal value.

problem, as the estimated value of $x_{n+1}$ happens to be close to $\pm 1$. Next we consider a problem similar to the above, which was also a test problem in [10]. We want to restore the map of Iceland given in Figure 9. The objective function is the same as in (45), except that the neighborhood of a pixel is defined to be all its four neighboring pixels. The size of the image is $78 \times 104$, which yields a program with $78 \cdot 104 + 1 = 8113$ variables. Recall that the semidefinite primal program will contain $8113^2 = 65\,820\,769$ variables and therefore we have not been able to compute a solution with SeDuMi. In [10], a different SDP solver was used and the execution time was 64885 s. Instead we compare with the spectral bundle algorithm [8]. Table 3 gives the execution times and the objective values of the estimations. Figure 10 shows the resulting restorations for the different methods. For the subgradient algorithm, the 4 smallest eigenvalues were used in (18). Even though the spectral relaxation results in a slightly lower objective value than the trust region, the restoration looks just as good. Here the last component of the eigenvector is 0.85, which explains the similarity of these two restorations. The subgradient method yields a solution with values closer to $\pm 1$, as expected. Recall that there is a duality gap, which means that the optimal solution will not attain $x_i = \pm 1$ for all $i$ in general. The spectral bundle method provides a solution where some pixel values are much larger than 1. In order to make the difference between pixels with values $-1$ and $1$ visible in Figure 10(d) we


Figure 8: Computed solutions for the signal problem with n = 400. (a) Original signal, (b) signal + noise, (c) solution obtained using spectral relaxations, (d) trust region, (e) subgradient algorithm and (f) dual semidefinite program.

had to replace these pixel values with a smaller value. This results in the white areas in Figure 10(d) and the bar close to the value 2 in the corresponding histogram.
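To make the construction used in this subsection concrete, the quadratic form of (45) can be assembled as follows (a sketch of our own for the 1D signal case; the 2D image case only changes the neighborhood $N(i)$):

```python
import numpy as np

def restoration_qp(s, mu):
    """Assemble A and b so that (45) equals x^T A x + b^T x + const for the 1D signal,
    with N(i) = {i-1, i+1} and the constant sum_i s_i^2 dropped."""
    n = len(s)
    A = np.eye(n)                           # data term: sum_i x_i^2
    b = -2.0 * np.asarray(s, dtype=float)   # data term: -2 sum_i s_i x_i
    for i in range(n):
        for j in (i - 1, i + 1):            # neighborhood N(i)
            if 0 <= j < n:
                A[i, i] += mu; A[j, j] += mu     # mu * (x_i - x_j)^2
                A[i, j] -= mu; A[j, i] -= mu
    return A, b

def homogenize(A, b):
    """Add the homogenization variable x_{n+1}, giving the matrix L of (2)/(6)."""
    n = A.shape[0]
    L = np.zeros((n + 1, n + 1))
    L[:n, :n] = A
    L[:n, n] = L[n, :n] = 0.5 * b
    return L
```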

5.3 Partitioning

In this section we consider the problem of partitioning an image into perceptually different parts. Figure 11 (a) shows the image that is to be partitioned. Here we want to separate the buildings from the sky. To do this we use the following regularization term:

$$\sum_{ij} w_{ij} (x_i - x_j)^2. \qquad (46)$$

The weights $w_{ij}$ are of the type

$$w_{ij} = e^{-\frac{(RGB(i) - RGB(j))^2}{\sigma_{RGB}}} \, e^{-\frac{d(i,j)^2}{\sigma_d}}, \qquad (47)$$

where $RGB(i)$ denotes the RGB value of pixel $i$ and $d(i,j)$ denotes the distance between pixels $i$ and $j$. To avoid solutions where all pixels are put in the same partition, and to favour balanced partitions, a term penalizing unbalanced solutions is added. If one adds


Figure 9: Map of Iceland corrupted by noise.

Method Time (s) Lower bound

Spectral 0.48 -1920

Trust region 2.69 -1760

Subgradient, 10 iter. 74.6 -453

Bundle, 5 iter. 150.4 -493

Table 3: Execution times and objective values of the computed lower bounds for the Iceland image.

the constraint $e^T x = 0$ (as in [10]), or equivalently $x^T e e^T x = 0$, we will get partitions of exactly equal size (at least for the subgradient method). Instead we add a penalty term to the objective function, yielding a problem of the type

$$\inf \; x^T (L + \mu e e^T) x, \qquad x_i \in \{-1, 1\}. \qquad (48)$$

Observe that this problem is not submodular [11]. Since the size of the skyline image (Figure 11(a)) is $35 \times 55$ we obtain a dense matrix of size $1925 \times 1925$. However, because of the structure of the matrix it is easy to calculate $(L + \mu e e^T) x$, which is all that is needed to employ power iteration type procedures to calculate eigensystems. This type

Method Time (s)

Subgradient, 4 iter. 209

Subgradient, 7 iter. 288

Normalized Cuts 5.5

Table 4: Computing times for the skyline image.


Figure 10: Top row: relaxed solutions. Middle: thresholded solutions. Bottom: histogram of the estimated pixel values. (a),(e),(i): spectral method, (b),(f),(j): trust region, (c),(g),(k): subgradient, 10 iterations, (d),(h),(l): Helmberg's bundle method, 5 iterations.

of matrix is not supported in the spectral bundle software, so we cannot compare with this method. Also, the problem is too large for SeDuMi, and there is no point in running the trust region method on this problem since the matrix $L$ has not been homogenized. Figure 11 (b) shows the resulting partition. Figures 11 (e),(f) give the relaxed solutions after 4 and 7 iterations, respectively, of the subgradient algorithm. Both relaxed solutions yield the same result when thresholded at zero. As a comparison, we have included the partitionings obtained from Normalized Cuts [21], which is a frequently applied method for segmentation. The reason for the strange partitioning in Figures 11(c),(d) is that the Fiedler vector in Normalized Cuts essentially contains values close to $-0.3$ and $3.3$ and the median is also close to $-0.3$. Table 4 shows the computing times of the different methods. Note that the convergence of the subgradient method here is slower than previously; this is because the eigenvalue calculations are more demanding for $(L + \mu e e^T)$.
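The matrix-vector product mentioned above is easy to implement without ever forming the dense matrix; a minimal sketch of our own using SciPy's LinearOperator, which can be passed directly to a sparse eigensolver:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def balanced_partition_operator(L, mu):
    """Implicit representation of L + mu * e e^T from (48): only the product
    (L + mu e e^T) x is formed, so the rank-one term is never stored."""
    n = L.shape[0]
    def matvec(x):
        x = np.asarray(x).ravel()
        return L @ x + mu * x.sum() * np.ones(n)
    return LinearOperator((n, n), matvec=matvec, dtype=float)

# usage: smallest eigenpair of the dense-but-structured matrix
# op = balanced_partition_operator(L_sparse, mu)
# lam, v = eigsh(op, k=1, which='SA')
```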


Figure 11: (a) Original image, (b) thresholded segmentation with 7 iterations of the subgradient algorithm (white pixels correspond to one class, remaining pixels are in the other class), (c) Fiedler vector thresholded at the median, (d) Fiedler vector thresholded at the mean, (e),(f) relaxed (untruncated) solutions obtained with 4 and 7 iterations, respectively, of the subgradient algorithm.

5.4 Registration

In our final experiments we consider the registration problem. It appears as a subproblem in many vision applications, and formulations similar to the one we propose here have appeared in [2, 20, 25].

Suppose we are given a set of $m$ source points that should be registered to a set of $n$ target points, where $m < n$. Let $x_{ij}$ denote a binary $(0,1)$-variable which is 1 when source point $i$ is matched to target point $j$, and otherwise 0. As objective function, we choose the quadratic function

$$\sum_{i,j,k,l} w_{ijkl} \, x_{ij} x_{kl}, \qquad (49)$$

and set $w_{ijkl} = -1$ if the coordinates of the source points $s_i, s_k$ are consistent with the coordinates of the target points $t_j, t_l$, and otherwise $w_{ijkl} = 0$. Two correspondence pairs are considered to be consistent if the distances are approximately the same between source and target pairs, that is,

$$\bigl|\, \|s_i - s_k\| - \|t_j - t_l\| \,\bigr| < \theta, \qquad (50)$$

for some threshold $\theta$. Each source point is a priori equally likely to be matched to any of the target points and hence there is no linear term in the objective function. In addition,


Figure 12: One random example for the registration problem: (a) target points $n = 60$ and (b) source points $m = 15$.

each source point should be mapped to one of the target points and hence $\sum_j x_{ij} = 1$ for all $i$. Also, two source points cannot be mapped to the same target point. This can be specified by introducing $(0,1)$-slack variables $x_{m+1,j}$ for $j = 1, \dots, n$ and the constraints $\sum_j x_{m+1,j} = n - m$ as well as $\sum_{i=1}^{m+1} x_{ij} = 1$ for all $j$.

By substituting $x_{ij} = \frac{z_{ij} + 1}{2}$, the problem is turned into a standard $(-1,1)$-problem, but now with linear equality constraints. In the case of the trust region method we may penalize deviations from the linear constraints by adding penalties of the type $\mu(\sum_j x_{ij} - 1)^2$ to the objective function. One could do the same in the case of the subgradient algorithm; however, in this case the penalties have to be homogenized and may therefore not be as effective as for the trust region method. Instead, Lagrange multiplier terms of the type $\sigma_k (\sum_j x_{ij})^2 - \sigma_k$ are introduced. These multipliers can then be handled in exactly the same way as the constraints $x_{ij}^2 - 1 = 0$. Each constraint gives a new entry in the subgradient vector, which is updated in the same way as before.
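For completeness, the pairwise consistency costs of (49)-(50) can be tabulated directly for small instances. A sketch of our own (dense, with $O(m^2 n^2)$ memory, so only meant for problem sizes like the ones used here):

```python
import numpy as np

def consistency_weights(source, target, theta=0.1):
    """w[i, j, k, l] = -1 if matching source i -> target j and source k -> target l
    preserves the pairwise distance up to theta (eq. 50), otherwise 0."""
    ds = np.linalg.norm(source[:, None, :] - source[None, :, :], axis=-1)   # m x m
    dt = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=-1)   # n x n
    consistent = np.abs(ds[:, None, :, None] - dt[None, :, None, :]) < theta
    return -consistent.astype(float)
```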

Method Time (s)

Trust region 1.9

Subgradient, 7 iter. 43.5

Subgradient, 15 iter. 193

SDP 6867

Table 5: The registration problem with m = 15, n = 60.

We have tested the formulation on random data of various sizes. First, coordinates for the $n$ target points were randomly generated with a uniform distribution; then we


Figure 13: Computed solutions $z = [\,z_{11}, z_{12}, \dots, z_{m+1,n}\,]$ for the registration problem using (a) the trust region method, (b) the subgradient method, 7 iterations, (c) the subgradient method, 15 iterations, and (d) SDP with SeDuMi, cf. Figure 12.

randomly selected $m$ source points out of the target points, added noise and applied a random Euclidean motion. Figures 12 (a),(b) show the target and source points for one example with $m = 15$ and $n = 60$. The threshold $\theta$ is set to 0.1. The untruncated (vectorized) solutions for $z_{ij}$ are plotted in Figure 13 and the resulting registration for the subgradient method is shown in Figure 14. The standard spectral relaxation for this problem works rather poorly, as the last entry $z_{n+1}$ is in general far from one. The computing times are given in Table 5. Note that this example has approximately four times as many decision variables as the largest problems dealt with in [20, 25]. For more information on the quality of SDP relaxations for this problem, the reader is also referred to the same papers.

6 Conclusions

We have shown how large scale binary problems with quadratic objectives can be solved by taking advantage of the spectral properties of such problems. The approximation gap compared to traditional spectral relaxations is considerably smaller, especially for the


Figure 14: Registration of the source points to their corresponding target points, cf. Figure 12.

subgradient method. Compared to standard SDP relaxations, the computational effort is less demanding, in particular for the trust region method. Future work includes applying the two methods to more problems that can be formulated within the same framework and making in-depth experimental comparisons. It would also be interesting to see how the proposed methods behave in a branch-and-bound algorithm for obtaining more accurate estimates.


Bibliography

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. Wiley, 2006.

[2] A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In Conf. Computer Vision and Pattern Recognition, pages 26–33, San Diego, USA, 2005.

[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[4] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., 1977.

[5] A. P. Eriksson, C. Olsson, and F. Kahl. Image segmentation with context. In Proc. Scandinavian Conference on Image Analysis, Aalborg, Denmark, 2007.

[6] R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, 1987.

[7] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995.

[8] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673–696, 2000.

[9] J. Park, H. Zha, and R. Kasturi. Spectral clustering for robust motion segmentation. In European Conf. Computer Vision, Prague, Czech Republic, 2004.

[10] J. Keuchel, C. Schnörr, C. Schellewald, and D. Cremers. Binary partitioning, perceptual grouping, and restoration with semidefinite programming. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(11):1364–1379, 2006.

[11] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2):147–159, 2004.

[12] A. Levin, A. Rav-Acha, and D. Lischinski. Spectral matting. In Proc. Conf. Computer Vision and Pattern Recognition, Minneapolis, USA, 2007.

[13] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM J. Sci. Stat. Comput., 4(3):553–572, 1983.


[14] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2002.

[15] C. Olsson, A. P. Eriksson, and F. Kahl. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In Proc. Conf. Computer Vision and Pattern Recognition, Minneapolis, USA, 2007.

[16] S. Poljak, F. Rendl, and H. Wolkowicz. A recipe for semidefinite relaxation for (0,1)-quadratic programming. Journal of Global Optimization, 7:51–73, 1995.

[17] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Prog., 77(2, Ser. B):273–299, 1997.

[18] M. Rojas, S. A. Santos, and D. C. Sorensen. A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM Journal on Optimization, 11(3):611–646, 2000.

[19] M. Rojas, S. A. Santos, and D. C. Sorensen. LSTRS: Matlab software for large-scale trust-region subproblems and regularization. Technical Report 2003-4, Department of Mathematics, Wake Forest University, 2003.

[20] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In Proc. Int. Conf. on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 171–186, 2005.

[21] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[22] D. C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982.

[23] D. C. Sorensen. Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim., 7(1):141–161, 1997.

[24] J. F. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software, 11-12:625–653, 1999.

[25] P. H. S. Torr. Solving Markov random fields using semidefinite programming. In Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

[26] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10(5):695–703, 1988.


PAPER V

In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.


Normalized Cuts Revisited: A Reformulation for Segmentation with Linear Grouping Constraints

Anders P. Eriksson, Carl Olsson and Fredrik Kahl

Abstract

Indisputably Normalized Cuts is one of the most popular segmentation algorithms in computer vision. It has been applied to a wide range of segmentation tasks with great success. A number of extensions to this approach have also been proposed, ones that can deal with multiple classes or that can incorporate a priori information in the form of grouping constraints. However, what is common for all these suggested methods is that they are noticeably limited and can only address segmentation problems of a very specific form. In this paper, we present a reformulation of Normalized Cut segmentation that in a unified way can handle all types of linear equality constraints for an arbitrary number of classes. This is done by restating the problem and showing how linear constraints can be enforced exactly through duality. This allows us to add group priors, for example that certain pixels should belong to a given class. In addition, it provides a principled way to perform multi-class segmentation for tasks like interactive segmentation. The method has been tested on real data with convincing results.

1 Image Segmentation

Image segmentation can be defined as the task of partitioning an image into disjoint sets. This visual grouping process is typically based on low-level cues such as intensity, homogeneity or image contours. Existing approaches include thresholding techniques, edge-based methods and region-based methods. Extensions to this process include the incorporation of grouping constraints into the segmentation process. For instance, the class labels for certain pixels might be supplied beforehand, through user interaction or some completely automated process [8, 3].


Currently the most successful and popular approaches for segmenting images are based on graph cuts. Here the images are converted into undirected graphs with edge weights between the pixels corresponding to some measure of similarity. The ambition is that partitioning such a graph will preserve some of the spatial structure of the image itself. These graph-based methods were made popular first through the Normalized Cut formulation of [9] and more recently by the energy minimization method of [2]. The latter algorithm, for optimizing objective functions that are submodular, has the property of solving many discrete problems exactly. However, not all segmentation problems can be formulated with submodular objective functions, nor is it possible to incorporate all types of linear constraints.

The work described here concerns the former approach, Normalized Cuts, the relevance of linear grouping constraints and how they can be included in this framework. It is not the aim of this paper to argue the merits of one method, or cut metric, over another, nor do we here concern ourselves with how the actual grouping constraints are obtained. Instead we will show how, through Lagrangian relaxation, one can in a unified way handle such linear constraints, and also in what way they influence the resulting segmentation.

1.1 Problem Formulation

Consider an undirected graph G, with nodes V and edges E, where the non-negative weight of each edge is represented by an affinity matrix W with only non-negative entries and of full rank. A min-cut is the non-trivial subset A of V such that the sum of edge weights between nodes in A and its complement is minimized, that is, the minimizer of

    cut(A, V) = ∑_{i∈A, j∈V\A} w_ij    (1)

This is perhaps the most commonly used method for splitting graphs and is a well known problem for which very efficient solvers exist. It has however been observed that this criterion has a tendency to produce unbalanced cuts; smaller partitions are preferred to larger ones.

In an attempt to remedy this shortcoming, Normalized Cuts was introduced by [9]. It is basically an altered criterion for partitioning graphs, applied to the problem of perceptual grouping in computer vision. By introducing a normalizing term into the cut metric, the bias towards undersized cuts is avoided. The Normalized Cut of a graph is defined as:

    Ncut = cut(A, V)/assoc(A, V) + cut(B, V)/assoc(B, V)    (2)

where A ∪ B = V, A ∩ B = ∅ and the normalizing term is defined as assoc(A, V) = ∑_{i∈A, j∈V} w_ij. It is then shown in [9] that by relaxing (2) a continuous underestimator


of the Normalized Cut can be efficiently computed. These techniques are then extended in [11] beyond graph bipartitioning to include multiple segments, and even further in [12] to handle certain types of linear equality constraints.
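As a concrete illustration of the quantities in (1) and (2), the following small Python sketch (our own illustration, not part of the original paper; the toy affinity matrix and labeling are made up) evaluates the cut, the association terms and the resulting Normalized Cut value for a given bipartition:

import numpy as np

def ncut_value(W, z):
    # Normalized Cut (2) of the bipartition encoded by z in {-1, +1}^n
    A = (z > 0)                      # indicator of the set A
    B = ~A                           # its complement V \ A
    cut = W[np.ix_(A, B)].sum()      # sum of edge weights between A and V \ A
    assoc_A = W[A, :].sum()          # assoc(A, V): weights from A to all nodes
    assoc_B = W[B, :].sum()          # assoc(B, V)
    return cut / assoc_A + cut / assoc_B

# toy 4-node graph (symmetric, non-negative weights) and a labeling
W = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], dtype=float)
z = np.array([1, 1, -1, -1])
print(ncut_value(W, z))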

One can argue that the drawbacks of this, the classical formulation for solving the Normalized Cut, are that, firstly, obtaining a discrete solution from the relaxed one can be problematic, especially in multiclass segmentation where the relaxed solution is not unique but consists of an entire subspace. Furthermore, the set of grouping constraints is also very limited; only homogeneous linear equality constraints can be included in the existing theory. We will show that this excludes many visually relevant constraints. In [4] an attempt is made at solving a similar problem with general linear constraints. This approach does however effectively involve dropping any discrete constraint altogether, leaving one to question the quality of the obtained solution.

2 Normalized Cuts with Grouping Constraints

In this section we propose a reformulation of the relaxation of Normalized Cuts that in a unified way can handle all types of linear equality constraints for any number of partitions. First we show how we reach the suggested relaxation through duality theory. The following two sections then show why this formulation is well suited for dealing with general linear constraints and how the proposed approach can be applied to multiclass segmentation.

Starting off with (2), the definition of Normalized Cuts, the cost of partitioning an image with affinity matrix W into two disjoint sets, A and B, can be written as

    Ncut = (∑_{i∈A, j∈B} w_ij) / (∑_{i∈A, j∈V} w_ij) + (∑_{i∈B, j∈A} w_ij) / (∑_{i∈B, j∈V} w_ij).    (3)

Let z ∈ {−1, 1}^n be the class label vector, W the n × n matrix with entries w_ij, d the n × 1 vector containing the row sums of W, and D the diagonal n × n matrix with d on the diagonal. The symbol 1 is used to denote a vector of all ones. We can write (3) as

    Ncut = (∑_{i,j} w_ij (z_i − z_j)^2) / (2 ∑_i (z_i + 1) d_i) + (∑_{i,j} w_ij (z_i − z_j)^2) / (2 ∑_i (1 − z_i) d_i)
         = (z^T(D − W)z) / (2 d^T(z + 1)) + (z^T(D − W)z) / (2 d^T(1 − z))
         = ((z^T(D − W)z) d^T 1) / (1^T d d^T 1 − z^T d d^T z)
         = ((z^T(D − W)z) d^T 1) / (z^T((1^T d)D − dd^T)z).    (4)

In the last equality we used the fact that 1^T d = z^T D z. When we include general linear constraints on z of the form Cz = b, C ∈ R^{m×n}, the optimization problem associated


with this partitioning cost becomes

    inf_z   (z^T(D − W)z) / (z^T((1^T d)D − dd^T)z)
    s.t.    z ∈ {−1, 1}^n
            Cz = b.    (5)

The above problem is a non-convex, NP-hard optimization problem. Therefore we are led to replace the constraint z ∈ {−1, 1}^n with the norm constraint z^T z = n. This gives us the relaxed problem

    inf_z   (z^T(D − W)z) / (z^T((1^T d)D − dd^T)z)
    s.t.    z^T z = n
            Cz = b.    (6)

This is also a non-convex problem; however, we shall see in Section 3 that we are able to solve this problem exactly. Next we will write problem (6) in homogenized form; the reason for doing this will become clear later on. Let L and M be the (n + 1) × (n + 1) matrices

    L = [ (D − W)  0 ; 0  0 ],    M = [ ((1^T d)D − dd^T)  0 ; 0  0 ],    (7)

and

    Ĉ = [ C  −b ]    (8)

the homogenized constraint matrix. The relaxed problem (6) can now be written

    inf_z   ([z^T 1] L [z; 1]) / ([z^T 1] M [z; 1])
    s.t.    z^T z = n
            Ĉ [z; 1] = 0.    (9)

Finally we add the artificial variable z_{n+1}. Let z̄ be the extended vector [z^T  z_{n+1}]^T. Throughout the paper we will write z̄ when we consider the extended variables and just z when we consider the original variables. The relaxed problem (6) in its homogenized form is

    inf_z̄   (z̄^T L z̄) / (z̄^T M z̄)
    s.t.    z_{n+1}^2 − 1 = 0
            z̄^T z̄ = n + 1
            Ĉ z̄ = 0.    (10)


Note that the first constraint is equivalent to z_{n+1} = ±1. If z_{n+1} = −1 then we may change the sign of z̄ to obtain a solution to our original problem.
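For concreteness, the following sketch (our own illustration; the function and variable names are not from the paper) builds the matrices of (7), the homogenized constraint matrix of (8) and the quotient of (10) from a given affinity matrix W and linear constraints Cz = b:

import numpy as np

def homogenized_matrices(W, C, b):
    # L and M of (7) and the homogenized constraint matrix [C  -b] of (8)
    n = W.shape[0]
    d = W.sum(axis=1)                                   # row sums of W
    D = np.diag(d)
    L = np.zeros((n + 1, n + 1)); L[:n, :n] = D - W     # zero-padded Laplacian
    M = np.zeros((n + 1, n + 1)); M[:n, :n] = d.sum() * D - np.outer(d, d)
    C_hom = np.hstack([C, -np.atleast_2d(b).reshape(-1, 1)])
    return L, M, C_hom

def homogenized_quotient(L, M, z_ext):
    # objective of the homogenized problem (10) for an extended vector [z; z_{n+1}]
    return (z_ext @ L @ z_ext) / (z_ext @ M @ z_ext)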

The homogenized constraints Ĉz̄ = 0 form a linear subspace and can be eliminated in the following way. Let N_Ĉ be a matrix whose columns form a basis of the nullspace of Ĉ, and let k + 1 be the dimension of this nullspace. Any z̄ fulfilling Ĉz̄ = 0 can be written z̄ = N_Ĉ ȳ, where ȳ ∈ R^{k+1}. As in the case of the z-variables, ȳ is the vector containing all variables whereas y contains all but the last variable. Assuming that the linear constraints are feasible, we may always choose the basis such that y_{k+1} = z_{n+1} = 1. We put L_Ĉ = N_Ĉ^T L N_Ĉ and M_Ĉ = N_Ĉ^T M N_Ĉ. In the new space we get the following formulation

    inf_ȳ   (ȳ^T L_Ĉ ȳ) / (ȳ^T M_Ĉ ȳ)
    s.t.    y_{k+1}^2 − 1 = 0
            ȳ^T N_Ĉ^T N_Ĉ ȳ = ||ȳ||_{N_Ĉ}^2 = n + 1.    (11)

We will use f(ȳ) to denote the objective function of this problem. A common approach to solving this kind of problem is to simply drop one of the two constraints. This may however result in very poor solutions. We shall see that we can in fact solve this problem exactly without excluding any constraints.
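A minimal numerical sketch of this elimination step, assuming a dense problem small enough for scipy.linalg.null_space (the helper name is ours), could look as follows; note that an orthonormal nullspace basis does not by itself enforce the normalization y_{k+1} = z_{n+1} = 1 discussed above:

import numpy as np
from scipy.linalg import null_space

def eliminate_constraints(L, M, C_hom):
    # nullspace basis of the homogenized constraints and the reduced matrices of (11)
    N = null_space(C_hom)            # columns span {v : C_hom v = 0}
    return N, N.T @ L @ N, N.T @ M @ N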

3 Lagrangian Relaxation and Strong Duality

In this section we will show how to solve (6) using Lagrange duality. To do this we start by generalizing a lemma from [7] for trust region problems.

Lemma 1. If there exists a y with y^T A_3 y + 2b_3^T y + c_3 < 0, then, assuming the existence of a minimum, the primal problem

    inf_y   (y^T A_1 y + 2b_1^T y + c_1) / (y^T A_2 y + 2b_2^T y + c_2),    s.t.  y^T A_3 y + 2b_3^T y + c_3 ≤ 0    (12)

and the dual problem

    sup_{λ≥0} inf_y   (y^T(A_1 + λA_3)y + 2(b_1 + λb_3)^T y + c_1 + λc_3) / (y^T A_2 y + 2b_2^T y + c_2)    (13)

has no duality gap.

Proof. The primal problem can be written as

    inf_{γ_1, y}   γ_1
    s.t.   y^T(A_1 − γ_1 A_2)y + 2(b_1 − γ_1 b_2)^T y + c_1 − γ_1 c_2 ≤ 0
           y^T A_3 y + 2b_3^T y + c_3 ≤ 0.    (14)


Let M(λ, γ) be the matrix

    M(λ, γ) = [ A_1 + λA_3 − γA_2    b_1 + λb_3 − γb_2 ; (b_1 + λb_3 − γb_2)^T    c_1 + λc_3 − γc_2 ].    (15)

The dual problem can be written

    sup_{λ≥0} inf_{γ_2, y}   γ_2
    s.t.   [y; 1]^T M(λ, γ_2) [y; 1] ≤ 0.    (16)

Since (16) is dual to (14), we have that for their optimal values γ_2^* ≤ γ_1^* must hold. To prove that there is no duality gap we must show that γ_2^* = γ_1^*. We do this by considering the following problem

    sup_{γ_3, λ≥0}   γ_3
    s.t.   M(λ, γ_3) ⪰ 0.    (17)

Here M(λ, γ_3) ⪰ 0 means that M(λ, γ_3) is positive semidefinite. We note that if M(λ, γ_3) ⪰ 0 then there is no y fulfilling

    [y; 1]^T M(λ, γ_3) [y; 1] + ε ≤ 0    (18)

for any ε > 0. Therefore we must have that the optimal values fulfill γ_3^* ≤ γ_2^* ≤ γ_1^*. To complete the proof we show that γ_3^* = γ_1^*. We note that for any γ ≤ γ_1^* we have that

    y^T A_3 y + 2b_3^T y + c_3 ≤ 0   ⇒   y^T(A_1 − γA_2)y + 2(b_1 − γb_2)^T y + c_1 − γc_2 ≥ 0.    (19)

However, according to the S-procedure [1] this is true if and only if there exists λ ≥ 0 such that M(λ, γ) ⪰ 0. Therefore (γ, λ) is feasible for problem (17) and thus γ_3^* = γ_1^*.

We note that for a fixed γ the problem

    inf_y   y^T(A_1 − γA_2)y + 2(b_1 − γb_2)^T y + c_1 − γc_2
    s.t.    y^T A_3 y + 2b_3^T y + c_3 ≤ 0    (20)

only has an interior solution if A_1 − γA_2 is positive semidefinite. If A_3 is positive definite, then we may subtract k(y^T A_3 y + 2b_3^T y + c_3) (k > 0) from the objective function to obtain boundary solutions. This gives us the following corollary.

Corollary 1. Let A_3 be positive definite. If there exists a y with y^T A_3 y + 2b_3^T y + c_3 < 0, then the primal problem

    inf_y   (y^T A_1 y + 2b_1^T y + c_1) / (y^T A_2 y + 2b_2^T y + c_2),    s.t.  y^T A_3 y + 2b_3^T y + c_3 = 0    (21)


and the dual problem

    sup_λ inf_y   (y^T(A_1 + λA_3)y + 2(b_1 + λb_3)^T y + c_1 + λc_3) / (y^T A_2 y + 2b_2^T y + c_2)    (22)

has no duality gap (once again assuming that a minimum exists for the primal problem).

Next we will show how to solve a problem of a form related to (11). Let

    Â_1 = [ A_1  b_1 ; b_1^T  c_1 ],    Â_2 = [ A_2  b_2 ; b_2^T  c_2 ],    Â_3 = [ A_3  b_3 ; b_3^T  c_3 ].

Theorem 3.1. Assuming the existence of a minimum, if A_3 is positive definite, then the primal problem

    inf_{y^T A_3 y + 2b_3^T y + c_3 = n+1}   (y^T A_1 y + 2b_1^T y + c_1) / (y^T A_2 y + 2b_2^T y + c_2)
        = inf_{ȳ^T Â_3 ȳ = n+1,  y_{n+1}^2 = 1}   (ȳ^T Â_1 ȳ) / (ȳ^T Â_2 ȳ)    (23)

and its dual

    sup_t inf_{ȳ^T Â_3 ȳ = n+1}   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t) / (ȳ^T Â_2 ȳ)    (24)

has no duality gap.

Proof. Let γ* be the optimal value of problem (11). Then

    γ* = inf_{ȳ^T Â_3 ȳ = n+1, y_{n+1}^2 = 1}   (ȳ^T Â_1 ȳ) / (ȳ^T Â_2 ȳ)
       = sup_t inf_{ȳ^T Â_3 ȳ = n+1, y_{n+1}^2 = 1}   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t) / (ȳ^T Â_2 ȳ)
       ≥ sup_t inf_{ȳ^T Â_3 ȳ = n+1}   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t) / (ȳ^T Â_2 ȳ)
       ≥ sup_{t,λ} inf_ȳ   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t + λ(ȳ^T Â_3 ȳ − (n+1))) / (ȳ^T Â_2 ȳ)
       = sup_{s,λ} inf_ȳ   (ȳ^T Â_1 ȳ + s y_{n+1}^2 − s + λ(y^T A_3 y + 2 y_{n+1} b_3^T y + c_3 − (n+1))) / (ȳ^T Â_2 ȳ)
       = sup_λ inf_{y_{n+1}^2 = 1}   (ȳ^T Â_1 ȳ + λ(y^T A_3 y + 2b_3^T y + c_3 − (n+1))) / (ȳ^T Â_2 ȳ)
       = sup_λ inf_y   (y^T A_1 y + 2b_1^T y + c_1 + λ(y^T A_3 y + 2b_3^T y + c_3 − (n+1))) / (y^T A_2 y + 2b_2^T y + c_2)
       = γ*,    (25)


where we let s = t + c_3 λ. In the last two equalities Corollary 1 was used twice. The third row of the above proof gives us that

    μ* = sup_t inf_{ȳ^T Â_3 ȳ = n+1}   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t) / (ȳ^T Â_2 ȳ)
       = sup_t inf_{ȳ^T Â_3 ȳ = n+1}   (ȳ^T Â_1 ȳ + t y_{n+1}^2 − t ȳ^T Â_3 ȳ/(n+1)) / (ȳ^T Â_2 ȳ)
       = sup_t inf_{ȳ^T Â_3 ȳ = n+1}   (ȳ^T (Â_1 + t([ 0  0 ; 0  1 ] − Â_3/(n+1))) ȳ) / (ȳ^T Â_2 ȳ).    (26)

Finally, since strong duality holds, we can state the following corollary [1].

Corollary 2. If t* and ȳ* solve (26), then (ȳ*)^T N_Ĉ^T N_Ĉ ȳ* = n + 1 and y*_{k+1} = 1. That is, ȳ* is an optimal feasible solution to (11).

4 The Dual Problem and Constrained Normalized Cuts

Returning to our relaxed problem (11) we start off by introducing the following lemma.

Lemma 2. L and M are both (n + 1) × (n + 1) positive semidefinite matrices of rank n − 1; both their nullspaces are spanned by n_1 = [ 1 ... 1 0 ]^T and n_2 = [ 0 ... 0 1 ]^T. Consequently, L_Ĉ and M_Ĉ are also positive semidefinite.

Proof. L is the zero-padded positive semidefinite Laplacian matrix of the affinity matrix W and is hence also positive semidefinite. For M it suffices to show that the matrix (1^T d)D − dd^T is p.s.d.:

    v^T((1^T d)D − dd^T)v = ∑_i d_i ∑_j d_j v_j^2 − (∑_i d_i v_i)^2
        = ∑_{i,j} d_i d_j v_j(v_j − v_i)
        = ∑_i d_i d_i v_i(v_i − v_i) + ∑_{i, j<i} ( d_i d_j v_j(v_j − v_i) + d_j d_i v_i(v_i − v_j) )
        = ∑_{i, j<i} d_i d_j (v_j − v_i)^2 ≥ 0,   ∀v ∈ R^n.    (27)

The last inequality comes from d_i > 0 for all i, which means that (1^T d)D − dd^T, and thus also M, are positive semidefinite.

The second statement follows since Ln_i = Mn_i = 0 for i = 1, 2.
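A quick numerical sanity check of this claim (with toy data of our own choosing) is straightforward:

import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2          # symmetric, non-negative affinities
d = W.sum(axis=1)
M_top = d.sum() * np.diag(d) - np.outer(d, d)      # (1^T d) D - d d^T
print(np.linalg.eigvalsh(M_top).min())             # non-negative up to round-off, as (27) shows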


Next, since

    v^T L v ≥ 0  ∀v ∈ R^{n+1}   ⇒   v^T L v ≥ 0  ∀v ∈ Null(Ĉ)
        ⇒   w^T N_Ĉ^T L N_Ĉ w ≥ 0  ∀w ∈ R^{k+1}
        ⇒   w^T L_Ĉ w ≥ 0  ∀w ∈ R^{k+1},

it holds that L_Ĉ ⪰ 0, and similarly for M_Ĉ.

Assuming that the original problem is feasible, we have that, as the objective f(ȳ) of problem (11) is the quotient of two positive semidefinite quadratic forms and therefore non-negative, a minimum for the relaxed Normalized Cut problem will exist. Theorem 3.1 states that strong duality holds for a program of the form (23) if a minimum exists. Consequently, we can apply the theory from the previous section directly and solve (11) through its dual formulation. Let

    E_Ĉ = [ 0  0 ; 0  1 ] − N_Ĉ^T N_Ĉ/(n+1) = N_Ĉ^T ( [ 0  0 ; 0  1 ] − I/(n+1) ) N_Ĉ    (28)

and let θ(ȳ, t) denote the Lagrangian function. The dual problem is then

    sup_t inf_{||ȳ||_{N_Ĉ}^2 = n+1}   θ(ȳ, t) = (ȳ^T(L_Ĉ + tE_Ĉ)ȳ) / (ȳ^T M_Ĉ ȳ).    (29)

The inner minimization is the well-known generalized Rayleigh quotient, for which the minimum is given by the algebraically smallest generalized eigenvalue¹ of (L_Ĉ + tE_Ĉ) and M_Ĉ. Letting λ_min^G(t) and v_min^G(t) denote the smallest generalized eigenvalue and the corresponding generalized eigenvector of (L_Ĉ + tE_Ĉ) and M_Ĉ, we can write problem (29) as

    sup_t   λ_min^G(L_Ĉ + tE_Ĉ, M_Ĉ).    (30)

It can easily be shown that the minimizer of the inner problem of (29) is given by a scaling of the generalized eigenvector, ȳ(t) = (√(n+1)/||v_min^G(t)||_{N_Ĉ}) v_min^G(t). The relaxed Normalized Cut problem can thus be solved by finding the maximum of (30). As the objective function is the point-wise infimum of functions linear in t, it is a concave function, as is expected of dual problems. Solving (30) therefore means maximizing a concave function in the single variable t, which can be carried out using standard methods for one-dimensional optimization.
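Under the assumption that M_Ĉ is positive definite and small enough to handle densely, the maximization in (30) can be sketched in a few lines with scipy's dense generalized eigensolver and a bounded one-dimensional search; the function names and the search bracket below are our own choices, not part of the paper:

import numpy as np
from scipy.linalg import eigh
from scipy.optimize import minimize_scalar

def dual_value(t, L_N, E_N, M_N):
    # smallest generalized eigenvalue of (L_N + t*E_N, M_N), cf. (30)
    return eigh(L_N + t * E_N, M_N, eigvals_only=True)[0]

def solve_dual(L_N, E_N, M_N, bracket=(-1e3, 1e3)):
    # maximize the concave dual function over t (bounded scalar search)
    res = minimize_scalar(lambda t: -dual_value(t, L_N, E_N, M_N),
                          bounds=bracket, method='bounded')
    t_star = res.x
    vals, vecs = eigh(L_N + t_star * E_N, M_N)
    return t_star, vecs[:, 0]   # the eigenvector still has to be rescaled to ||.||^2 = n + 1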

Unfortunately, the task of solving large scale generalized eigenvalue problems can be demanding, especially when the matrices involved are dense, as is the case here. This can however be remedied: by exploiting the unique matrix structure we can rewrite the generalized eigenvalue problem as a standard one. First we note that the generalized eigenvalue

¹ By the generalized eigenvalue of two matrices A and B we mean a λ = λ^G(A, B) and v, ||v|| = 1, such that Av = λBv has a solution.


problem Av = λBv is equivalent to the standard eigenvalue problem B^{-1}Av = λv if B is non-singular. Furthermore, in large scale applications it is reasonable to assume that the number of variables n + 1 is much greater than the number of constraints m. The basis for the null space of the homogenized linear constraints can then be written in the form N_Ĉ = [ c  c_0 ; I ]. Now we can write

    M_Ĉ = [ c  c_0 ; I ]^T [ ((1^T d)D − dd^T)  0 ; 0  0 ] [ c  c_0 ; I ] = D̄ + V S V^T,    (31)

where D = [ D_1  0 ; 0  D_2 ] and d = [ d_1 ; d_2 ] are partitioned conformally with the eliminated and the remaining variables, D̄ is a positive definite diagonal matrix (with diagonal blocks formed from D_2 and c_0^T D_1 c_0) and V S V^T is a low-rank correction built from c, c_0, d_1 and d_2.

Hence, M_Ĉ is the sum of a positive definite diagonal matrix D̄ and a low-rank correction V S V^T. As a direct result of the Woodbury matrix identity [5] we can express the inverse of M_Ĉ as

    M_Ĉ^{-1} = (D̄ + V S V^T)^{-1} = D̄^{-1}( I − V(S^{-1} + V^T D̄^{-1} V)^{-1} V^T D̄^{-1} ).    (32)

Despite the potentially immense size of the matrices involved, this inverse can be efficiently computed since D̄ is diagonal and the square matrices S and (S^{-1} + V^T D̄^{-1} V) are both typically of manageable size and therefore easily inverted. Our generalized eigenvalue problem then turns into the problem of finding the algebraically smallest eigenvalue of the matrix M_Ĉ^{-1} L_Ĉ. The dual problem becomes

    sup_t   λ_min( D̄^{-1}( I − V(S^{-1} + V^T D̄^{-1} V)^{-1} V^T D̄^{-1} )(L_Ĉ + tE_Ĉ) ).    (33)

Not only does this reformulation provide us with the more familiar standard eigenvalue problem, but it also allows for very efficient computation of products of this matrix with vectors. This is a crucial property since, even though M_Ĉ^{-1}(L_Ĉ + tE_Ĉ) is still dense, it is the product and sum of diagonal (D̄^{-1}, E_Ĉ), sparse (L_Ĉ, N_Ĉ) and low-rank (V, S^{-1}) matrices. It is a very structured matrix to which iterative eigensolvers can successfully be applied.
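The following sketch (our own; D̄ is passed as the vector of its diagonal entries, and the use of scipy.sparse.linalg.eigs is just one possible choice of iterative solver) applies M_Ĉ^{-1} through the Woodbury formula (32) without ever forming the dense inverse, and wraps the product with a matrix A = L_Ĉ + tE_Ĉ as a LinearOperator:

import numpy as np
from scipy.sparse.linalg import LinearOperator, eigs

def make_Minv_apply(dbar, V, S):
    # x -> (diag(dbar) + V S V^T)^{-1} x via the Woodbury identity (32)
    Dinv = 1.0 / dbar
    K = np.linalg.inv(np.linalg.inv(S) + V.T @ (Dinv[:, None] * V))   # small dense matrix
    def apply(x):
        x = np.asarray(x).ravel()
        y = Dinv * x
        return y - Dinv * (V @ (K @ (V.T @ y)))
    return apply

def smallest_eig(dbar, V, S, A):
    # algebraically smallest eigenvalue/eigenvector of M^{-1} A without forming M^{-1}
    Minv = make_Minv_apply(dbar, V, S)
    op = LinearOperator(A.shape, matvec=lambda x: Minv(A @ x), dtype=float)
    vals, vecs = eigs(op, k=1, which='SR')   # 'SR' = smallest real part
    return vals.real[0], vecs[:, 0].real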

In certain cases it may however occur that the quadratic form in the denominator is only positive semidefinite and thus singular.


These cases are easily detected and must be treated specially. As we then cannot invert M_Ĉ and rewrite the problem as a standard eigenvalue problem, we must instead work with generalized eigenvalues, as defined in (30). This is preferably avoided as it is typically a more computationally demanding formulation, especially since the matrices involved are dense. Iterative methods for finding generalized eigenvalues of structured matrices such as L_Ĉ + tE_Ĉ and M_Ĉ do however exist [10]. Note that the absence of linear constraints is such a special instance. In that case, however, homogenization is completely unnecessary, and (6) with Cz = b removed is a standard unconstrained generalized Rayleigh quotient whose solution is given by the generalized eigenvalue λ^G(D − W, (1^T d)D − dd^T).

Now, if t* and ȳ* = (√(n+1)/||v_min^G(t*)||_{N_Ĉ}) v_min^G(t*) are the optimizers of (29) and (30), Corollary 2 certifies that (ȳ*)^T N_Ĉ^T N_Ĉ ȳ* = n + 1 and that y*_{k+1} = 1. With z̄* = [z*; z*_{n+1}] = N_Ĉ ȳ* and z*_{n+1} = y*_{k+1}, we have that z* prior to rounding is the minimizer of (6). Thus we have shown how, through Lagrangian relaxation, to solve the relaxed, linearly constrained Normalized Cut problem exactly.

Finally, the solution to the relaxed problem must be discretized in order to obtain a solution to the original binary problem (5). This is typically carried out by applying some rounding scheme to the solution.

4.1 Multi-Class Constrained Normalized Cuts

Multi-class Normalized Cuts is a generalization of (2) for an arbitrary number of partitions,

    Nkcut = ∑_{l=1}^{k} cut(A_l, V) / assoc(A_l, V).    (34)

One can minimize (34) in an iterative fashion by, given the current k-way partition, finding a new partition while keeping all but two partitions fixed. This procedure is known as the α-β-swap when used in graph cut applications [2]. The associated subproblem at each iteration then becomes

    Nkcut = cut(A_i, V)/assoc(A_i, V) + cut(A_j, V)/assoc(A_j, V) + ∑_{l≠i,j} cut(A_l, V)/assoc(A_l, V)
          = cut(A_i, V)/assoc(A_i, V) + cut(A_j, V)/assoc(A_j, V) + c,    (35)

where pixels not labeled i or j are fixed. Consequently, minimizing the multi-class subproblem can be treated similarly to the bipartition problem. At each iteration we have a


problem of the form

    inf_z   f(z) = (z^T(D − W)z) / ((1^T d)^2 − z^T dd^T z)
    s.t.    z ∈ {−1, 1}^n
            Cz = b,    (36)

where W, D, C and b depend on the current partition and the choice of labels to be kept fixed. These matrices are obtained by removing the rows and columns corresponding to pixels not labeled i or j; the linear constraints must also be similarly altered to only involve pixels not currently fixed. Given an initial partition, random or otherwise, iterating over the possible choices of label pairs until convergence ensures a multi-class segmentation that fulfills all constraints. There is however no guarantee that this method will avoid getting trapped in local minima and producing a sub-optimal solution, but during the experimental validation this procedure always produced satisfactory results.
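Structurally, the sweep over label pairs can be sketched as below; solve_two_class is a placeholder for the constrained two-class solver of the previous sections and constraints is a user-supplied callable restricting Cz = b to the currently free pixels (both names are hypothetical):

import itertools
import numpy as np

def multiclass_sweep(W, labels, constraints, solve_two_class, n_sweeps=5):
    # repeatedly re-solve the two-class subproblem (36) for every label pair
    labels = labels.copy()
    for _ in range(n_sweeps):
        for i, j in itertools.combinations(np.unique(labels), 2):
            free = np.flatnonzero((labels == i) | (labels == j))   # pixels currently labeled i or j
            W_sub = W[np.ix_(free, free)]
            C_sub, b_sub = constraints(free, labels)               # constraints restricted to free pixels
            z = solve_two_class(W_sub, C_sub, b_sub)               # relaxed solve followed by rounding
            labels[free] = np.where(z > 0, i, j)
    return labels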

5 Experimental Validation

A number of experiments were conducted to evaluate our proposed formulation, but also to illustrate how relevant visual information can be incorporated into the segmentation process through non-homogeneous linear constraints and how this can influence the partitioning.

All images were gray-scale and approximately 100-by-100 pixels in size. The affinity matrix was calculated based on edge information, as described in [6]. The one-dimensional maximization over t was carried out using a golden section search, typically requiring 15-20 eigenvalue calculations. The relaxed solution z was discretized by simply thresholding at 0.

Firstly, we compared our approach with the standard Normalized Cut method, fig. 1. Both approaches produce similar results, suggesting that in the absence of constraints the

Figure 1: Original image (left), standard Normalized Cut algorithm (middle) and the reformulated Normalized Cut algorithm with no constraints (right).


two formulations are equivalent. Our approach, however, has the added advantage of being able to handle linear constraints.

The simplest such constraint might be the hard coding of some pixels, i.e. that pixel i should belong to a certain class. This can be expressed as the linear constraints z_i = ±1, i = 1, ..., m. In fig. 2 it can be seen how a number of such hard constraints influences the segmentation of the image in fig. 1.

Figure 2: Original image (left), segmentation with constraints (middle) and constraints applied (right).

Another visually significant prior is the size or area of the resulting segments, that is, constraints such as ∑_i z_i = 1^T z = a. The impact of enforcing limitations on the size of the partitions is shown in fig. 3.

Excluding and including constraints, such as that pixels i and j should belong to the same or to separate partitions, z_i − z_j = 0 or z_i + z_j = 0, are yet another meaningful type of constraint. The result of including a combination of all the above types of constraints can be seen in fig. 4.
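The constraint types mentioned here are all easy to stack into a single system Cz = b; the following helper (our own naming, given purely as an illustration) is one way to do it:

import numpy as np

def grouping_constraints(n, hard=(), same=(), different=(), area=None):
    # stack hard (z_i = +/-1), must-link, cannot-link and area constraints into C z = b
    rows, rhs = [], []
    for i, label in hard:                 # z_i = label, label in {-1, +1}
        e = np.zeros(n); e[i] = 1.0
        rows.append(e); rhs.append(float(label))
    for i, j in same:                     # z_i - z_j = 0
        e = np.zeros(n); e[i], e[j] = 1.0, -1.0
        rows.append(e); rhs.append(0.0)
    for i, j in different:                # z_i + z_j = 0
        e = np.zeros(n); e[i], e[j] = 1.0, 1.0
        rows.append(e); rhs.append(0.0)
    if area is not None:                  # 1^T z = a
        rows.append(np.ones(n)); rhs.append(float(area))
    return np.array(rows), np.array(rhs)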

Finally, we also performed a multi-class segmentation with linear constraints, fig. 5.

We argue that these results not only indicate a satisfactory performance of the suggested method, but also illustrate the relevance of linear grouping constraints in image segmentation and the impact that they can have on the resulting partitioning. These experiments also seem to indicate that even a simple rounding scheme such as the one used here can often suffice. As we threshold at zero, hard, including and excluding constraints are all ensured to hold after discretization. Only the area constraints are not guaranteed to hold; however, probably because the relaxed solution has the correct area, thresholding it typically produces a discrete solution with roughly the correct area.

6 Conclusions

We have presented a reformulation of the classical Normalized Cut problem that allows for the inclusion of linear grouping constraints into the segmentation procedure, through a Lagrangian dual formulation. A method for how to efficiently find such a cut, even


Figure 3: Original image (top left), segmentation without constraints (top middle) and segmentation boundary and constraints applied (top right). Segmentation with area constraints (area = 100 pixels) (middle left), segmentation boundary and constraints applied (middle right). Segmentation with area constraints (area = 2000 pixels) (bottom left), segmentation boundary and constraints applied (bottom right).

for very large scale problems, has also been offered. A number of experiments as well as theoretical proofs were also supplied in support of these claims.

Improvements to the presented method include, firstly, the one-dimensional search over t. As the dual function is the point-wise infimum of the eigenvalues of a matrix, it is sub-differentiable, and utilizing this information should greatly reduce the time required for finding t*. Another issue that was left open in this work is the rounding scheme. The relaxed solution z is currently discretized by simple thresholding at 0. Even though we can guarantee that z prior to rounding fulfills the linear constraints, this is not necessarily true after thresholding and should be addressed. For simpler constraints, such as the ones used here, rounding schemes that ensure that the linear constraints hold can easily be devised. We felt that an in-depth discussion of different procedures for discretization was outside the scope of this paper.

Finally, the question of properly initializing the multi-class partitioning should also be investigated, as it turns out that this choice can affect both the convergence and the final result.


Figure 4: Original image (top left), segmentation without constraints (top middle), segmentation boundary and constraints applied (top right). Segmentation with hard, including and excluding, as well as area constraints (area = 25% of the entire image) (middle left), segmentation boundary and constraints applied (middle right). Segmentation with constraints (area = 250 pixels) (bottom left), segmentation boundary and constraints applied (bottom right). Here a solid line between two pixels indicates an including constraint, and a dashed line an excluding constraint.

Acknowledgments

This work has been supported by the European Commission's Sixth Framework Programme under grant no. 011838 as part of the Integrated Project SMErobot™, by the Swedish Foundation for Strategic Research (SSF) through the programme Vision in Cognitive Systems II (VISCOS II), and by the Swedish Research Council through grants no. 2004-4579 'Image-Based Localisation and Recognition of Scenes' and no. 2005-3230 'Geometry of multi-camera systems'.


Figure 5: Original image (top left), three-class segmentation without constraints (top middle), segmentation boundary (top right). Three-class segmentation with hard, including and excluding constraints (bottom left), segmentation boundary and constraints applied (bottom right).


Bibliography

[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.

[3] Y. Boykov and M.-P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In International Conference on Computer Vision, pages 105–112, Vancouver, Canada, 2001.

[4] T. Cour and J. Shi. Solving Markov random fields with spectral relaxation. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, volume 11, 2007.

[5] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in Mathematical Sciences, 1996.

[6] J. Malik, S. Belongie, T. K. Leung, and J. Shi. Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1):7–27, 2001.

[7] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Technical Report CORR 94-32, Department of Combinatorics and Optimization, December 1994.

[8] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": interactive foreground extraction using iterated graph cuts. In ACM Transactions on Graphics, pages 309–314, 2004.

[9] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[10] D. C. Sorensen and C. Yang. Truncated QZ methods for large scale generalized eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 19(4):1045–1073, 1998.

[11] S. Yu and J. Shi. Multiclass spectral clustering. In International Conference on Computer Vision, Nice, France, 2003.


[12] S. Yu and J. Shi. Segmentation given partial grouping constraints. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2):173–183, 2004.
