OpenOF Framework for Sparse Non-linear Least Squares Optimization on a GPU

Cornelius Wefelscheid and Olaf Hellwich
Computer Vision and Remote Sensing, Berlin University of Technology,
Sekr. MAR 6-5, Marchstrasse 23, D-10587, Berlin, Germany
{cornelius.wefelscheid, olaf.hellwich}@tu-berlin.de

Keywords: Least Squares Optimization, Bundle Adjustment, Levenberg Marquardt, GPU, Open Source.

Abstract: In the area of computer vision and robotics, non-linear optimization methods have become an important tool. For instance, all structure from motion approaches apply optimizations such as bundle adjustment (BA). Most often, the structure of the problem is sparse regarding the functional relations of parameters and measurements. The sparsity of the system has to be modeled within the optimization in order to achieve good performance. With OpenOF, a framework is presented which enables developers to design sparse optimizations regarding parameters and measurements and utilize the parallel power of a GPU. We demonstrate the universality of our framework using BA as an example. The performance and accuracy are compared to published implementations for synthetic and real world data.

1 INTRODUCTION

Non-linear least squares optimization is used in many research fields where measurements with an underlying model are present. Most of the problems consist of hundreds of measurements and are represented by hundreds of parameters that need to be estimated. The parameters are estimated by minimizing the L2-norm between the actual measurements and the values predicted by the model, i.e. by solving a least squares system. Usually, as the underlying model is not linear, an iterative method for solving non-linear least squares needs to be applied. Such methods linearize the cost function in each iteration. The Levenberg-Marquardt (LM) algorithm has more or less become the standard. It combines the Gauss-Newton algorithm with the gradient descent approach and, in contrast to plain Gauss-Newton, is guaranteed to converge to a local minimum. The computationally most intensive part within LM is solving $Ax = b$ in each iteration. Finding the solution of this normal equation for a dense matrix has a complexity of $O(n^3)$, which is impractical when dealing with 100k to 1M parameters on a standard computer. As most of the entries in $A$ are zero, a sparse matrix representation is feasible. Different parametrizations for sparse matrices have been used in the past. Most commonly a coordinate list (COO) is used, which stores row, column and value for each entry. A more compact storage is the compressed sparse column (CSC) format: for each entry, the row and the value are stored in an array, and additionally the starting index of each column is saved separately, requiring less space than COO. The space saving of CSC over COO is negligible; what matters is that the format allows an efficient implementation of matrix-vector multiplication.
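To make the two storage schemes concrete, here is a small illustrative example using SciPy (not part of OpenOF, which stores its matrices with CUSP on the GPU):

```python
import numpy as np
import scipy.sparse as sp

A = np.array([[4.0, 0.0, 0.0],
              [0.0, 5.0, 6.0],
              [7.0, 0.0, 8.0]])

# COO: one (row, column, value) triple per non-zero entry
coo = sp.coo_matrix(A)
print(coo.row, coo.col, coo.data)        # [0 1 1 2 2] [0 1 2 0 2] [4. 5. 6. 7. 8.]

# CSC: row index and value per entry, plus one start index per column
csc = sp.csc_matrix(A)
print(csc.indices, csc.data, csc.indptr) # [0 2 1 1 2] [4. 7. 5. 6. 8.] [0 2 3 5]
```

Both formats store the same five entries; CSC merely replaces one index array of length five with a column-pointer array of length four, which is why the space difference is small while the column-wise layout helps matrix-vector products.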

Since these matrix operations can be implemented in a highly parallel manner, we believe that multiprocessors, such as GPUs, are most suitable for this task. The operations within LM are highly parallel as well, as each cost function can be evaluated independently. To our knowledge no library has been developed which can solve general sparse non-linear least squares optimizations on a GPU. Two libraries address this problem in general, sparseLM (Lourakis, 2010) and g2o (Kummerle et al., 2011), both operating on a CPU. For solving the normal equation, they implement interfaces to various external sparse math libraries, which provide different algorithms for directly solving the normal equation. In most cases a Cholesky decomposition $A = LDL^T$ is applied. In contrast to sparseLM, our approach works on a GPU (presently only NVIDIA graphics cards are supported) and uses the highly parallel power of the graphics card to solve the normal equation with a conjugate gradient (CG) approach, which can also be chosen as a solver in g2o. The potential of parallelizing CG is the main advantage over a Cholesky decomposition.


Our framework is built upon three major libraries: Thrust, CUSP and SymPy. Thrust enhances the productivity of developers by providing high-level interfaces for massively parallel operations on multiprocessors. Thrust is integrated in the current CUDA version 4.2 and supported by NVIDIA, so the framework presented in this paper can directly take advantage of any improvements. The CUSP library is based on Thrust and provides developers with routines for sparse linear algebra. SymPy is an open source library for symbolic mathematics in Python. The aim of our work is to give scientists a new tool to test new models in less time. Using SymPy in our framework enables us to define least squares models within a high-level scripting language. Starting from this, our framework automatically generates C++ code which is compiled into the OpenOF optimization library; this library can either be called from C++ programs for efficient use, or from Python for rapid prototyping. OpenOF will be published as an open source library under the GNU GPL v3 license.

2 RELATED WORK

Recently, non-linear optimization has gathered a lot of attention. It has been successfully applied in different areas, e.g. Simultaneous Localization and Mapping (SLAM) or BA. We will use BA for evaluating our framework, as it is a perfect example of a rather complex but sparse system. Several libraries can be used to solve BA. The SBA library (Lourakis and Argyros, 2009), which is used by several research groups, takes advantage of the special structure of the Hessian matrix to apply the Schur complement for solving the linear system. Nevertheless, it has several drawbacks. Integrating additional parameters which remain identical for all measurements (e.g. camera calibration) is not possible, as the structure would change such that the Schur complement could no longer be applied. With OpenOF, we provide a solution for these problems.

sparseLM (Lourakis, 2010) provides a software package which is supposed to be general enough to handle the same kind of problems as OpenOF does. However, integrating new models into sparseLM is time consuming, and the performance is slow for problems with many parameters, as no multithreading is integrated.

More recently, another framework was developed for similar purposes in graph optimization (Kummerle et al., 2011). In g2o, the Jacobian is evaluated by numerical differentiation, which is time consuming and also degrades the convergence rate. Developers can choose between direct (Davis, 2006) and iterative solvers such as CG. Kummerle et al. (2011) claim that CG is most often slower than a direct solver. In our experience direct solvers are more precise. However, utilizing the parallelism of multiprocessors, CG can be significantly faster. Applying a preconditioner to CG has a large impact on the speed of convergence. g2o uses a block Jacobi preconditioner. For simplicity we use a diagonal preconditioner, which has been applied successfully for various tasks. A big advantage of the diagonal preconditioner is that the least squares CG is computed directly from the Jacobian without explicitly calculating the Hessian matrix. Several other approaches (Kaess et al., 2011), which address only a subset of problems, have been presented previously for least squares optimization. But so far no library provides the speed, the ease of use and the generality to become a milestone within the computer vision and robotics communities.
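Because $\operatorname{diag}(J^T J)$ equals the column-wise sum of squared entries of $J$, the diagonal preconditioner can indeed be computed without ever forming the Hessian. A minimal SciPy sketch of this computation (for illustration only; OpenOF performs the equivalent operation with CUSP on the GPU):

```python
import numpy as np
import scipy.sparse as sp

def jacobi_preconditioner(J):
    # diag(J^T J) is the column-wise sum of squared entries of J,
    # so the Hessian J^T J never has to be built explicitly.
    d = np.asarray(J.multiply(J).sum(axis=0)).ravel()
    d[d == 0.0] = 1.0       # guard against empty columns
    return 1.0 / d          # applied as elementwise scaling of the residual

# toy sparse Jacobian
J = sp.random(100, 20, density=0.05, format='csc', random_state=0)
Minv = jacobi_preconditioner(J)
```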

3 LEVENBERG-MARQUARDT

LM is regarded as the standard method for solving non-linear least squares optimization. It is an iterative approach which was first developed by Levenberg in 1944 and rediscovered by Marquardt in 1963. The algorithm determines a local minimum of the sum of least squares of a multivariate function. An overview of the family of non-linear least squares optimization techniques can be found in (Madsen et al., 2004). LM combines a Gauss-Newton method with a steepest descent approach. Given a parameter vector $x$, we want to find a vector $x_{\min}$ that minimizes the function $g(x)$, which is defined as the sum of squares of the vector function $f(x)$:

$$x_{\min} = \operatorname*{arg\,min}_{x} g(x) = \operatorname*{arg\,min}_{x} \frac{1}{2} \sum_{i=0}^{m} f_i(x)^2 \qquad (1)$$

As the function is usually not convex, a suitable initial solution $x_0$ needs to be provided. Otherwise, LM will not converge to the global optimum, but get stuck in a local minimum, which might be far away from the optimal solution. In each iteration $i$, $g(x)$ is approximated by a second order Taylor series expansion:

$$g(x) \approx g(x_i) + g'(x_i)(x - x_i) + \frac{1}{2}\, g''(x_i)(x - x_i)^2 \qquad (2)$$

To find the minimum, the first derivative of Equation 2 is set to zero, resulting in

$$\frac{\partial g(x)}{\partial x} = g'(x_i) + g''(x_i)(x - x_i) = 0 \qquad (3)$$

The first order derivative at the point $x_i$ can be replaced by the transposed Jacobian $J^T$ multiplied with the residual. The second order derivative is approximated by the Hessian matrix given as $J^T J$. The parameter update $h = x - x_i$ in each iteration is acquired by solving the normal equation:

$$J^T J\, h = -J^T f(x_i) \qquad (4)$$

To include the gradient descent approach, Equation 4 is extended by a damping factor $\mu$. For $\mu \to 0$, LM applies the Gauss-Newton method; for $\mu \to \infty$, a gradient descent step is executed.

$$(J^T J + \mu I)\, h = -J^T f(x_i) \qquad (5)$$

If the update step does not result in a smaller error, the damping factor is increased, pushing the parameters further towards the steepest descent. Otherwise, the update is performed and the damping factor is reduced. In cases with a known covariance $\Sigma$ of the measurements, Equation 5 is extended to

$$(J^T \Sigma^{-1} J + \mu I)\, h = -J^T \Sigma^{-1} f(x_i) \qquad (6)$$

Instead of the Euclidean norm, the Mahalanobis distance is then minimized. Solving the normal equation is the most demanding part of the algorithm. If the normal matrix has a certain structure, approaches such as the Schur complement can be applied. In general, the normal equation can be solved with a Cholesky decomposition. Another alternative is an iterative approach such as CG, which is incorporated in OpenOF and described in more detail in the next section.
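For illustration, a minimal dense version of this LM loop (Equations 4 and 5) might look as follows. This NumPy sketch with an invented exponential-fitting toy problem is not OpenOF's implementation, which generates analytic Jacobians and solves the damped system with sparse CG on the GPU:

```python
import numpy as np

def levenberg_marquardt(f, jac, x, mu=1e-3, max_iter=100, tol=1e-10):
    # f returns the residual vector f(x); jac returns its Jacobian J.
    cost = 0.5 * np.sum(f(x) ** 2)
    for _ in range(max_iter):
        r, J = f(x), jac(x)
        # damped normal equations (J^T J + mu I) h = -J^T f(x)  (Eq. 5)
        h = np.linalg.solve(J.T @ J + mu * np.eye(x.size), -J.T @ r)
        new_cost = 0.5 * np.sum(f(x + h) ** 2)
        if new_cost < cost:              # accept: move towards Gauss-Newton
            x, cost, mu = x + h, new_cost, mu * 0.5
        else:                            # reject: move towards gradient descent
            mu *= 10.0
        if np.linalg.norm(h) < tol:
            break
    return x

# toy problem: fit a, b in y = a * exp(b * t) to noisy samples
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.default_rng(0).standard_normal(50)
f = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(levenberg_marquardt(f, jac, np.array([1.0, 1.0])))   # approx. [2.0, 1.5]
```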

4 CONJUGATE GRADIENT

The CG method is a numerical algorithm for solving a linear equation system $Ax = b$ with $A$ being symmetric positive definite and $x, b \in \mathbb{R}^n$. As CG is an iterative algorithm, there exists a residual vector $r_k$ in each step $k$, defined as:

$$r_k = A x_k - b \qquad (7)$$

CG belongs to the Krylov subspace methods. The Krylov subspace is described by

$$\mathcal{K}_k(A, r_0) = \operatorname{span}\{r_0,\, A r_0,\, A^2 r_0,\, \ldots,\, A^{k-1} r_0\} \qquad (8)$$

It is the basis for many iterative algorithms, and it can be proven that those methods terminate in at most $n$ steps. CG is a highly economical method which requires neither much memory nor time-demanding matrix-matrix multiplications. While applying CG, a set of conjugate vectors $(p_0, p_1, \ldots, p_{k-1}, p_k)$ belonging to the Krylov subspace is computed. In each step the algorithm computes a vector $p_k$ that is conjugate to all previous vectors. A great advantage is that only the previous vector $p_{k-1}$ is needed; the other vectors can be omitted, thereby saving memory. The algorithm is based on the principle that $p_k$ minimizes the residual over one spanning direction in every iteration. CG is initialized with $r_0 = A x_0 - b$, $p_0 = -r_0$, $k = 0$. In each iteration a step length $\alpha$ is computed which minimizes the residual over the current spanning vector $p_k$:

$$\alpha_k = -\frac{r_k^T r_k}{p_k^T A p_k} \qquad (9)$$

The update of the current approximate solution is given by

$$x_{k+1} = x_k + \alpha_k p_k \qquad (10)$$

which results in a new residual

$$r_{k+1} = r_k + \alpha_k A p_k \qquad (11)$$

The new conjugate direction is computed from Equations 12 and 13:

$$\beta_{k+1} = -\frac{r_{k+1}^T r_{k+1}}{r_k^T r_k} \qquad (12)$$

$$p_{k+1} = r_{k+1} + \beta_{k+1} p_k \qquad (13)$$

Finally, $k$ is increased. This procedure is repeated as long as $\|r_k\| > \varepsilon$ and $k < k_{max}$. In most cases only a few iterations are necessary, as the CG method is used within the iterative LM algorithm. For a proof of convergence and the full derivation we refer to (Nocedal and Wright, 1999).

For practical purposes, the explicit computation of the matrix $A = J^T J$ can be omitted. Concatenating matrix-vector multiplications with $J$ and $J^T$ is faster and numerically more precise. Such an approach is known as least squares CG.
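A matrix-free sketch of this least squares CG (often called CGLS) in NumPy; only products with $J$ and $J^T$ appear, and the random test problem at the end is an invented example:

```python
import numpy as np

def cgls(J, b, x0, max_iter=100, eps=1e-10):
    # Solves J^T J x = J^T b using only matrix-vector products with J and J^T.
    x = x0.copy()
    r = b - J @ x                    # residual in measurement space
    s = J.T @ r                      # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    for _ in range(max_iter):
        q = J @ p
        alpha = gamma / (q @ q)      # step length along the current direction
        x += alpha * p
        r -= alpha * q
        s = J.T @ r
        gamma_new = s @ s
        if np.sqrt(gamma_new) < eps:
            break
        p = s + (gamma_new / gamma) * p   # next conjugate direction
        gamma = gamma_new
    return x

rng = np.random.default_rng(0)
J = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
assert np.allclose(cgls(J, b, np.zeros(10)), np.linalg.lstsq(J, b, rcond=None)[0])
```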

5 FRAMEWORK

OpenOF aims at making development cycles faster. It saves the time a developer spends searching for errors by automatic code generation, but not at the expense of computational efficiency. As we use fast high-level libraries on a GPU, the generated executable gains performance. The design of the underlying model assumes different types of objects containing groups of parameters. These objects are linked by measurements and can be combined in arbitrary ways (see Fig. 1). Usually, the computation of the Jacobi matrix is either done by hand or with the help of a symbolic toolbox (SymPy, Matlab). Both cases are error prone: mistakes easily occur when transferring the code into an optimization library such as sparseLM (Lourakis, 2010) or when generating the derivatives by hand. This step needs to be repeated each time small changes are made in the underlying model. As the cost function can usually be described in various ways, exploring which modeling of, e.g., the relations between unknowns and measurements works best is time consuming. Additionally, the convergence rate highly depends on the chosen model, e.g. a convex cost function is most preferable. In OpenOF the model can be adjusted without further low-level programming steps, as the necessary code is generated automatically. The main optimization is designed in a modular way, enabling developers to easily modify the optimization algorithm or add new algorithms while still having the advantage of the high performance of a GPU.

Figure 1: Optimization objects interact with different measurement types.

An optimization task often comprises thousands of parameters. Usually several entities have the same meaning, e.g. $(x, y)$ refer to the coordinates of a point. Furthermore, these parameters can be grouped into objects. Within OpenOF all parameters are assigned to an object. When designing optimization objects, we distinguish between two different kinds of parameters:

- Variable parameters
- Fixed parameters

The variable parameters will be adjusted within the optimization. The fixed parameters remain constant but are necessary to evaluate the cost function. Most often we end up with a small number of different object types. These objects are linked by so-called measurement types. A measurement represents a function on which the least squares optimization is performed. The user defines the function representing relations between the different objects. Usually, not all types of objects are necessary to evaluate a certain measurement (see Figure 1). Sometimes we have a direct but noisy observation $X_{obs}$, e.g. the position given by a GPS sensor. In this case we define a measurement $f(X_{est}, X_{obs}) = X_{est} - X_{obs}$. It is clear that the observation $X_{obs}$ is not allowed to change. This function can be modeled in two ways in OpenOF. For the first option, two different object types are defined, where the first object type has only variable parameters and the second has only fixed parameters. In this case there are two almost identical object types which only differ regarding which parameters are allowed to change. It would be more intuitive to have only one object type which defines a position with variable parameters. To still be able to fix the parameters of the observation $X_{obs}$, it is possible in OpenOF to fix complete object types within a measurement function. Returning to the example, one would define, for the function $f$, $X_{est}$ to be variable and $X_{obs}$ to be fixed. This enables researchers to incorporate constraints such that certain parameters stay close to their initial values without the need to generate a new type of object with identical parameters. Being able to define an object as constant in one measurement and variable in another gives much flexibility when defining an optimization.

Each measurement forms an equation, which is transformed into a cost function. This cost function is designed with SymPy; the Jacobian of the cost function is computed automatically and transformed to C++ code. Different measurement types and object types combined result in a graph similar to the one presented in (Kummerle et al., 2011) (see Figure 1).
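To illustrate this generation step, the following SymPy sketch derives the Jacobian of the GPS-like measurement $f(X_{est}, X_{obs}) = X_{est} - X_{obs}$ from above and emits C code for its entries; the symbol names and the loop are our own illustration, not OpenOF's actual generator:

```python
import sympy as sym

# a toy measurement: difference between an estimated and an observed 2D point
x_est, y_est, x_obs, y_obs = sym.symbols('x_est y_est x_obs y_obs')
f = sym.Matrix([x_est - x_obs, y_est - y_obs])

# Jacobian with respect to the variable parameters only;
# the observed point is treated as fixed
J = f.jacobian([x_est, y_est])

# emit one C assignment per Jacobian entry
for i in range(J.rows):
    for j in range(J.cols):
        print(sym.ccode(J[i, j], assign_to=f'J[{i}][{j}]'))
```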

At this point, only the structure of the optimization has been defined. Next, the optimization needs to be filled with actual data. Therefore, a list containing the data of each type of object is created. A measurement is then defined by its measurement type together with the indices pointing to the corresponding data within the data lists.

6 EXAMPLES

We present two examples: first the classic BA, and second an alternative BA parametrization inspired by (Civera et al., 2008) using inverse distances. BA contains four kinds of optimization objects. The first object is a 3D point parametrized by a name and its coordinates (pt3:[X,Y,Z]). The second object type defines the external camera parameters (cam:[CX,CY,CZ,q1,q2,q3,q4]) with the center point and a quaternion representing the rotation of the camera. Since the same camera is repeatedly used within one reconstruction, an extra object for the internal camera parameters (cam_in:[fx,fy,u0,v0,s,k1,k2,k3]) is defined. This enables us to constrain the optimization: several cameras can then share the same internal calibration (focal length, principal point, skew and distortion parameters). Finally, a constant object which contains the pixel measurements (pt2:[x,y]) is specified. These parameters remain unchanged within the optimization but are generated as the same type of structure. Next, the different measurement types are described. We create a measurement (quat:[cam],[0]) which enforces the quaternion of the camera object to have unit length. The indices written in the second pair of square brackets specify the indices of the objects to be optimized. The next measurement represents the reprojection error in the image with fixed intrinsic parameters (projMetric:[pt2,pt3,cam,cam_in],[1,2]); only the objects pt3 and cam are part of the optimization. To adjust the measurement such that the intrinsic parameters are optimized as well, the last list is extended by the fourth object (projRadial:[pt2,pt3,cam,cam_in],[1,2,3]). The user provides the system with information closely related to the bracket notation above. Using OpenOF, it is easy and fast to test different kinds of parametrizations.

Figure 2: Inverse point parametrization with source camera and destination camera.

We compare a new BA parametrization with the one previously described. This new method is called inverse bundle adjustment (IBA) due to its inverse depth parametrization (Civera et al., 2008). A 3D point is no longer described by three coordinates, but by a source camera, a unit vector and an inverse depth; the point is thus linked to the camera of its first appearance (see Fig. 2). For this parametrization we end up with the following list:

- pt3inv:[d][nx,ny,nz]
- cam:[CX,CY,CZ,q1,q2,q3,q4]
- cam_in:[fx,fy,u0,v0,s,k1,k2,k3]
- pt2:[x,y]

In this example we distinguish between fixed and optimizable parameters: for pt3inv, d is part of the optimization, but nx, ny, nz remain constant. The coordinates $(X, Y, Z)$ of a 3D point are computed according to

$$\tilde{X} = \begin{pmatrix} 0 \\ X \\ Y \\ Z \end{pmatrix} = \frac{1}{d}\, q^{-1} \tilde{n}\, q + \tilde{C} \qquad (14)$$

with $q$ being the quaternion corresponding to the source camera and $C$ the camera center. The tilde above a variable denotes the quaternion representation of a 3D vector. The described parametrization has several advantages. The degrees of freedom of the complete optimization are reduced by $2n$, with $n$ representing the number of points. The part of the state vector which holds the parameters of the cameras keeps the same size, since a camera can be a source and a destination camera at the same time. At the same time the number of measurements is reduced by $n$, as the measurement in the source camera is neglected. This might be a disadvantage, as more measurements make the optimization more stable. More important for the optimization, however, is the ratio between parameters and measurements. Assuming that each point is seen by each camera and $n > 7$, $m > 3$, with $m$ being the number of cameras, the ratio of parameters to measurements is always smaller for IBA than for BA:

$$\frac{7m + n}{2n(m-1)} < \frac{7m + 3n}{2nm} \qquad (15)$$

For Equation 15 we assumed that the intrinsic camera parameters are fixed. We investigated a plain depth parametrization as well, but found the inverse parametrization to be superior due to its more linear behavior.
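For reference, Equation 14 amounts to a plain quaternion rotation of the scaled unit vector. A small NumPy sketch, assuming a (w, x, y, z) component order for quaternions (the paper does not state its convention):

```python
import numpy as np

def quat_mult(a, b):
    # Hamilton product of two quaternions given as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def point_from_inverse_depth(d, n, q, C):
    # Equation 14: X~ = (1/d) q^{-1} n~ q + C~ for a unit quaternion q
    q_inv = q * np.array([1.0, -1.0, -1.0, -1.0])   # conjugate = inverse for unit q
    n_tilde = np.concatenate(([0.0], n))            # quaternion form of the 3D vector
    rotated = quat_mult(quat_mult(q_inv, n_tilde), q)[1:]
    return rotated / d + C

# a point two units in front of a camera at the origin with identity rotation
X = point_from_inverse_depth(0.5, np.array([0.0, 0.0, 1.0]),
                             np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(3))
print(X)   # [0. 0. 2.]
```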

7 RESULTS

For evaluation we compare IBA to BA, both computed with OpenOF. Both methods are evaluated on simulated data as well as on a real world dataset. The performance is compared against two open source BA implementations, Simple Sparse Bundle Adjustment (SSBA) (Zach, 2012) and PBA (Wu et al., 2011). The latter stands for Parallel Bundle Adjustment and is implemented for the GPU and the CPU. In contrast to SSBA, PBA uses CG as presented in our approach. We show that the performance of the specially optimized PBA is comparable to OpenOF. All evaluations are performed on an Intel i7 with 3.07 GHz, 24 GB RAM and an NVIDIA GTX 570 graphics card with 2.5 GB memory. For both versions of PBA as well as for OpenOF we use single precision; in contrast, SSBA runs with double precision. For the evaluation we use the synthetic data shown in Fig. 3. The 3D points are randomly generated and placed on the walls of a cube. The points are only projected into cameras for which they would be directly visible in reality, assuming the cube is not transparent.

Figure 3: Setup of points (blue) and cams (red) for the simulated data.

The cube has a size of 10x10x10. The cameras are positioned on a circle with a radius of 8, facing the center of the cube. The points are projected into a virtual camera with an image size of 640x480 and a focal length of 1000 measured in pixels. The principal point is assumed to lie in the center of the image. To simulate structure from motion scenarios, where the initial camera position is rather a rough estimate, we apply noise to the camera positions. Normally distributed noise of 0.5 pixels is applied to the measurements. Furthermore, we triangulate new points from the virtual projections instead of using the original 3D points, resulting in a more realistic setup. With known positions of the cameras, we investigate the convergence of the five BA implementations stated earlier by continuously increasing the noise on the camera positions. Within this simulation we use 100 cameras and 1000 points. The overall estimate of each method is transformed to the original cameras with a Helmert transformation. The mean distance error between the estimated and the true camera positions is shown in Fig. 4(a) for different amounts of noise. For each noise level the mean of several iterations is plotted. Within our test the GPU implementation of PBA did not converge in most cases, which is shown in Fig. 4(a). The reason for the failure could not be identified, but we can exclude a wrong usage, as there is only a single line of code which toggles PBA from the CPU version to the GPU one. Every other method shows similar precision for $\sigma < 0.6$. For $\sigma > 0.6$, IBA shows a better convergence than the other methods due to its smaller number of degrees of freedom. Especially the CPU version of PBA most often did not converge for noise $\sigma > 1.2$. The convergence problem of PBA also causes an increasing runtime (Fig. 4(b)), while the other approaches stay constant in time.

Figure 4: Applying normally distributed noise to the camera positions (100 cameras). (a) Precision. (b) Performance.

To evaluate the speed of the algorithms, we run each method with a constant amount of noise while increasing the number of cameras. This reduces the distance between two cameras and increases the number of projections of each 3D point. As expected, SSBA, using a direct solver implemented on a CPU, shows the lowest performance. For this example, the CG approach (OpenOF and PBA) operates much faster than a direct solver (see Figs. 5(a) and 5(b)). The GPU version of PBA cannot be taken into account because it did not converge.

Figure 5: Changing the number of cameras with a constant amount of noise on the camera position ($\sigma = 0.5$). (a) Precision. (b) Performance.

The real world data evaluation is performed on a dataset of 390 images taken with a Canon EOS 7D camera (see Fig. 6(b)). The pictures were taken in the American Museum of Natural History within the scope of a project aiming at the reconstruction of mammoths. Fig. 6(a) shows the final reconstruction computed with PMVS2 (Furukawa and Ponce, 2010). The camera path computed with OpenOF is illustrated in Fig. 6(c).

Figure 6: Three mammoths acquired at the American Museum of Natural History. (a) Quasi-dense point cloud. (b) Sample image. (c) Camera path.

We compare the convergence rates of SSBA and OpenOF. PBA is not included in this part of the evaluation: it uses an internally different error metric for the residuals, so that SSBA and PBA cannot be compared at the same time. As shown in Fig. 7(a), the convergence of SSBA and OpenOF is similar for BA with respect to the number of iterations. Regarding the number of iterations relative to the overall time, the CG approach outperforms the direct solver. This advantage is even larger for datasets where the amount of overlapping images is higher, as shown in the synthetic evaluation. The convergence rate of IBA is better than that of BA. This might not hold in every case, but developers are encouraged to try out different parametrizations to determine the optimal solution. Additionally, a different parametrization of the rotation is feasible, e.g. Euler angles.

Figure 7: Convergence of the mammoth dataset. (a) Iterations. (b) Time.

8 CONCLUSIONS

We present OpenOF, an open source framework for sparse non-linear least squares optimization. The cost functions to be optimized are described in a high-level scripting language. Reducing the implementation time enables developers to comfortably try different modelings of the cost functions. The estimates are optimized on a GPU, taking the sparse structure of the model into account. We showed that the generality of OpenOF is not accomplished at the expense of speed: OpenOF can compete with implementations that are specifically designed for certain applications.


OpenOF works successfully for BA. A fair comparison to the GPU version of PBA was not possible, since PBA did not provide reliable results; we will further investigate why PBA shows such poor convergence behavior on our data. With IBA, a new parametrization for BA has been introduced which often converges on examples in which BA gets stuck in a local minimum. As the reprojection error of a point in its source camera is zero by construction, IBA probably does not find the overall best solution. In practice we experienced that IBA still converges for multiple datasets where BA found a poor solution. In such cases the result of IBA can be further optimized with an additional BA step.

The main aim of OpenOF is not to be yet another BA implementation, but to solve much more complex least squares optimization problems. Recently, a feature has been added to OpenOF which allows the cost function to include numerical parts. This is useful for minimizing pixel differences in multiview stereo approaches. A set of robust cost functions, such as the Huber cost function or a truncated L2 norm, is available as well.

For future releases we want to integrate a CPU version, especially for developers who do not need the speed but want the flexibility in the design of their cost functions.

REFERENCES

Civera, J., Davison, A. J., and Montiel, J. (2008). Inverse depth parametrization for monocular SLAM. IEEE Transactions on Robotics, 24(5):932–945.

Davis, T. A. (2006). Direct Methods for Sparse Linear Systems, volume 75 of Fundamentals of Algorithms. SIAM.

Furukawa, Y. and Ponce, J. (2010). Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376.

Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J., and Dellaert, F. (2011). iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 3281–3288. IEEE.

Kummerle, R., Grisetti, G., Strasdat, H., Konolige, K., and Burgard, W. (2011). g2o: A general framework for graph optimization. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 3607–3613. IEEE.

Lourakis, M. (2010). Sparse non-linear least squares optimization for geometric vision. In Computer Vision – ECCV 2010, pages 43–56.

Lourakis, M. I. A. and Argyros, A. A. (2009). SBA: A software package for generic sparse bundle adjustment. ACM Transactions on Mathematical Software, 36(1):1–30.

Madsen, K., Nielsen, H. B., and Tingleff, O. (2004). Methods for Non-Linear Least Squares Problems (2nd ed.).

Nocedal, J. and Wright, S. J. (1999). Numerical Optimization, volume 43 of Springer Series in Operations Research. Springer.

Wu, C., Agarwal, S., Curless, B., and Seitz, S. (2011). Multicore bundle adjustment. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3057–3064. IEEE.

Zach, C. (2012). Sources. http://www.inf.ethz.ch/personal/chzach/opensource.
