
FULL PAPER

Parallel line-based structure from motion by using omnidirectional camera in textureless scene

Ryosuke Kawanishi, Atsushi Yamashita*, Toru Kaneko and Hajime Asama

Department of Precision Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

(Received 22 March 2012; accepted 18 September 2012)

In this paper, we propose a reconstruction method for a 3D structure using sequential omnidirectional images in an artificial environment. The proposed method is fundamentally categorized as a Structure from Motion (SfM) technique. The conventional point-based SfM using a standard camera is, however, likely to fail to recover a 3D structure in an artificial and textureless environment such as a corridor. To tackle this problem, the proposed technique uses an omnidirectional camera and line-based SfM. Line features, such as the borderline of a wall and a floor or a window frame, are easier to discern in an artificial environment than point features, even in a textureless scene. In addition, an omnidirectional camera can track features for a long period because of its wide field-of-view. Extracted line features in an artificial environment are often mutually parallel. Parallel lines provide valuable constraints for camera movement estimation. Directions and locations of lines are estimated simultaneously with the 3D camera movements. A 3D model of the environment is constructed from the measurement results of lines and edge points. Experimental results show the effectiveness of our proposed method.

Keywords: structure from motion; parallel lines; textureless scene; omnidirectional camera

1. Introduction

In this paper, we propose a reconstruction method of a 3D structure using sequential images acquired with a monocular omnidirectional camera in an artificial environment.

The accurate and efficient estimation of the camera movement is very important for scene reconstruction using the monocular stereo method. As a method of monocular stereo, approaches based on Structure from Motion (SfM) [1,2] or vSLAM [3,4] have been proposed. Camera movement estimation from an image sequence acquired using a single camera is difficult because a camera movement matrix (such as the essential matrix) has at least six degrees of freedom (DOFs) (three DOFs for rotation and three DOFs for translation). Moreover, the estimation is a nonlinear problem. When locations and orientations of many viewpoints are estimated simultaneously using previous methods, the processing cost is high and the probability of falling into a local minimum is great.

An omnidirectional camera is used for self-localization and scene reconstruction in our proposed method. Self-localization methods using monocular stereo often necessitate feature tracking (KLT tracker [5], SIFT [6], etc.) to obtain correspondences between different viewpoints. However, features are lost easily because of camera swing if a typical camera with a narrow field of view is used. Therefore, an omnidirectional camera with a wide field of view is effective for self-localization [7]. The omnidirectional camera in this paper has a hyperboloid mirror (Figure 1). The omnidirectional camera can be regarded as a pinhole camera. Self-localization and scene reconstruction methods using an omnidirectional camera have been proposed [8–10].

Among previous monocular stereo methods, some approaches have used correspondences of feature points [2,3,8–12], straight lines [13–15], or both features [1]. Point-based methods have the benefit of fast calculation by a linear solution (eight-point algorithm [16], five-point algorithm [2], etc.). However, the number of feature points is insufficient in textureless scenes (such as the indoor environment shown in Figure 2). On the other hand, line-based methods are available for textureless scenes. Textureless scenes contain anthropogenic objects, and many anthropogenic objects have a linear shape. Therefore, line correspondences are obtainable in textureless scenes.

Our proposed method for textureless scene reconstruction is based on SfM using line correspondences. Lines which compose anthropogenic objects are often mutually parallel.

*Corresponding author. Email: [email protected]

Advanced Robotics, 2012, 1–14, iFirst Article

ISSN 0169-1864 print/ISSN 1568-5535 online. © 2012 Taylor & Francis and The Robotics Society of Japan. http://dx.doi.org/10.1080/01691864.2013.751160, http://www.tandfonline.com


Parallel lines can be extracted from an omnidirectional image easily and constantly because of its wide field of view. Parallel lines provide valuable constraints for camera movement estimation. Therefore, the proposed method uses constraints obtained from parallel lines. As previous methods, SLAM using parallel lines and their vanishing point (VP) [17], SfM in urban environments (building scenes) [18], rotation estimation by a video compass [19], and so on have been proposed. However, the computational complexity of these methods increases as the number of viewpoints or lines increases [17,18]. In addition, some previous methods are based on the assumption that the camera movement is only a horizontal movement [18,19]. Our proposed method estimates 3D camera movements using correspondences of parallel lines and nonparallel lines among no fewer than three images.

Although there are linear computation algorithms for line-based SfM using a trifocal tensor [20,21], they apply only to image triplets. A bundle adjustment [22] is used in many previous methods to estimate the camera movement among more than three images. A bundle adjustment is a framework for camera movement estimation based on reprojection error minimization. The Levenberg–Marquardt algorithm provides a solution of the error function minimization in a bundle adjustment.

However, the algorithm often falls into a local minimum [13,14]. A line-based EKF SLAM [15] has also been proposed. Nevertheless, it is known that the error of the camera movement estimation accumulates in EKF SLAM.

The proposed line-based SfM can estimate the 3D camera movements of more than three viewpoints simultaneously. The constraints obtained from parallel lines are useful for reducing the DOF of the camera movement estimation. Using the constraints, the estimation of the rotation matrices and the translation vectors is divided into two procedures. Each procedure is solved as a 1 DOF problem without regard to the number of viewpoints and features. Consequently, the calculation cost is extremely low, and falling into a local minimum can be avoided. The proposed method can obtain the global minimum easily. The effectiveness of our proposed method is demonstrated with experimentally obtained results.

2. Framework of parallel line-based SfM

The prerequisite of our proposed method is that at least six line correspondences (three parallel lines and three nonparallel lines) exist between omnidirectional images. The image sequence must include at least three images. Parallel lines are extracted easily from a man-made structure in an indoor environment because an omnidirectional camera has a wide field of view. Therefore, the assumption is proper for general indoor environments.

The procedure of our proposed method is described below (Figure 3). Lines and feature points are extracted and tracked along an omnidirectional video. At each viewpoint, parallel lines and the VP are detected from these lines. A vector in the direction of the VP is calculated. The vector is called a 'VP axis' in this paper. Pseudo-lines, which are created from a couple of feature points, are regarded as nonparallel lines.

The estimation of rotation and translation is separated in the proposed method, and each of these separated estimations is divided into two procedures.

Figure 1. An omnidirectional camera equipped with a hyperboloid mirror and the acquired image.

Figure 2. Example of a textureless scene (corridor).


In the first procedure of the rotation estimation, a rotation that makes the direction of the VP axes at all viewpoints the same in the world coordinate system is calculated. In the second procedure, a rotation about the VP axis is estimated. When a rotation between two viewpoints is determined, the rotations at the other viewpoints are calculated by minimizing a quartic function of the rotation angle. The minimum value of a quartic function is calculated uniquely. Therefore, a 3D camera rotation movement can be estimated by solving a problem with 1 DOF about the rotation angle between two arbitrary viewpoints, without regard to the number of viewpoints and features.

In the first procedure of the translation estimation, translations on the plane perpendicular to the 3D direction of the VP axis are estimated. When the translation direction between two viewpoints is determined, the translations at the other viewpoints are calculated uniquely by solving simultaneous equations. Therefore, the estimation is a problem with 1 DOF about the translation direction on the plane between two arbitrary viewpoints. In the second procedure, translations along the VP axis are estimated. In this procedure, when the translation along the VP axis between two viewpoints is determined, the translations at the other viewpoints are calculated as the translation minimizing a quartic function of the translation. Therefore, the estimation is also a problem with 1 DOF about the translation along the VP axis between two arbitrary viewpoints.

The proposed method estimates 3D camera movements by solving three problems with 1 DOF. The calculation cost is extremely low, and it is easy to avoid falling into a local minimum. Moreover, the proposed method can obtain the global minimum solution easily.

Locations and directions of 3D lines are estimated simultaneously with the camera movement. For a dense 3D reconstruction, edge points are measured using the estimated camera movement. A mesh model is constructed from the measurement results of lines and edge points. Textures obtained from omnidirectional images are added to the mesh surfaces.

3. Omnidirectional camera coordinate system

The coordinate system of an omnidirectional camera is shown in Figure 4. A hyperboloid mirror reflects a ray from the camera lens to image coordinates $(u, v)$. In this paper, the reflected ray is called a 'ray vector.' The extension lines of all ray vectors intersect at the focus of the hyperboloid mirror. The ray vector $\mathbf{r}$ is calculated using the following equations.

\[ \mathbf{r} = \frac{\hat{\mathbf{r}}}{\|\hat{\mathbf{r}}\|}, \quad (1) \]

\[ \hat{\mathbf{r}} = \begin{bmatrix} \lambda (u - c_x) p_x \\ \lambda (v - c_y) p_y \\ \lambda f - 2c \end{bmatrix}, \quad (2) \]

\[ \lambda = \frac{a^2 \left( f \sqrt{a^2 + b^2} + b \sqrt{u^2 + v^2 + f^2} \right)}{a^2 f^2 - b^2 (u^2 + v^2)}, \quad (3) \]

Figure 4. Coordinate system of the omnidirectional camera.

Figure 3. Procedures used for our proposed method.


where $c_x$ and $c_y$ are the center coordinates of the omnidirectional image, $p_x$ and $p_y$ are the pixel sizes, $f$ is the focal length of the camera lens, and $a$, $b$, and $c$ are the hyperboloid parameters. In the proposed method, these parameters are calibrated in advance.
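
As a concrete illustration of Equations (1)–(3), the following Python sketch converts an image coordinate into a unit ray vector. The function name and the numerical values in the example call are placeholders; the actual mirror parameters $a$, $b$, $c$, the focal length $f$, the pixel sizes, and the image center must come from the calibration mentioned above.

```python
import numpy as np

def ray_vector(u, v, cx, cy, px, py, f, a, b, c):
    """Unit ray vector for image coordinates (u, v), Equations (1)-(3).

    A scale factor lambda maps the pixel onto the hyperboloid mirror,
    and the resulting vector is normalized.
    """
    lam = (a**2 * (f * np.sqrt(a**2 + b**2)
                   + b * np.sqrt(u**2 + v**2 + f**2))) \
          / (a**2 * f**2 - b**2 * (u**2 + v**2))
    r_hat = np.array([lam * (u - cx) * px,
                      lam * (v - cy) * py,
                      lam * f - 2.0 * c])
    return r_hat / np.linalg.norm(r_hat)

# Hypothetical calibration values, for illustration only.
r = ray_vector(420.0, 310.0, cx=400.0, cy=300.0, px=1.0, py=1.0,
               f=800.0, a=30.0, b=40.0, c=50.0)
```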

4. Feature detection and tracking

4.1. Feature point tracking

The proposed method needs at least three nonparallel lines in addition to three parallel lines. In textureless scenes, many parallel lines are perpendicular to the floor. However, there are often insufficient nonparallel lines for camera movement estimation, although an omnidirectional camera has a wide field of view (Figure 5(a)). Therefore, the proposed method uses feature points.

Feature points are tracked along an omnidirectional video by the KLT tracker [5]. Pseudo-lines are created from a couple of feature points. Pseudo-lines are regarded as nonparallel lines. Examples of feature points and pseudo-lines are shown in Figure 5(b) and (c). Pseudo-lines in Figure 5(c) are curved because a straight line is projected as a curved line in an omnidirectional image.
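
The paper does not spell out an implementation of the pseudo-line construction; the sketch below is one plausible reading, pairing KLT-tracked feature points (OpenCV's pyramidal Lucas–Kanade tracker) and representing each pair by the unit normal of the plane spanned by their two ray vectors, i.e. the NV introduced in Section 4.2. The pairing strategy and the helper `to_ray` (e.g. `ray_vector` from Section 3 with the calibration bound) are assumptions; `prev_pts` must be a float32 array of shape (N, 1, 2).

```python
import cv2
import numpy as np

def track_and_pair(prev_img, next_img, prev_pts, to_ray):
    """Track feature points with the KLT tracker and pair them into
    pseudo-lines, each represented by the unit normal (NV) of the plane
    spanned by the two ray vectors."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, next_img, prev_pts, None)
    tracked = next_pts[status.ravel() == 1].reshape(-1, 2)

    pseudo_line_nvs = []
    # Simple pairing strategy (assumption): consecutive tracked points.
    for p, q in zip(tracked[0::2], tracked[1::2]):
        n = np.cross(to_ray(*p), to_ray(*q))   # plane normal of the two rays
        pseudo_line_nvs.append(n / np.linalg.norm(n))
    return tracked, pseudo_line_nvs
```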

4.2. Line tracking

Straight lines are extracted from distorted omnidirectional images. The proposed method obtains edge points using a Canny edge detector [23]. Examples of edge point detection are shown in Figure 6(a) and (b).

To separate each line, corner points are removed as shown in Figure 6(c). Corner points are detected using the two eigenvalues of the Hessian $H$ of the image. The Hessian matrix is defined as

\[ H = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}, \quad (4) \]

where the derivatives $I_{xx}$, $I_{xy}$, and $I_{yy}$ are calculated by taking differences of neighboring edge points. If the ratio of the eigenvalues is sufficiently high, the edge point is regarded as line-like. The ratio threshold is set to 10 by trial and error.
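
A small sketch of this corner test follows. The finite-difference stencil used for the derivatives is an assumption, since the paper only states that they are computed from differences of neighboring edge points.

```python
import numpy as np

def is_line_like(I, x, y, ratio_th=10.0):
    """Classify an edge point as line-like using the eigenvalue ratio
    of the local Hessian (Equation (4)). I is a grayscale float image."""
    Ixx = I[y, x + 1] - 2.0 * I[y, x] + I[y, x - 1]
    Iyy = I[y + 1, x] - 2.0 * I[y, x] + I[y - 1, x]
    Ixy = 0.25 * (I[y + 1, x + 1] - I[y + 1, x - 1]
                  - I[y - 1, x + 1] + I[y - 1, x - 1])
    H = np.array([[Ixx, Ixy], [Ixy, Iyy]])
    w = np.abs(np.linalg.eigvalsh(H))
    lam_max, lam_min = w.max(), max(w.min(), 1e-9)
    return lam_max / lam_min >= ratio_th   # corner points fail this test
```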

A least-squares plane is calculated from the ray vectors of segmented edge points. If the edge segment consists of a straight line, then these ray vectors are located on the same plane (Figure 7). Therefore, an edge segment which has a small least-squares error is regarded as a straight line. The proposed method can extract straight lines even if an edge segment resembles a curve in an omnidirectional image. If over half the edge points of the edge segment $i$ satisfy the following equation, then the segment is determined to be a straight line (Figure 6(d)).

\[ \left( \mathbf{r}_{i,j}^{T} \mathbf{n}_i \right)^2 < l_{th}. \quad (5) \]

Figure 5. Creation of pseudo-lines.

Figure 6. Line extraction procedure. Panels (b), (c), and (d) portray enlarged views of the red rectangle in panel (a).


Therein, $l_{th}$ is a threshold, $\mathbf{r}_{i,j}$ is the ray vector from the mirror focus to the edge point $j$ included in the line $i$, and $\mathbf{n}_i$ is the normal vector of the least-squares plane calculated from the line $i$. $\mathbf{n}_i$ is a unit vector, and it is called 'NV' in this paper. In the detection, edge points which do not constitute the line are rejected as noise by RANSAC [24]. The threshold $l_{th}$ is determined from the image resolution.
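
The least-squares plane through the mirror focus can be obtained from the ray vectors as the eigenvector of their scatter matrix with the smallest eigenvalue, and the straightness test of Equation (5) then follows directly. The sketch below shows this; the RANSAC rejection loop is omitted for brevity.

```python
import numpy as np

def fit_nv(rays):
    """Least-squares plane normal (NV) for a set of unit ray vectors.

    The plane passes through the mirror focus, so its normal n minimizes
    sum_j (r_j^T n)^2, i.e. the eigenvector of sum_j r_j r_j^T with the
    smallest eigenvalue."""
    rays = np.asarray(rays)
    eigvals, eigvecs = np.linalg.eigh(rays.T @ rays)
    return eigvecs[:, 0]                     # smallest-eigenvalue eigenvector

def is_straight(rays, l_th):
    """Equation (5): a segment is straight if over half of its edge
    points satisfy (r_j^T n)^2 < l_th."""
    rays = np.asarray(rays)
    n = fit_nv(rays)
    residuals = (rays @ n) ** 2
    return np.mean(residuals < l_th) > 0.5
```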

Lines are tracked along the omnidirectional image sequence. The proposed method obtains sampling points located on a straight line. Sampling points are extracted at constant intervals (Figure 8(b)). Edge segments are extracted in the next frame (Figure 8(c)). The points extracted in Figure 8(b) are tracked to the next frame by the KLT tracker (Figure 8(d)). The edge point closest to the tracked point is selected as a corresponding edge point (Figure 8(e)). The edge segment with the maximum number of corresponding edge points is regarded as a corresponding edge segment (Figure 8(f)). If an edge segment corresponds to several lines, then the line having the larger number of corresponding edge points is selected.

An aperture problem [25] exists when matching points along the line. However, it is not difficult for the proposed method to obtain the corresponding edge segment because it does not require point-to-point matching. By continuing the processes explained above, straight lines are tracked along the omnidirectional image sequence.

4.3. Detection of parallel lines and a VP

Parallel lines and their associated VP are detected from the tracked lines. A VP axis $\mathbf{v}^{c_j}$ and the NVs $\mathbf{n}_i^{c_j}$ of the parallel lines $i$ at the viewpoint $c_j$ satisfy the following equation.

\[ \mathbf{v}^{c_j\,T} \mathbf{n}_i^{c_j} = 0. \quad (6) \]

Figure 8. Searching for a corresponding edge segment in the subsequent frame.

Figure 7. Relation between a straight line and ray vectors.


Here, the superscript of a vector signifies its reference coordinate system. The reference coordinate system of a vector without a superscript is the world coordinate system. Three lines are necessary for the parallel line detection from an image. The proposed method selects three lines randomly from the tracked lines. A vector $\mathbf{v}_{rand}^{c_j}$ that minimizes $E_v$ in Equation (7) is calculated using the least-squares method.

\[ E_v = \sum_i^{n_l} \left( \mathbf{v}_{rand}^{c_j\,T} \mathbf{n}_i^{c_j} \right)^2. \quad (7) \]

In that equation, $n_l$ is the number of lines. If the selected lines satisfy the following equation over the entire input image sequence, then they are regarded as parallel lines.

\[ E_v < p_{th}, \quad (8) \]

where $p_{th}$ is a threshold. Lines that are regarded as mutually parallel are integrated. The line group that has the maximum number of lines is used for the following process as the parallel lines. The VP axis $\mathbf{v}^{c_j}$ is calculated from the integrated parallel lines as the vector $\mathbf{v}_{rand}^{c_j}$ that minimizes $E_v$ in Equation (7).
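
Minimizing $E_v$ in Equation (7) over unit vectors is itself a smallest-eigenvector problem of the NV scatter matrix, so the VP axis has a closed-form solution, as sketched below. The random triple selection with the per-frame check of Equation (8) is only outlined: the threshold, the trial count, and the simplification of returning the first consistent triple (rather than integrating all mutually parallel lines into the largest group) are assumptions for illustration.

```python
import numpy as np

def vp_axis(nvs):
    """VP axis minimizing E_v = sum_i (v^T n_i)^2 over unit vectors v
    (Equation (7)): smallest-eigenvalue eigenvector of sum_i n_i n_i^T."""
    nvs = np.asarray(nvs)
    eigvals, eigvecs = np.linalg.eigh(nvs.T @ nvs)
    return eigvecs[:, 0], eigvals[0]          # axis and residual E_v

def detect_parallel_triple(nv_tracks, p_th=1e-3, trials=200, rng=None):
    """Randomly pick line triples and keep a triple whose NVs share a
    common perpendicular direction in every frame (Equation (8)).
    nv_tracks is an array of shape (n_lines, n_frames, 3)."""
    rng = rng or np.random.default_rng()
    for _ in range(trials):
        idx = rng.choice(len(nv_tracks), size=3, replace=False)
        if all(vp_axis(nv_tracks[idx, f])[1] < p_th
               for f in range(nv_tracks.shape[1])):
            return idx
    return None
```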

5. Parallel line-based SfM

5.1. Estimation of camera rotation and line direction

The camera rotation estimation process is divided into two procedures. In the first procedure, the method calculates a camera rotation matrix that makes the direction of the VP axes among all viewpoints the same. In the second procedure, a rotation about the VP axis is estimated using at least three lines. These lines must have a different 3D direction from the parallel lines.

5.1.1. VP direction matching

This procedure requires VP axes, which should have the same 3D direction in the world coordinate system because a VP lies at an infinite distance from the viewpoint, theoretically. Therefore, the proposed method calculates a rotation matrix $R_m^{c_j}$ satisfying the following equation.

\[ \mathbf{v}^{c_0} = R_m^{c_j\,T} \mathbf{v}^{c_j}. \quad (9) \]

In that equation, $R_m^{c_j}$ is a rotation matrix that makes the direction of the VP axis $\mathbf{v}^{c_j}$ the same as that of the VP axis $\mathbf{v}^{c_0}$ in the initial camera coordinate system. Here, the initial camera coordinate system is equal to the world coordinate system in this paper. $R_m^{c_j}$ is calculated as a rotation about an axis $\mathbf{m}^{c_j}$ using the Rodrigues rotation formula. The rotation axis $\mathbf{m}^{c_j}$ and the angle $\theta^{c_j}$ are calculated by the following equations. The relation between $\mathbf{m}^{c_j}$ and $\theta^{c_j}$ is shown in Figure 9.

\[ \mathbf{m}^{c_j} = \mathbf{v}^{c_0} \times \mathbf{v}^{c_j}, \quad (10) \]

\[ \theta^{c_j} = \arccos\left( \mathbf{v}^{c_0\,T} \mathbf{v}^{c_j} \right). \quad (11) \]

The 3D directions of the parallel lines and the VP axis $\mathbf{v}^{c_0}$ are the same. Therefore, the vector $\mathbf{v}^{c_0}$ also represents the 3D direction of the parallel lines in the following explanation.
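
A short sketch of the VP direction matching follows: the axis and angle of Equations (10) and (11) are turned into the rotation matrix with the Rodrigues formula. With this convention, `vp_matching_rotation(v_c0, v_cj).T @ v_cj` recovers `v_c0`, which is the form of Equation (9).

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix about 'axis' by 'angle' (Rodrigues rotation formula)."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def vp_matching_rotation(v_c0, v_cj):
    """R_m^{cj} aligning the VP axis v^{cj} with v^{c0} (Equations (9)-(11))."""
    m = np.cross(v_c0, v_cj)                        # Equation (10)
    theta = np.arccos(np.clip(v_c0 @ v_cj, -1, 1))  # Equation (11)
    if np.linalg.norm(m) < 1e-12:                   # axes already aligned
        return np.eye(3)
    return rodrigues(m, theta)
```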

5.1.2. Estimation of rotation about the VP axis

In the second procedure, a rotation matrix about the VP axis $\mathbf{v}^{c_0}$ is estimated. This procedure requires at least three lines. The 3D direction of these lines must not be equal to the VP axis $\mathbf{v}^{c_0}$.

In the proposed method, the only remaining unknown parameter of the 3D rotation is the rotation $R_v^{c_j}$ about the VP axis $\mathbf{v}^{c_0}$, because the other two parameters are obtainable from the constraints of the VP. Therefore, this procedure estimates the rotation matrix $R_v^{c_j}$, namely, the rotation angle $\phi^{c_j}$ (Figure 10).

The camera rotation matrix $R^{c_j}$ between the initial viewpoint $c_0$ and the $j$-th viewpoint $c_j$ is defined using the two rotation matrices as the following equation.

Figure 9. Relation among the rotation axis $\mathbf{m}^{c_j}$ and the VP axes $\mathbf{v}^{c_0}$ and $\mathbf{v}^{c_j}$.


\[ R^{c_j} = R_m^{c_j} R_v^{c_j}. \quad (12) \]

The camera rotation matrix $R^{c_j}$ and the 3D line direction $\mathbf{d}_i$ satisfy the following equation.

\[ \left( R^{c_j\,T} \mathbf{n}_i^{c_j} \right)^T \mathbf{d}_i = 0. \quad (13) \]

In that equation, $\mathbf{n}_i^{c_j}$ is the NV of the line $i$ at the viewpoint $c_j$, and $\mathbf{d}_i$ is a unit vector. The NV in the world coordinate system, $R^{c_j\,T} \mathbf{n}_i^{c_j}$, is perpendicular to the 3D line $i$. When a rotation angle $\phi^{c_j}$ is given, the 3D direction $\mathbf{d}_i$ of the line $i$ is calculated as the cross product of the NVs at the viewpoints $c_0$ and $c_j$ (Figure 11).

\[ \mathbf{d}_i = \mathbf{n}_i^{c_0} \times \left( R^{c_j\,T} \mathbf{n}_i^{c_j} \right). \quad (14) \]

Using the 3D line directions, a rotation matrix $R^{c_k}$ between the initial viewpoint $c_0$ and another viewpoint $c_k$ is calculated by solving the following equation.

\[ e_{rot}(\phi^{c_k}) = \sum_i^{n_l} \left( \left( R^{c_k\,T} \mathbf{n}_i^{c_k} \right)^T \mathbf{d}_i \right)^2 \rightarrow \min, \quad (15) \]

where $n_l$ is the number of nonparallel lines. Although the function is nonlinear, it is easily solvable because $e_{rot}(\phi^{c_k})$ is just a quartic function of $\phi^{c_k}$. Consequently, if a rotation angle $\phi^{c_j}$ at a viewpoint $c_j$ is given, then the rotation angles $\phi^{c_k}$ at the other viewpoints $c_k$ are determined. The proposed optimization of the rotations is represented as the following equation.

\[ E_{rot}(\phi^{c_j}) = \sum_{c_k}^{n_c} e_{rot}(\phi^{c_k}) \rightarrow \min, \quad (16) \]

where $n_c$ represents the number of viewpoints. In the proposed method, the rotation estimation is a search problem with 1 DOF about the rotation angle $\phi^{c_j}$, without regard to the number of viewpoints and features.

In the following explanation, $\mathbf{v}$ expresses the VP axis $\mathbf{v}^{c_0}$, or equivalently the 3D direction of the parallel lines.
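
To make the 1-DOF structure of Equations (15) and (16) concrete, the sketch below evaluates $E_{rot}$ over a sampled grid of the single free angle and also resolves each remaining viewpoint's angle by grid search instead of the closed-form quartic minimization used in the proposed method; this substitution is for illustration only.

```python
import numpy as np

def rot_about(v, phi):
    """Rotation by angle phi about the unit VP axis v (Rodrigues formula)."""
    V = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + np.sin(phi) * V + (1 - np.cos(phi)) * (V @ V)

def e_rot(R_ck, nvs_ck, dirs):
    """Equation (15): residual of viewpoint c_k for given line directions."""
    world_nvs = (R_ck.T @ np.asarray(nvs_ck).T).T
    return np.sum(np.einsum('ij,ij->i', world_nvs, np.asarray(dirs)) ** 2)

def estimate_rotations(v, Rm, nvs, n_grid=360):
    """Grid search over the single free angle phi^{c1} (Equation (16)).
    Rm[j] is the VP-matching rotation of viewpoint j; nvs[j][i] is the NV
    of nonparallel line i at viewpoint j (nvs[0] belongs to c_0)."""
    phis = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    best_E, best_phi = np.inf, 0.0
    for phi1 in phis:
        R1 = Rm[1] @ rot_about(v, phi1)          # Equation (12) for c_1
        dirs = []
        for n0, n1 in zip(nvs[0], nvs[1]):       # Equation (14)
            d = np.cross(n0, R1.T @ n1)
            dirs.append(d / np.linalg.norm(d))
        E = sum(min(e_rot(Rm[k] @ rot_about(v, p), nvs[k], dirs)
                    for p in phis)
                for k in range(2, len(nvs)))
        if E < best_E:
            best_E, best_phi = E, phi1
    return best_phi, best_E
```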

5.2. Estimation of camera translation and line location

Camera translations are estimated using two procedures. In the first, translations on the plane perpendicular to the parallel lines are estimated. In the second, translations directed along the parallel lines are estimated. As is true also for the rotation estimation, these two procedures of the translation estimation are solvable as problems with 1 DOF.

5.2.1. Translation on a perpendicular plane

In the first procedure, translations on a plane perpendicular to the parallel lines are estimated. This procedure requires at least three parallel lines. Translations on the plane and the 3D locations of the parallel lines are optimized simultaneously.

Here, the method introduces basis vectors $\mathbf{a}$ and $\mathbf{b}$ ($\mathbf{a} \perp \mathbf{b}$, $\mathbf{a} \perp \mathbf{v}$, and $\mathbf{b} \perp \mathbf{v}$). A unit vector $\mathbf{g}_{i,c_j}$ from the viewpoint $c_j$ to the parallel line $i$ is calculated using Equation (17). The vector $\mathbf{g}_{i,c_j}$ is perpendicular to the parallel lines:

\[ \mathbf{g}_{i,c_j} = \mathbf{v} \times \left( R^{c_j\,T} \mathbf{n}_i^{c_j} \right). \quad (17) \]

Using these vectors, the $\mathbf{a}$ and $\mathbf{b}$ elements of the true translations can be estimated. With a translation vector $\mathbf{t}_{p,c_j}$ between the initial viewpoint $c_0$ and a viewpoint $c_j$, the location $\mathbf{l}_i$ of the parallel line $i$ and the vector $\mathbf{g}_{i,c_j}$ satisfy the following equation. The relation among these vectors is shown in Figure 12.

\[ d_i^{c_j} \mathbf{g}_{i,c_j} + \mathbf{t}_{p,c_j} - \mathbf{l}_i = \mathbf{0}. \quad (18) \]

In that expression, $d_i^{c_j}$ is a fixed number representing the depth of the line at the viewpoint. $d_i^{c_j}$ is calculated as

\[ d_i^{c_j} = \frac{\left( \mathbf{t}_{p,c_j} - \mathbf{l}_i \right)^T \mathbf{g}_{i,c_j}}{\mathbf{g}_{i,c_j}^T \mathbf{g}_{i,c_j}}. \quad (19) \]

Figure 10. Rotation about the VP axis $\mathbf{v}^{c_0}$.

Figure 11. The 3D line direction calculation.


Here, the translation $\mathbf{t}_{p,c_j}$ is expressed as Equation (20),

\[ \mathbf{t}_{p,c_j} = \mathbf{a} \cos\psi^{c_j} + \mathbf{b} \sin\psi^{c_j}, \quad (20) \]

where $\psi^{c_j}$ denotes the translation direction from the initial viewpoint $c_0$ to the viewpoint $c_j$. The absolute scale is unknown in the SfM approach. Consequently, the distance between these two viewpoints is set to 1 in the proposed method. When the direction $\psi^{c_j}$ is given, the locations of the parallel lines are calculated using the following equation.

\[ \mathbf{l}_i = \frac{f_i^{c_0} \mathbf{g}_{i,c_0} + f_i^{c_j} \mathbf{g}_{i,c_j} + \mathbf{t}_{p,c_j}}{2}, \quad (21) \]

where $f_i^{c_0}$ and $f_i^{c_j}$ are fixed factors representing the depth of the line from each viewpoint. The fixed factors $f_i^{c_0}$ and $f_i^{c_j}$ are calculated as factors satisfying the following expression.

\[ \left\| f_i^{c_0} \mathbf{g}_{i,c_0} - f_i^{c_j} \mathbf{g}_{i,c_j} - \mathbf{t}_{p,c_j} \right\|^2 \rightarrow \min. \quad (22) \]

Using the 3D locations of the parallel lines, the translations at the other viewpoints $c_k$ are calculated by minimizing the following equations.

\[ e_{t1}(\lambda_a^{c_k}, \lambda_b^{c_k}) = \sum_i^{n_l} \left\| d_i^{c_k} \mathbf{g}_{i,c_k} + \mathbf{t}_{p,c_k} - \mathbf{l}_i \right\|^2, \quad (23) \]

\[ \mathbf{t}_{p,c_k} = \lambda_a^{c_k} \mathbf{a} + \lambda_b^{c_k} \mathbf{b}. \quad (24) \]

When the line locations $\mathbf{l}_i$, namely, the translation direction $\psi^{c_j}$, are given, $e_{t1}(\lambda_a^{c_k}, \lambda_b^{c_k})$ is solvable as a simultaneous equation about $\lambda_a^{c_k}$ and $\lambda_b^{c_k}$. Therefore, the translations on the plane perpendicular to the parallel lines are optimized by minimizing the following function about $\psi^{c_j}$.

\[ E_{t1}(\psi^{c_j}) = \sum_{c_k=1}^{n_c} e_{t1}(\lambda_a^{c_k}, \lambda_b^{c_k}). \quad (25) \]

Consequently, Equation (25) is solvable as a search problem with 1 DOF about the translation direction $\psi^{c_j}$ on the plane perpendicular to the parallel lines.
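
Once the line locations $\mathbf{l}_i$ are fixed by a candidate direction $\psi^{c_j}$, solving Equations (23) and (24) for $\lambda_a^{c_k}$ and $\lambda_b^{c_k}$ reduces to linear least squares. The sketch below uses one possible reading of Equation (19): substituting the depth $d_i^{c_k}$ leaves only the component of the residual perpendicular to $\mathbf{g}_{i,c_k}$, so each parallel line contributes a projected linear constraint on the two unknowns.

```python
import numpy as np

def solve_translation_ab(a, b, g_list, l_list):
    """Solve Equations (23)-(24) for (lambda_a, lambda_b) at one viewpoint.

    For each parallel line i, removing the depth component along g_i
    leaves || P_i (t_p - l_i) ||^2 with P_i the projector perpendicular
    to g_i; minimizing over t_p = lambda_a*a + lambda_b*b is linear
    least squares."""
    rows, rhs = [], []
    for g, l in zip(g_list, l_list):
        P = np.eye(3) - np.outer(g, g) / (g @ g)   # projector perpendicular to g
        rows.append(np.column_stack((P @ a, P @ b)))
        rhs.append(P @ l)
    A = np.vstack(rows)                 # shape (3*n_l, 2)
    y = np.concatenate(rhs)             # shape (3*n_l,)
    lam, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lam                          # [lambda_a, lambda_b]
```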

5.2.2. Translation along the VP axis

In the second procedure of the translation estimation, translations along the parallel line direction are estimated. This procedure requires at least three nonparallel lines. The camera translation vector $\mathbf{t}^{c_j}$ is represented as

\[ \mathbf{t}^{c_j} = \mathbf{t}_{p,c_j} + x^{c_j} \mathbf{v}, \quad (26) \]

where $x^{c_j}$ represents the distance of the translation along the parallel line direction. Translations and nonparallel line locations are optimized by minimizing the sum of the reprojection errors in Equations (27)–(29).

\[ e_{t2} = \sum_i^{n_l} \left( 1 - \mathbf{q}_{i,c_j}^T \mathbf{g}'_{i,c_j} \right)^2, \quad (27) \]

\[ \mathbf{q}_{i,c_j} = \frac{\mathbf{l}'_i - \mathbf{t}^{c_j} + s_i^{c_j} \mathbf{d}_i}{\left\| \mathbf{l}'_i - \mathbf{t}^{c_j} + s_i^{c_j} \mathbf{d}_i \right\|}, \quad (28) \]

\[ s_i^{c_j} = \frac{\left( \mathbf{t}^{c_j} - \mathbf{l}'_i \right)^T \mathbf{d}_i}{\mathbf{d}_i^T \mathbf{d}_i}. \quad (29) \]

In those expressions, $\mathbf{q}_{i,c_j}$ is a unit vector that crosses the nonparallel line $i$ at a right angle from the viewpoint $c_j$, and $\mathbf{g}'_{i,c_j}$ is a unit vector from the viewpoint $c_j$ to the nonparallel line $i$. $\mathbf{g}'_{i,c_j}$ is calculated using the same procedure as that described in Equation (17). The relation between these vectors is shown in Figure 13. If no errors exist, then these two vectors $\mathbf{q}_{i,c_j}$ and $\mathbf{g}'_{i,c_j}$ will be the same.

Figure 12. Relation among the 3D direction of the parallel lines, the basis vectors, the 3D location of the parallel line, the vector $\mathbf{g}_{i,c_j}$, and the translation on the plane.


However, in fact, they have different directions because of various errors. The angle error is almost identical to a reprojection error.

When $x^{c_j}$, namely, the translation from the initial viewpoint $c_0$ to the viewpoint $c_j$, is determined, the 3D line locations $\mathbf{l}'_i$ are calculated in the same way as in Equation (18). Using these locations, Equation (27) at another viewpoint $c_k$ is solvable as a quartic function of $x^{c_k}$. Therefore, the translation estimation is a search problem with 1 DOF about $x^{c_k}$. The optimum translations are estimated by minimizing the following equation.

\[ E_{t2}(x^{c_j}) = \sum_{c_k=1}^{n_c} e_{t2}(x^{c_k}). \quad (30) \]

The proposed method can measure straight lines using the processes described above. Moreover, edge points are reconstructed to measure the environment densely. The edge point reconstruction is based on [26].

6. Experiments

We demonstrate the proposed camera movement estimation using simulation data. All experiments are done with off-line processing. The CPU is an Intel Core i7 975 (3.33 GHz). In this experiment, the values of the NVs $\mathbf{n}_i^{c_j}$ and the VP axes $\mathbf{v}^{c_j}$ are given, and it is known which lines are parallel. The camera movement includes a 3D rotation and translation.

We first verified that the proposed method can estimate the camera movements from six lines (three parallel lines and three nonparallel lines). The number of viewpoints is 10. The true values of the NVs are given in this experiment. The positional relation between the viewpoints and lines is shown in Figure 14. The red, yellow, and orange axes show the camera coordinate system at each viewpoint. Parallel lines are shown in green; the other lines are shown in blue. The estimation errors of the camera movement and line measurement were within rounding error.

A local minimum naturally exists in the proposed method. However, Figure 15 shows that, around the ground truth, the evaluation values of Equations (16), (25), and (30) are sufficiently low compared with the other values; the figure shows an example of the evaluation value in the rotation estimation. For that reason, local minimum avoidance is easy for the proposed method. An example of the computation time of the rotation and translation estimation by the proposed method is shown in Figure 16. The figure shows that the computation time is proportional to the number of viewpoints.

We verified the robustness of the proposed method with noisy data. In this experiment, noisy NVs are given. The noise is an angle error between a given vector and the true one, and it follows a normal Gaussian distribution. We evaluated the estimation errors of the camera rotations and translations using noisy data including 0.01–1.28 deg angle errors on average. The input data are 40 lines, including 20 parallel lines, acquired at 20 viewpoints. At each noise level, 100 trial runs were conducted.

The estimation results are presented in Figure 17. The rotation error in Figure 17(a) represents the angle error between the axes of the estimated camera coordinate system and the true one. The translation error in Figure 17(b) represents the distance error between the estimated camera location and the ground truth as a percentage of the translation distance. These values are the averages of the 100 trial runs.

The rotation estimation error was within the given noise. The translation estimation error was within 1% of the translation distance when the given noise was within 0.16 deg. According to our camera calibration, the proposed method can extract lines from an omnidirectional image (the image size is 800 × 600 pixels) within 0.05 deg error. Therefore, these results show that the proposed method performs well on noisy data.

We compared the accuracy of the camera movement estimation using the proposed method with that of a common SfM using a bundle adjustment method. A bundle adjustment method is well known as an optimization of the camera movement. In this experiment, the common SfM optimizes the camera movement using feature points and lines that are the same as those of the proposed method. This experiment is demonstrated in a virtual environment, as shown in Figure 18(a), to obtain the ground truth of the camera movement. Simulated omnidirectional images depending on the camera movement are created from the structure and color information of the environment (Figure 18(b)). The image size is 800 × 600 pixels. The created 51 images are used as the input image sequence. In this experiment, the given information is these images only. Feature points and lines are detected automatically from the images. The NVs and the VP axis are calculated from the detected lines.

Figure 13. Reprojection error of a straight line.


Not only the feature points but also the NVs are used for the optimization by the bundle adjustment. The initial value for the bundle adjustment is obtained using a framework with the eight-point algorithm and RANSAC [24].

The results of the camera movement estimation are shown in Figure 19 and Table 1. We compare the recovered camera movements with the ground truth by aligning their movement distances and initial poses. In this experiment, the ground truth of the initial camera orientation and location is known because the experiment was done in a virtual environment. The camera movement estimated by the proposed method (green marks in Figure 19) is close to the ground truth (blue marks in Figure 19). The reprojection error of the proposed method is within 0.5 pixels. Therefore, the result shows that the proposed method can obtain the global minimum.

Figure 14. The 3D camera movement and line position in simulation.

Figure 15. Evaluation value example in rotation estimation.

Figure 16. Computation time with the number of viewpoints.

Figure 17. Estimation errors with noisy data.


Although the same features are used for the experiments, the camera movement estimation error of the common SfM is larger than that of the proposed method (Table 1). A bundle adjustment can obtain the optimal solution in ideal situations. However, nonlinear optimizations, such as a bundle adjustment, often fall into a local minimum, as the experimentally obtained results show.

Figure 20(a) shows our verification of the proposed method using real images in a textureless scene. Using a mobile robot equipped with an omnidirectional camera, 800 images were acquired. The movement distance is about 12 m. The image size is 800 × 600 pixels. An input image and a parallel line detection result are shown in Figure 20(b) and (c), respectively. The computing time of the line tracking was about 100 ms per frame, that of the KLT tracker was 10 ms per frame, and that of the parallel line detection was 35 ms per frame.

The result of the camera movement estimation and line measurement is shown in Figure 21. The computation time is about 5 min. The average of the reprojection errors of the lines is within 0.5 pixels. These results show that the proposed method can estimate the camera movement and measure lines precisely. The modeling result of the textureless scene is shown in Figure 22. The modeling method is based on [27]. The corridor shape can be reconstructed using the proposed method.

Figure 18. Virtual environment for evaluation of camera movement estimation.

Figure 19. Camera movement estimation results obtained using simulated images.

Table 1. Comparison of camera movement estimation results.

                          Proposed SfM    Common SfM
(a) Rotation errors
    Average [°]           0.119           0.177
    Maximum [°]           0.298           0.350
(b) Translation errors
    Average [%]           0.357           1.74
    Maximum [%]           0.425           3.23


Figure 20. Detection result of parallel lines from omnidirectional images of a textureless scene.

Figure 21. Results of camera movement estimation and line measurement in a textureless scene.

Figure 22. Modeling result of textureless scene.


However, although parallel lines are detected, it is difficult to reconstruct the correct shape of the border between the wall and the floor. Correct patches are removed, or false patches remain, because of the blind area of the omnidirectional camera and the differences in the frames over which lines are tracked. Improvement of this problem is an important task for our future work. These experimentally obtained results show that the proposed method is effective for the reconstruction of a textureless scene.

7. Conclusion

In this paper, we proposed a parallel line-based SfM for textureless scenes. Camera rotations and translations are estimated by solving problems with 1 DOF using the constraints obtained from parallel lines. Therefore, the globally optimal solution is obtained easily. Experiments underscore the effectiveness of the proposed method.

As future work, the robustness of the line detection should be improved, because illumination change makes the line detection unstable. Moreover, we should develop a framework of parallel line-based SfM for long image sequences, because the proposed method requires lines that correspond along all images in an input image sequence.

Acknowledgments

This work was in part supported by a MEXT KAKENHI Grant-in-Aid for Young Scientists (A), 22680017, and the Asahi Glass Foundation.

Notes on contributors

Ryosuke Kawanishi received BE, ME, and PhD degrees from the Department of Mechanical Engineering, Shizuoka University, in 2007, 2009, and 2012, respectively. He has been working at the Advanced Technology R&D Center, Mitsubishi Electric Corporation, since 2012. His research interests are computer vision and robot vision. He is a member of IEEE, RSJ, and JSPE.

Atsushi Yamashita received BE, ME, and PhD degrees from the Department of Precision Engineering, the University of Tokyo, in 1996, 1998, and 2001, respectively. From 1998 to 2001, he was a Junior Research Associate at RIKEN (Institute of Physical and Chemical Research). From 2001 to 2008, he was an assistant professor at Shizuoka University. From 2006 to 2007, he was a Visiting Associate at the California Institute of Technology. From 2008 to 2011, he was an associate professor at Shizuoka University. Since 2011, he has been an associate professor in the Department of Precision Engineering, the University of Tokyo. His research interests are robot vision, image processing, multiple mobile robot systems, and motion planning. He is a member of ACM, IEEE, JSPE, RSJ, IEICE, JSME, IEEJ, IPSJ, ITE, and SICE.

Toru Kaneko received BE and ME degrees from the Department of Applied Physics, the University of Tokyo, in 1972 and 1974, respectively. From 1974 to 1997, he was with Nippon Telegraph and Telephone Corporation (NTT). He received a PhD from the University of Tokyo in 1986. He has been a professor in the Department of Mechanical Engineering, Shizuoka University, since 1997. His research interests are computer vision and image processing. He is a member of IEEE, IPSJ, ITE, RSJ, etc.

Hajime Asama received his BS, MS, and Dr. Eng. in Engineering from the University of Tokyo, in 1982, 1984, and 1989, respectively. He was a Research Associate, Research Scientist, and Senior Research Scientist at RIKEN (The Institute of Physical and Chemical Research, Japan) from 1986 to 2002. He became a professor of RACE (Research into Artifacts, Center for Engineering), the University of Tokyo, in 2002, and has been a professor of the School of Engineering, the University of Tokyo, since 2009. He received the JSME (Japan Society of Mechanical Engineers) Robotics and Mechatronics Division Academic Achievement Award in 2001, the RSJ (Robotics Society of Japan) Best Paper Award, the JSME Robotics and Mechatronics Award in 2009, etc. He has been the vice-president of the Robotics Society of Japan since 2011. He was an AdCom member of the IEEE Robotics and Automation Society from 2007 to 2009, the president elect of the International Society for Intelligent Autonomous Systems (2012), and an associate editor of the Journal of Intelligent Service Robotics, the Journal of Field Robotics, the Journal of Robotics and Autonomous Systems, and Control Engineering Practice. He was the director of the Mobiligence (Emergence of adaptive motor function through the body, brain, and environment) program in the MEXT Grant-in-Aid for Scientific Research on Priority Areas from 2005 to 2009. He has been a Fellow of JSME since 2004 and of RSJ since 2008. Currently, he is a member of the Headquarters of Mid-term and Long-term Research and Development Promotion of the Japanese Government and TEPCO, a member of the Advisory Committee on Medium-range and Long-range Measures for Fukushima Daiichi Nuclear Power Plant of the Japan Atomic Energy Commission, the leader of the Project on Disaster Response Robots and Their Operation System of the Council on Competitiveness-Japan, and the chairman of the Robotics Task Force for Anti-Disaster (ROBOTAD). His main research interests are distributed autonomous robotic systems, ambient intelligence, service engineering, Mobiligence, and service robotics.


References

[1] Nister D. Reconstruction from uncalibrated sequences with a hierarchy of trifocal tensors. In: Proceedings of the 6th European Conference on Computer Vision (ECCV2000). Vol. 1; 2000. p. 649–663.

[2] Nister D. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 2004;26:756–770.

[3] Davison AJ. Real-time simultaneous localisation and mapping with a single camera. In: Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV2003). Vol. 2; 2003. p. 1403–1410.

[4] Tomono M. 3D object mapping by integrating stereo SLAM and object segmentation using edge points. Adv. Visual Comput. Lect. Notes Comput. Sci. 2009;5875:690–699.

[5] Shi J, Tomasi C. Good features to track. In: Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR1994); 1994. p. 593–600.

[6] Lowe DG. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision. 2004;60:91–110.

[7] Gluckman J, Nayar SK. Ego-motion and omnidirectional cameras. In: Proceedings of the 6th International Conference on Computer Vision (ICCV1998); 1998. p. 999–1005.

[8] Bunschoten R, Krose B. Robust scene reconstruction from an omnidirectional vision system. IEEE Trans. Robot. Autom. 2003;19:351–357.

[9] Geyer C, Daniilidis K. Omnidirectional video. Visual Comput. 2003;19(6):405–416.

[10] Murillo A, Guerrero J, Sagues C. SURF features for efficient robot localization with omnidirectional images. In: Proceedings of the 2007 IEEE International Conference on Robotics and Automation (ICRA2007); 2007. p. 3901–3907.

[11] Tardif J, Pavlidis Y, Daniilidis K. Monocular visual odometry in urban environment using an omnidirectional camera. In: Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2008); 2008. p. 2531–2538.

[12] Torii A, Havlena M, Pajdla T. From Google Street View to 3D city models. In: Proceedings of the 9th Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras (OMNIVIS2009); 2009. p. 2188–2195.

[13] Bartoli A, Sturm P. Multi-view structure and motion from line correspondences. In: Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV2003); 2003. p. 207–212.

[14] Bartoli A, Sturm P. Structure-from-motion using lines: representation, triangulation, and bundle adjustment. Comput. Vision Image Und. 2005;100:416–441.

[15] Smith P, Reid I, Davison A. Real-time monocular SLAM with straight lines. In: Proceedings of the 2006 British Machine Vision Conference; 2006. p. 17–26.

[16] Hartley R. In defense of the eight-point algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 1997;19:580–593.

[17] Bosse M, Rikoski R, Leonard J, Teller S. Vanishing points and 3D lines from omnidirectional video. In: Proceedings of the 2002 International Conference on Image Processing (ICIP2002). Vol. 3; 2002. p. 513–516.

[18] Schindler G, Krishnamurthy P, Dellaert F. Line-based structure from motion for urban environments. In: Proceedings of the 3rd International Symposium on 3D Data Processing, Visualization, and Transmission; 2006. p. 846–853.

[19] Mariottini GL, Prattichizzo D. Uncalibrated video compass for mobile robots from paracatadioptric line images. In: Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2007); 2007. p. 226–231.

[20] Torr PHS, Zisserman A. Robust parameterization and computation of the trifocal tensor. Image Vision Comput. 1997;15:591–605.

[21] Hartley R. Lines and points in three views and the trifocal tensor. Int. J. Comput. Vision. 1997;22:125–140.

[22] Triggs B, McLauchlan P, Hartley R, Fitzgibbon A. Bundle adjustment – a modern synthesis. Vision Algorithms: Theory and Practice, Lect. Notes Comput. Sci. 2000;1883:298–372.

[23] Canny JF. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986;PAMI-8:679–698.

[24] Fischler MA, Bolles RC. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM. 1981;24:381–395.

[25] Nakayama K, Silverman G. The aperture problem – II. Spatial integration of velocity information along contours. Vision Res. 1988;28:747–753.

[26] Tomono M. Dense object modeling for 3-D map building using segment-based surface interpolation. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA2006); 2006. p. 2609–2614.

[27] Kawanishi R, Yamashita A, Kaneko T. Three-dimensional environment model construction from an omnidirectional image sequence. J. Robot. Mechatron. 2009;21:574–582.
