inverse depth monocular slam - oxford brookes...

Javier CiveraJMM Montiel A.J. Davison

Inverse Depth Monocular SLAMJavier Civera, Andrew J. Davison, JMM Montiel


Problem Statement

• Sequential simultaneous sensor location and map building at frame rate, 30Hz.

• Camera moves freely in 3D, 6dof camera motion

• Outdoors real scenes contains close and distant, even at infinity, features

• Main contribution codifying scene points with its inverse depth:

1. Deals with low parallax cases2. Deals with both distant and

close points3. Map features are initialised

just from one image


Camera Geometry: Pure Bearing-only Sensor

X[X,Y,Z]

camera,optical center

O

f

• Camera detects rays• A ray is defined by the optical

center O and the observed point• The images is used as the method

to determined the detected ray• Depth is not detected

x[x,y]x

yY

Z

X

Earththefromfar light years10at

gallaxiesofimage9


Points at Infinity

• projective cameras do observe points at infinity• parallel lines meet at infinity, a projective camera does observe this

intersection point as vanishing point• we intend to code and exploit this points at infinity in the monocular

SLAM problem


Parallax

• no parallax geometries– Camera rotation– Camera observing a scene

plane• low parallax cases

– Distant features compared with camera translation

– Initial feature observation


Stereo Vision. Sensitivity Analysis

β

X

P1 P2b

Plano deimagen

Centroóptico

Ejeóptico

P

Z

2x1x

1C2C

f f

θ 1 θ 2

angleParallax z

rad toup add ein triangl angles threeThe

tantan

tantan

tan 2,ray tan 1,ray

21

2122

11

21

2

1

ββ

πθθθθ

θθθθ

θθ

bPCC

bzbz

bzxzx

≈

−≈

≈≈

−=

+==

Geometry, depending only on:• relative camera location • observed point location

Parallax, angle defined by the two rays corresponding to the two cameras fora scene point.


Image errors

pxσ2

oσ2

f

(mm.)lenght focal lens,

(mm.) size pixel , ,

embetween threlation eapproximatan

radiansin , error,n orientatio for thedeviation standard theerrorn orientatioray on the based is analysiserror

pixels 5.0deviation standard ingcorrespond pixel, 1error clicking afor example,

2 is rangeerror thumbof rule a as

pixelsin deviation standarderror ,

o

o

f

dfd

px

px

px

px

σσ

σ

σ

σ

σ

≅

=±

±


Stereo vision. Gaussian propagation

β

X

b

ImagePlane

OpticalAxis

P

1C2C

221 θ+θ

bb oo22

22β

σ=

βσ

5.121

2β

θ+θσob

direction. rays twoon dependsdirection ZX,on n propagatioError

direction. bisectric rays twoin the oriented axismajor with Elliposid

1.5o

2o

power 1.5th theoparallax t inverse

baseline to alproportion error,depth an smaller therror Lateral

parallax squared inverse

baseline to alproportionerror Depth

βσ

βσ

b

b


Linearity limits of the Gaussian propagation.Inverse depth coding

o5.0=α

o0.1=σ o0.1=σ

-2 -1 0 10

100

200

300

400

500

600

700

800

900

x

z

x z representation

-1 -0.5 0 0.50

0.002

0.004

0.006

0.008

0.01

0.012

0.014

0.016

0.018

ρ =1/

d

θ(deg)

ρθrepresentation

Simulation: computing depth of a point from 2 views at known camera locations• Non Gaussian in XZ• Gaussian in 1/d, theta


State of the art I

• SLAM, initially proposed by Smith and Cheesman, 1986, widespread usage in robotics for multisensor fusion [Castellanos 1999], [Feder 1999] [Thrun et al. 2005]

– Sequential approach– Ability to close loops, identifying features previously observed as

reobserved. Complexity is linked to the scene not to the number of observations processed.

• SLAM used for computer vision, [Castellanos 2000], [Davison 1998] combined with odometry

• Monocular SLAM vision [Davison 2003]– Camera "following the laws of mechanics" motion model– Vision as the only sensor, no odometry.– Synergic usage of vision geometry and vision photometric map– Low parallax points avoided:

» Points represented as XYZ, only works with points close to the camera

» Delayed initialization

SFM

,com

pute

r vi

sion

met

hods


State of the art II

• Photogrammetric bundle adjustment, 60's– Normally only close points

• Computer vision geometry Hartley & Zisserman [Hartley 2003]– Robust statistics– Matching between several shots enforcing a coherence with a

projective camera model– Applied to individual shots, to sequences with varying camera

parameters– Applied for robot navigation [Nister 2003, Mouragnon 2006]– Not sequential– Wide-baseline performance– Routine usage of points at infinity

• Model selection problem [Torr 1998, 1999 ]– No parallax, homography model– Parallax epipolar geometry model– Increasing the frame rate, the interframe motion closes to a

homography

SLA

M m

etho

ds


State of the art IIIFeature initialization in Monocular SLAM• Delayed approach: Image tracking until safe Gaussian triangulation

Bailey 2003, Davison 2003• Undelayed intialization

– Multiple hypotheses in depth, Kwok Dissanayake 2004, Sola et al. 2005,

Inverse depth usage• Concept used in different domains

– Parallax with respect plane at infinity, Zisserman 2003– Optical flow Heeger & Jepson 1992– Modified polar coordinates in bearing only TMA Aidala & Hammel 1983– Sequential structure estimation from known motion Okutomi&Kanade 1993.– Pairwise first and individual EKF's Chowdhury&Chellappa 2003

• Recently used for Monocular SLAM– Trwany & Roumeliotis 2005– Eade & Drummond 2006, FastSLAM, a distinguished intialization stage.


Camera motion priors

W

CC

CC

Constant velocity motion model:• Smooth camera motion• Impulse acceleration noise

( )WCWC qr ,

Wv

CωC


Scene point coding in inverse depth. Measurement equation

( )

( )ιι φθ

ρ

ρ

φθ

ρ ,as observed ispoint a parallax,low at

zero togoes , baseline,low locations,camera close

zero togoesparallax ,0 point,distant

parallax ,

observed is feature thefirst time theof those, , ,

mRh

r

r

m

CW=

−⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

⇒→

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛−

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

C

WC

i

i

i

i

WC

i

i

i

i

ii

i

i

i

zyx

zyx

zyx

parallaxangleα

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛−

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛C

i

i

i

i

zyx

Wrρ

W

WCr

ii

d=ρ1

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

i

i

i

zyx m

CXYZh

C

( )WCWC qr ,

Cρh

z

y

y hh

dfvv −= 0 0

z

x

x hh

dfuu −=

( )⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

+⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛−

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛==

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

ιιι φθρρ ,mrRh CW WC

i

i

iC

z

y

x

zyx

hhh


SLAM joint map+camera state vectorThe full estimate coded in a unique Gaussian distribution

1. Camera state vector.2. ALL the map features.3. The joint Gaussian has proven to be useful in coding strong

correlation between the observations.

( )

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡

=

=

nnnnv

nv

nvvvv

TT

T

TTn

TTv

yyyyyx

yyyyyx

yxyxxx

PPP

PPPPPP

P

yyxx

L

MOMM

L

L

L

1

1111

1

1


Active search

storedpatch

Estimation at step k-1

Prediction at step k

Update at step kstored patch is searched in allacceptance the region


[ ]infinite including and fromregion a covers

2,2- interval at the th

so dinitialize is covarinace its and at nobservatio featurefirst thefrom dinitialize are

covariance ingcorrespond theand ,,,,nobservatioan just from observed are points New

min

0

d

zyx

ρρ

ρ

σρσρ

σρρ

φθ

+

Inverse depth feature initialization prior

00 21

σρ +

ρσρ 210

min

+=d

0

1ρ

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

i

i

i

zyx

( )ii φθ ,m

ρσρ 20 −

∞

0ρ


Dimensional Monocular SLAM

0

00

a

,,

,

ρ

ω

α

σρσσ

σσ

V

tzi

Δσ,z

( )T

nCWWCWC yyvqr L1,,,, ωMonocular SLAM

Processing

So estimation process can be considered as a function

Being the dimensions involved, time, and length:

camera location

scenemap

Calibrated image measurements

Frame Rate

Camera Motion Priors

Intial depth Prior


Dimensionless Monocular SLAM

0 ,1

,

,

00

a

ρ

ω

α

σ

σσ

σσ

ρ Π

ΠΠ

ΠΠ

V

=

1

,

=Δtzi σΠΠz

Monocular SLAM

Processing

camera location

scenemap

Calibrated image measurements

Frame Rate

Camera Motion Priors

Intial depth Prior

state vector split in scale, d, and dimensionless coefficients

are selected to define the dimensionless coefficients [Buckingham 1914]:0, ρtΔ

monocular camera cannot observe scale, d. If scale were known:


Interpreting Dimensionless Parameters as Image Quantities

dimensionlesscamera linearacceleration

standard deviation

dimensionlesscameraangular

accelerationstandard deviation


Inverse depth estimation history

near feature (3), eventually excludes infinite from the acceptance region

distant feature (11), infinite always included in the acceptance region

11 3

113

11 3

113

(b)

11 3

11

3

(a)

11

zero inverse depth

0 250 500 750-1.5

-1

-0.5

0

0.5

1

1.5

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0 250 500 750-1.5

-1

-0.5

0

0.5

1

1.5

-0.02

0

0.02

0.04

0.06

infinity / zero inverse depth


200 400 6000

0.2

0.4

0.6

0.8

1rx

frame number

met

ers

200 400 6000

0.2

0.4

0.6

0.8

1ry

frame number

met

ers

200 400 6000

0.2

0.4

0.6

0.8

1rz

frame number

met

ers

200 400 6000

0.5

1

1.5

2ψ (rotx)

frame numberde

g

200 400 6000

0.5

1

1.5

2θ (roty)

frame number

deg

200 400 6000

0.5

1

1.5

2φ (rotz)

frame number

deg

Loop Closing


Linearity Index


Inverse depth linearityanalysis

linearitypoor L ,1cos0

tion initializaat

d ↑≈⇒≈ αα

estimation whole thealong eperformanc goodlinearity

but ,cos1 gathering,parallax after

linearity L , 0cos10

tion initializaat

ρ

ρ

σα

αα

−

↓≈−⇒≈


Inverse depth to XYZ conversion• Inverse depth good performance along the whole estimation process• Inverse depth coding needs 6 parameters• XYZ coding good performance for reduced depth uncertainty• So, it is not mandatory to switch from inverse depth to XYZ, but

computing overhead can be reduce• Switching criteria based on the linearity index


Inverse depth to XYZ conversion threshold


0 100 200 300 400 500 600 7000

200

400

600

dim

ensi

on

state vector dimension

0 100 200 300 400 500 600 70070

80

90

100

% re

duct

ion

% dimension reduction for code swithching

0 100 200 300 400 500 600 7000

50

100

# m

ap 3

D p

oint

s

# map 3D points

swithching inverse depth to XYZ

all map features in inverse depth

total # 3D points

total # 3D points inverse depth

total # 3D points in XYZ

Switching evolution


Switching evolution

(a)

(c)

(b)

(d)

Inverse depth coding

Depth coding


BibliografíaJ.M.M. Montiel, Javier Civera y Andrew J. Davison: “Unified Inverse DepthParametrization for Monocular SLAM”. Robotics: Science and Systems Conference2006.

Javier Civera , Andrew J. Davison and J.M.M. Montiel,: “Inverse Depth to DepthConversion for Monocular SLAM”. IEEE Int Conf on Robotics and Automation Rome, April 2007.

Javier Civera, Andrew J. Davison, J. M. M. Montiel. "Dimensionless Monocular SLAM". 3rd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), Girona, 2007

J.M.M Montiel and Andrew J. Davision: "A visual compass based on SLAM". In Proc. Intl. Conf. on Robotics and Automation, pages 1917--1922, 2006.

J.M.M Montiel home page: http://webdiis.unizar.es/~josemari/

Anderw Davison home page http://www.doc.ic.ac.uk/~ajd/

inverse depth monocular slam - oxford brookes...

Documents