3D Vision: Structure from Motion - CVG @ ETHZ

Post on 17-Jul-2020


3D Vision: Structure from Motion

Structure from Motion

•  Two view reconstruction
   •  Epipolar geometry computation
   •  Triangulation

•  Adding more views
   •  Pose estimation

Epipolar geometry

The fundamental matrix F: algebraic representation of epipolar geometry

We will see that the mapping from a point x to its epipolar line l’ is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F

The fundamental matrix F

geometric derivation

mapping from 2-D to 1-D family (rank 2)
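The geometric derivation can be written out as below. This is a sketch in the usual textbook notation: the plane π (any scene plane not passing through a camera center) and the homography H_π it induces between the two images are symbols assumed here, they do not appear in the transcript.

```latex
x' = H_\pi\, x, \qquad
l' = e' \times x' = [e']_\times H_\pi\, x = F\,x
\quad\Longrightarrow\quad
F = [e']_\times H_\pi
```

Since [e’]_x has rank 2 and H_π has rank 3, the product F has rank 2, which matches the "2-D to 1-D family" remark above.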

The fundamental matrix F

algebraic derivation

(note: doesn’t work for C=C’ ⇒ F=0)
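A minimal numerical sketch of the algebraic derivation, assuming the standard formula F = [e’]_x P’ P^+ with e’ = P’C (P^+ is the pseudo-inverse of P, C the center of the first camera). The example cameras, the test point, and the helper names `skew` and `fundamental_from_cameras` are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+ for two 3x4 cameras with distinct centers."""
    # The center C of the first camera is the right null vector of P1.
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]
    e2 = P2 @ C  # epipole in the second image; zero if both centers coincide
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical camera pair: P = [I | 0], P' = [R | t].
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
P2 = np.hstack([R, t[:, None]])

F = fundamental_from_cameras(P1, P2)

# Project an arbitrary 3D point into both views and evaluate x'^T F x.
X = np.array([0.3, -0.2, 4.0, 1.0])
x1, x2 = P1 @ X, P2 @ X
residual = x2 @ F @ x1
print("x'^T F x =", residual)
```

If both cameras shared a center, e’ = P’C would vanish and the product would be the zero matrix, which is the C=C’ case noted above.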

The fundamental matrix F

correspondence condition

The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’ in the two images, x’^T F x = 0 (x’ lies on the epipolar line l’=Fx)

The fundamental matrix F - recap

F is the unique 3x3 rank 2 matrix that satisfies x’^T F x = 0 for all x↔x’

(i)   Transpose: if F is fundamental matrix for (P,P’), then F^T is fundamental matrix for (P’,P)

(ii)  Epipolar lines: l’=Fx & l=F^T x’

(iii) Epipoles: on all epipolar lines, thus e’^T F x = 0, ∀x ⇒ e’^T F = 0, similarly Fe = 0

(iv)  F has 7 d.o.f., i.e. 3x3-1(homogeneous)-1(rank2)

(v)   F is a correlation, projective mapping from a point x to a line l’=Fx (not a proper correlation, i.e. not invertible)

Computation of F

•  Linear (8-point)

•  Minimal (7-point)

•  Calibrated (5-point) (Essential matrix)

•  Practical two-view geometry computation

Epipolar geometry: basic equation

separate known from unknown

(data) (unknowns) (linear)
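Written out, the basic equation and its separation into data and unknowns look as follows. This is a reconstruction under the assumption that x = (x, y, 1)^T and x’ = (x’, y’, 1)^T; the names A and f for the stacked data matrix and the vector of unknowns are introduced here for illustration.

```latex
x'^{\top} F x = 0
\;\Longleftrightarrow\;
x'x\,f_{11} + x'y\,f_{12} + x'\,f_{13} + y'x\,f_{21} + y'y\,f_{22} + y'\,f_{23}
+ x\,f_{31} + y\,f_{32} + f_{33} = 0

\underbrace{\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix}}_{\text{data}}
\underbrace{\begin{bmatrix} f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \end{bmatrix}^{\top}}_{\text{unknowns}}
= 0

A f = 0 \quad \text{(one row per correspondence, linear in } f\text{)}
```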

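Typical column magnitudes of the data matrix (pixel coordinates): ~10000 for products of two coordinates, ~100 for single coordinates, 1 for the constant column

Orders of magnitude difference between columns of the data matrix → least-squares yields poor results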

the NOT normalized 8-point algorithm

Transform image to ~[-1,1]x[-1,1]

(Figure: the image with corners (0,0), (700,0), (0,500), (700,500) is mapped to the square with corners (-1,-1), (1,-1), (-1,1), (1,1) and center (0,0).)

the normalized 8-point algorithm
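A small sketch of the normalizing transform, assuming the simple scale-and-shift to [-1,1]x[-1,1] described for a 700x500 image. The function names are hypothetical. The denormalization step F = T’^T F_norm T follows from substituting the normalized points T x and T’ x’ into the epipolar constraint.

```python
import numpy as np

def normalization_matrix(width, height):
    """3x3 transform mapping pixel coordinates [0,width]x[0,height] to [-1,1]x[-1,1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def denormalize_fundamental(F_norm, T1, T2):
    """Map an F estimated from normalized points back to pixel coordinates."""
    return T2.T @ F_norm @ T1

T = normalization_matrix(700, 500)

# Image corners in homogeneous pixel coordinates, one per row.
corners = np.array([[0.0, 0.0, 1.0],
                    [700.0, 0.0, 1.0],
                    [0.0, 500.0, 1.0],
                    [700.0, 500.0, 1.0]])
normalized = (T @ corners.T).T
print(normalized[:, :2])
```

After this transform every column of the data matrix has entries of magnitude at most 1, so the least-squares problem is far better conditioned.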

the singularity constraint

SVD from linearly computed F matrix (rank 3)

Compute closest rank-2 approximation
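The singularity constraint can be sketched in a few lines: take the SVD of the linearly computed matrix and zero the smallest singular value, which gives the closest rank-2 matrix in Frobenius norm. The input matrix below is a hypothetical stand-in for a linearly estimated F, not data from the lecture.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 approximation of a 3x3 matrix (Frobenius norm)."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0  # drop the smallest singular value
    return U @ np.diag(s) @ Vt

# Hypothetical full-rank matrix standing in for a linearly computed F.
F_linear = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0],
                     [7.0, 8.0, 10.0]])

sigma = np.linalg.svd(F_linear, compute_uv=False)
F_rank2 = enforce_rank2(F_linear)

print("singular values before:", sigma)
print("singular values after: ", np.linalg.svd(F_rank2, compute_uv=False))
```

The Frobenius distance between the two matrices equals the singular value that was removed.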

the minimum case – 7 point correspondences

one parameter family of solutions

but F1+λF2 not automatically rank 2

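(Figure: the one-parameter family F1+λF2 through F1 and F2; F7pts marks the member whose third singular value σ3 is zero.)

det(F1+λF2) = 0 (cubic equation in λ; obtain 1 or 3 real solutions)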

the minimum case – impose rank 2

det(F1+λF2) = det(F2) det(F2^-1 F1 + λI) = 0

Compute possible λ as the negated eigenvalues of F2^-1 F1 (only real solutions are potential solutions)
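A sketch of the rank-2 step of the 7-point method, assuming F1 and F2 already span the null space of the 7x9 data matrix and that F2 is invertible. The two matrices below are synthetic (constructed so that λ = 0.5 is a known solution), and the function name is hypothetical.

```python
import numpy as np

def rank2_members(F1, F2):
    """All real (lam, F1 + lam * F2) with det(F1 + lam * F2) = 0."""
    # det(F1 + lam F2) = det(F2) det(F2^{-1} F1 + lam I), so lam = -mu
    # for every eigenvalue mu of F2^{-1} F1. Complex values are discarded.
    mu = np.linalg.eigvals(np.linalg.solve(F2, F1))
    lams = [-m.real for m in mu if abs(m.imag) < 1e-9]
    return [(lam, F1 + lam * F2) for lam in lams]

# Synthetic family: F_true has rank 2 by construction and equals F1 + 0.5 * F2.
rng = np.random.default_rng(0)
F_true = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))
G = rng.standard_normal((3, 3))
F1, F2 = F_true - 0.5 * G, G

solutions = rank2_members(F1, F2)
for lam, F in solutions:
    print("lambda =", lam, " det =", np.linalg.det(F))
```

In a full estimator each real solution would then be scored against the remaining matches, since only one of them corresponds to the true epipolar geometry.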

•  Linear equations for 5 points

•  Linear solution space

•  Non-linear constraints

Calibrated case: 5-point relative motion

10 cubic polynomials

scale does not matter, so one coefficient of the solution-space combination can be fixed to 1

(Nister, CVPR03)

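$$
\begin{bmatrix}
x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
x'_2 x_2 & x'_2 y_2 & x'_2 & y'_2 x_2 & y'_2 y_2 & y'_2 & x_2 & y_2 & 1 \\
x'_3 x_3 & x'_3 y_3 & x'_3 & y'_3 x_3 & y'_3 y_3 & y'_3 & x_3 & y_3 & 1 \\
x'_4 x_4 & x'_4 y_4 & x'_4 & y'_4 x_4 & y'_4 y_4 & y'_4 & x_4 & y_4 & 1 \\
x'_5 x_5 & x'_5 y_5 & x'_5 & y'_5 x_5 & y'_5 y_5 & y'_5 & x_5 & y_5 & 1
\end{bmatrix}
\begin{bmatrix}
E_{11} \\ E_{12} \\ E_{13} \\ E_{21} \\ E_{22} \\ E_{23} \\ E_{31} \\ E_{32} \\ E_{33}
\end{bmatrix}
= 0
$$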

(assumes normalized coordinates)
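The linear part of the 5-point method can be sketched as follows: five correspondences give a 5x9 system whose null space is four-dimensional, and the true E is a member of that space that also satisfies the non-linear constraints det(E) = 0 and 2EE^T E - trace(EE^T)E = 0 (these are the standard essential-matrix constraints, stated here as an assumption about what the "10 cubic polynomials" encode). The motion, the 3D points, and the function names are hypothetical test data; solving the polynomial system itself is not shown.

```python
import numpy as np

def epipolar_rows(x1, x2):
    """Rows [x'x, x'y, x', y'x, y'y, y', x, y, 1] for Nx2 arrays x1 <-> x2."""
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    return np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                            x, y, np.ones_like(x)])

def essential_nullspace(x1, x2):
    """Four 3x3 matrices spanning the solution space of the 5x9 system."""
    _, _, Vt = np.linalg.svd(epipolar_rows(x1, x2))  # Vt is 9x9
    return [Vt[i].reshape(3, 3) for i in range(5, 9)]

# Hypothetical relative motion and five 3D points in front of both cameras.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
X = np.array([[0.5, 0.2, 4.0], [-0.3, 0.4, 5.0], [0.1, -0.6, 3.5],
              [0.8, 0.7, 6.0], [-0.7, -0.2, 4.5]])
X2 = X @ R.T + t
x1 = X[:, :2] / X[:, 2:]      # normalized image coordinates, view 1
x2 = X2[:, :2] / X2[:, 2:]    # normalized image coordinates, view 2

tx = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
E = tx @ R                    # ground-truth essential matrix

# E must be a linear combination of the four basis matrices.
basis = np.array([B.ravel() for B in essential_nullspace(x1, x2)])
coeffs = np.linalg.lstsq(basis.T, E.ravel(), rcond=None)[0]
span_residual = np.linalg.norm(basis.T @ coeffs - E.ravel())

# Non-linear constraints evaluated on the true E.
det_E = np.linalg.det(E)
trace_constraint = 2.0 * E @ E.T @ E - np.trace(E @ E.T) * E
print(span_residual, det_E, np.abs(trace_constraint).max())
```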

Calibrated case: 5-point relative motion

•  Perform Gauss-Jordan elimination on polynomials

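(Figure: coefficient table after elimination, with three entries labelled -z; each degree marker in the table represents a polynomial of degree n in z.)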

(Nister, CVPR03)

Step 1. Extract features
Step 2. Compute a set of potential matches
Step 3. do
    Step 3.1 select minimal sample (i.e. 7 or 5 matches)    (generate hypothesis)
    Step 3.2 compute solution(s) for F                      (generate hypothesis)
    Step 3.3 determine inliers                              (verify hypothesis)
until Γ(#inliers,#samples) > 95%

Γ = 1 − (1 − (#inliers/#matches)^7)^#samples is the confidence that at least one 7-point sample was free of outliers.

#inliers   90%   80%   70%   60%   50%
#samples     5    13    35   106   382

Step 4. Compute F based on all inliers
Step 5. Look for additional matches
Step 6. Refine F based on all correct matches

Automatic computation of F

RANSAC
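The #samples row of the table can be reproduced with a few lines, assuming 7-point samples and the 95% confidence level used in the stopping criterion. The function name is hypothetical.

```python
import math

def ransac_num_samples(inlier_ratio, sample_size=7, confidence=0.95):
    """Smallest number of samples so that, with the given confidence,
    at least one sample contains only inliers."""
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

table = {w: ransac_num_samples(w) for w in (0.9, 0.8, 0.7, 0.6, 0.5)}
print(table)  # {0.9: 5, 0.8: 13, 0.7: 35, 0.6: 106, 0.5: 382}
```

With 5-point samples the exponent drops from 7 to 5, so fewer samples are needed for the same inlier ratio.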

restrict search range to neighborhood of epipolar line (e.g. ±1.5 pixels)

relax disparity restriction (along epipolar line)

Finding more matches
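A sketch of the test used when searching for more matches: a candidate x’ is kept only if it lies within a small band around the epipolar line l’ = Fx. The 1.5-pixel threshold is the example value from above; the F used here (a purely horizontal translation, so epipolar lines are image rows) and the function name are hypothetical.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance in pixels from point x2 (image 2) to the epipolar line F x1."""
    line = F @ np.array([x1[0], x1[1], 1.0])
    return abs(line @ np.array([x2[0], x2[1], 1.0])) / np.hypot(line[0], line[1])

# Hypothetical F for a sideways-translating camera: l' is the row y' = y.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

x = (100.0, 50.0)
candidates = [(80.0, 51.0), (80.0, 53.0)]
accepted = [c for c in candidates if epipolar_distance(F, x, c) <= 1.5]
print(accepted)  # [(80.0, 51.0)]
```

The position along the line is left unconstrained here, which corresponds to relaxing the disparity restriction.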

Initial structure and motion

Epipolar geometry ↔ Projective calibration

Choose a camera pair (P1,P2) compatible with F

Yields correct projective camera setup (Faugeras´92, Hartley´92)

Obtain structure through triangulation

Use reprojection error for minimization

Avoid measurements in projective space

Initial structure and motion (calibrated case)

Essential Matrix: E = [t]_x R

Essential Matrix decomposition

Recover R and t from E

With E = U diag(1,1,0) V^T: use R = U W V^T or R = U W^T V^T, use t = +u3 or t = −u3 (four-fold ambiguity)

P1 = [I | 0]

P2 = [R | t]

(e.g. see Hartley and Zisserman, Sec.8.6)
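A sketch of the decomposition, following the SVD-based recipe in Hartley and Zisserman: the matrix W below is the 90-degree rotation about the z-axis used there, and u3 is the last column of U. The ground-truth motion is hypothetical test data; selecting the physically valid candidate (points in front of both cameras) is not shown.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates for P2 = [R | t], given P1 = [I | 0]."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip U or Vt to get proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    u3 = U[:, 2]  # translation direction, known only up to sign and scale
    Ra, Rb = U @ W @ Vt, U @ W.T @ Vt
    return [(Ra, u3), (Ra, -u3), (Rb, u3), (Rb, -u3)]

# Hypothetical ground-truth motion used to build E = [t]_x R.
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.1, 1.0])
tx = np.array([[0.0, -t_true[2], t_true[1]],
               [t_true[2], 0.0, -t_true[0]],
               [-t_true[1], t_true[0], 0.0]])
E = tx @ R_true

candidates = decompose_essential(E)
for R_c, t_c in candidates:
    print(np.round(R_c, 3).tolist(), np.round(t_c, 3).tolist())
```

Only one of the four candidates places a triangulated point in front of both cameras, so a single correspondence is enough to resolve the ambiguity.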

Triangulation

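(Figure: rays L1 and L2, backprojected from camera centers C1 and C2 through image points x1 and x2, meet at the 3D point X.)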

Triangulation

-  calibration

-  correspondences

Triangulation

•  Backprojection

•  Triangulation

•  Iterative least-squares

•  Maximum Likelihood Triangulation (geometric error)
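A sketch of the simplest of these options, linear (DLT) triangulation: each view contributes two equations, and the 3D point is the null vector of the stacked 4x4 system. It minimizes an algebraic error rather than the geometric one, so it would normally serve as the starting point for the iterative or maximum-likelihood variants. The cameras and the point are hypothetical test data.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear triangulation of one point from two 3x4 cameras and 2D points."""
    # x * (row 3 of P) - (row 1 of P) = 0, and likewise for y with row 2.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical calibrated pair: second camera shifted along the x-axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
h1 = P1 @ np.append(X_true, 1.0)
h2 = P2 @ np.append(X_true, 1.0)
x1, x2 = h1[:2] / h1[2], h2[:2] / h2[2]

X_est = triangulate_linear(P1, P2, x1, x2)
print(X_est)
```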


Optimal 3D point in epipolar plane

•  Given an epipolar plane, find best 3D point for (m1,m2)

(Figure: points m1, m2 with epipolar lines l1, l2, and the closest points m1´, m2´ on those lines.)

Select closest points (m1´,m2´) on epipolar lines

Obtain 3D point through exact triangulation

Guarantees minimal reprojection error (given this epipolar plane)

Non-iterative optimal solution

•  Reconstruct matches in projective frame by minimizing the reprojection error (3DOF)

•  Non-iterative method (Hartley and Sturm, CVIU´97)

   Determine the epipolar plane for reconstruction (1DOF, polynomial of degree 6)

   Reconstruct optimal point from selected epipolar plane

Note: only works for two views

(Figure: points m1, m2 and the epipolar lines l1(α), l2(α) parameterized by α.)

Initialize Motion (P1,P2 compatible with F or E)

Sequential Structure and Motion Computation

Initialize Structure (minimize reprojection error)

Sequential structure and motion recovery

•  Initialize structure and motion from two views

•  For each additional view
   •  Determine pose
   •  Refine and extend structure

•  Determine correspondences robustly by jointly estimating matches and epipolar geometry

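Compute P_{i+1} using robust approach (6-point RANSAC)

Extend and refine reconstruction

(Figure: 2D-2D matches between image points m_i and m_{i+1}, combined with the known 2D-3D match between m_i and the 3D point M, give 2D-3D matches for the new view.)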

Determine pose towards existing structure

Compute P with 6-point RANSAC

•  Generate hypothesis using 6 points

•  Planar scenes are degenerate!

(similar DLT algorithm as seen in 2nd lecture for homographies)

(two equations per point)
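A sketch of the hypothesis-generation step: each 2D-3D match gives two linear equations in the 12 entries of P, so six matches determine P up to scale. The camera and the six points below are hypothetical test data, and the RANSAC loop around this solver is omitted.

```python
import numpy as np

def pose_dlt(X, x):
    """Estimate a 3x4 camera P (up to scale) from N >= 6 matches X (Nx3) <-> x (Nx2)."""
    rows = []
    for Xw, xi in zip(X, x):
        Xh = np.append(Xw, 1.0)
        zero = np.zeros(4)
        # u * (p3 . X) - (p1 . X) = 0 and v * (p3 . X) - (p2 . X) = 0, up to sign.
        rows.append(np.concatenate([Xh, zero, -xi[0] * Xh]))
        rows.append(np.concatenate([zero, Xh, -xi[1] * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

# Hypothetical ground-truth camera and six non-coplanar points.
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, -0.2, 5.0])
P_true = np.hstack([R, t[:, None]])

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(6, 3))
proj = np.hstack([X, np.ones((6, 1))]) @ P_true.T
x = proj[:, :2] / proj[:, 2:]

P_est = pose_dlt(X, x)
P_est = P_est / P_est[2, 3] * P_true[2, 3]  # fix the arbitrary scale for comparison
print(np.round(P_est, 4))
```

If all six points lay in one plane the linear system would lose rank, which is the degeneracy noted above.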

Three points perspective pose – p3p (calibrated case)

(Haralick et al., IJCV94)

All techniques yield 4th order polynomial

(Figure: historical solutions dated 1841 and 1903.)

Initialize Motion (P1,P2 compatible with F or E)

Sequential Structure and Motion Computation

Initialize Structure (minimize reprojection error)

Extend motion (compute pose through matches seen in 2 or more previous views)

Extend structure (Initialize new structure, refine existing structure)

Changchang’s SfM code

For iconic graph
•  uses 5-point+RANSAC for 2-view initialization
•  uses 3-point+RANSAC for adding views
•  performs bundle adjustment

For additional images
•  use 3-point+RANSAC pose estimation

http://ccwu.me/vsfm/

Rome on a cloudless day (Frahm et al. ECCV 2010)

GIST & clustering (1h35)

SIFT & Geometric verification (11h36)

SfM & Bundle (8h35)

Dense Reconstruction (1h58)

Some numbers
•  1 PC
•  2.88M images
•  100k clusters
•  22k SfM with 307k images
•  63k 3D models
•  Largest model 5700 images
•  Total time 23h53

Hierarchical structure and motion recovery

•  Compute 2-view
•  Compute 3-view
•  Stitch 3-view reconstructions
•  Merge and refine reconstruction

(Figure labels: F, T, H, PM.)

Stitching 3-view reconstructions

Different possibilities

1. Align (P2,P3) with (P’1,P’2)

2. Align X,X’ (and C,C’)

3. Minimize reproj. error

4. MLE (merge)

SfM revisited

Soon available at https://github.com/colmap/colmap

Structure-from-Motion Revisited, Johannes L. Schönberger, Jan-Michael Frahm, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Next week: Dense Correspondences
