3D Vision: Structure from Motion
Structure from Motion
• Two view reconstruction
  • Epipolar geometry computation
  • Triangulation
• Adding more views
  • Pose estimation
Epipolar geometry
The fundamental matrix F: algebraic representation of epipolar geometry
We will see that the mapping x → l’ is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F
The fundamental matrix F
geometric derivation
mapping from 2-D to 1-D family (rank 2)
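The geometric derivation can be written out in two steps, following the standard textbook treatment. The symbols H_π (a homography that transfers points from the first image to the second via some plane π not passing through either camera centre) and [e’]× (the skew-symmetric cross-product matrix of the epipole e’) are notation assumed here, not taken from the slide text:

```latex
x' = H_\pi \, x
\qquad\Longrightarrow\qquad
l' = e' \times x' = [e']_\times H_\pi \, x = F x,
\qquad
F = [e']_\times H_\pi
```

Because [e’]× has rank 2 and H_π has rank 3, F has rank 2: it maps the 2-D family of image points onto the 1-D family (pencil) of epipolar lines through e’.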
The fundamental matrix F
algebraic derivation
(note: doesn’t work for C=C’ ⇒ F=0)
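A sketch of the algebraic derivation, assuming the usual notation in which P⁺ is the pseudo-inverse of the first camera (P P⁺ = I) and C its centre (P C = 0); these symbols are assumptions, since the slide equations did not survive extraction:

```latex
X(\lambda) = P^{+} x + \lambda C
\qquad \text{(points on the ray back-projected from } x\text{)}

l' = (P' C) \times (P' P^{+} x) = [e']_\times P' P^{+} x

F = [e']_\times P' P^{+}, \qquad e' = P' C
```

If the two camera centres coincide (C = C’), then e’ = P’C = 0 and the expression collapses to F = 0, which is the failure case noted above.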
The fundamental matrix F
correspondence condition
The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’ in the two images: x’ᵀFx = 0 (x’ lies on the epipolar line l’ = Fx, so x’ᵀl’ = 0)
The fundamental matrix F - recap
F is the unique (up to scale) 3x3 rank 2 matrix that satisfies x’ᵀFx = 0 for all x↔x’
(i) Transpose: if F is the fundamental matrix for (P,P’), then Fᵀ is the fundamental matrix for (P’,P)
(ii) Epipolar lines: l’ = Fx and l = Fᵀx’
(iii) Epipoles: lie on all epipolar lines, thus e’ᵀFx = 0 ∀x ⇒ e’ᵀF = 0; similarly Fe = 0
(iv) F has 7 d.o.f., i.e. 3x3 − 1 (homogeneous) − 1 (rank 2)
(v) F is a correlation, a projective mapping from a point x to a line l’ = Fx (not a proper correlation, i.e. not invertible)
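The properties in this recap can be checked numerically. The following is a minimal sketch, assuming F is built from two known cameras as F = [e’]× P’ P⁺; the camera matrices, the 3D point and all function names are hypothetical values chosen for illustration.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def null_vector(A):
    """Unit right null vector of A (last right singular vector)."""
    return np.linalg.svd(A)[2][-1]

def fundamental_from_cameras(P, P_prime):
    """F = [e']_x P' P^+ for two 3x4 camera matrices with distinct centres."""
    C = null_vector(P)              # centre of the first camera: P C = 0
    e_prime = P_prime @ C           # epipole in the second image
    F = skew(e_prime) @ P_prime @ np.linalg.pinv(P)
    return F / np.linalg.norm(F)    # F is homogeneous, so fix the scale

# Hypothetical cameras (illustrative values only).
K = np.array([[500.0, 0.0, 350.0], [0.0, 500.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[-1.0], [0.1], [0.2]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_prime = K @ np.hstack([R, t])
F = fundamental_from_cameras(P, P_prime)

# Project one 3D point into both views and evaluate x'^T F x.
X = np.array([0.3, -0.2, 5.0, 1.0])
x = P @ X
x = x / x[2]
x_prime = P_prime @ X
x_prime = x_prime / x_prime[2]
residual = x_prime @ F @ x

# Epipoles are the images of the other camera's centre.
e = P @ null_vector(P_prime)
e = e / np.linalg.norm(e)
e_prime = P_prime @ null_vector(P)
e_prime = e_prime / np.linalg.norm(e_prime)
singular_values = np.linalg.svd(F, compute_uv=False)
```

Up to floating-point error, `residual`, `F @ e`, `e_prime @ F` and the third singular value all vanish, which mirrors the correspondence condition, properties (iii) and the rank-2 statement.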
Computation of F
• Linear (8-point)
• Minimal (7-point)
• Calibrated (5-point) (Essential matrix)
• Practical two-view geometry computation
Epipolar geometry: basic equation
separate known from unknown
(data) (unknowns) (linear)
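Written out for one correspondence x = (x, y, 1)ᵀ ↔ x’ = (x’, y’, 1)ᵀ, the basic equation separates into a row of known data and a vector of unknown entries of F. The entry names f_ij and the stacked matrix A are notation assumed here:

```latex
x'^{\top} F x = 0
\;\Longrightarrow\;
x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23}
 + x f_{31} + y f_{32} + f_{33} = 0

\underbrace{\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix}}_{\text{data}}
\underbrace{\begin{bmatrix} f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \end{bmatrix}^{\top}}_{\text{unknowns}}
= 0

A f = 0 \qquad \text{(one row per correspondence, linear in } f\text{)}
```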
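the NOT normalized 8-point algorithm
Typical column magnitudes of the data matrix in pixel coordinates: ~10000 for the four product columns (x’x, x’y, y’x, y’y), ~100 for the four linear columns (x’, y’, x, y), and 1 for the constant column
! Orders of magnitude difference between the columns of the data matrix → least-squares yields poor results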
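the normalized 8-point algorithm
Transform image to ~[-1,1]x[-1,1]
(Figure: a 700x500 image with corners (0,0), (700,0), (700,500), (0,500) is mapped to the square with corners (-1,-1), (1,-1), (1,1), (-1,1); the image center maps to (0,0).)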
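A minimal sketch of the normalized 8-point algorithm, assuming the simple image-size normalization to [-1,1]x[-1,1] described on this slide and a 700x500 image. The function names and the synthetic cameras and points are hypothetical; the rank-2 (singularity) constraint is deliberately not enforced here.

```python
import numpy as np

def image_normalization(width, height):
    """Affine map sending the image rectangle [0,w]x[0,h] to [-1,1]x[-1,1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def eight_point(x1, x2, width, height):
    """Linear estimate of F (x2^T F x1 = 0) from N >= 8 matches (Nx2 pixel arrays)."""
    T = image_normalization(width, height)
    h1 = np.column_stack([x1, np.ones(len(x1))]) @ T.T   # normalized points, view 1
    h2 = np.column_stack([x2, np.ones(len(x2))]) @ T.T   # normalized points, view 2
    # One row [x'x, x'y, x', y'x, y'y, y', x, y, 1] per correspondence.
    A = np.column_stack([h2[:, [0]] * h1, h2[:, [1]] * h1, h1])
    F_hat = np.linalg.svd(A)[2][-1].reshape(3, 3)         # least-squares null vector
    F = T.T @ F_hat @ T                                   # undo the normalization
    return F / np.linalg.norm(F)

def project(K, R, t, X):
    """Pinhole projection of Nx3 points to Nx2 pixel coordinates."""
    x = (K @ (R @ X.T + t)).T
    return x[:, :2] / x[:, 2:]

# Synthetic, noise-free test scene (illustrative values only).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20), rng.uniform(4, 8, 20)])
K = np.array([[500.0, 0.0, 350.0], [0.0, 500.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[-1.0], [0.1], [0.2]])
x1 = project(K, np.eye(3), np.zeros((3, 1)), X)
x2 = project(K, R, t, X)

F = eight_point(x1, x2, width=700, height=500)

# Distance (in pixels) of each x2 from its epipolar line l' = F x1.
h1 = np.column_stack([x1, np.ones(len(x1))])
h2 = np.column_stack([x2, np.ones(len(x2))])
lines = h1 @ F.T
dist = np.abs(np.sum(lines * h2, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)
```

With noise-free matches the epipolar distances in `dist` are zero up to rounding; with real, noisy matches the normalization is what keeps the columns of `A` at comparable magnitudes.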
the singularity constraint
SVD from linearly computed F matrix (rank 3)
Compute closest rank-2 approximation
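The closest rank-2 matrix in the Frobenius norm is obtained by zeroing the smallest singular value. A minimal sketch follows; the input matrix `F_lin` is a hypothetical stand-in for a linearly computed, full-rank estimate.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 approximation of F in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                       # drop the smallest singular value sigma_3
    return U @ np.diag(s) @ Vt

# Hypothetical full-rank estimate, e.g. the output of a linear solve.
F_lin = np.array([[0.2, -1.3, 0.5],
                  [1.1, 0.1, -0.7],
                  [-0.4, 0.9, 0.3]])
F_rank2 = enforce_rank2(F_lin)

sigma3 = np.linalg.svd(F_lin, compute_uv=False)[2]
change = np.linalg.norm(F_lin - F_rank2)   # equals sigma_3 of the input
```

The Frobenius distance between input and output equals the discarded singular value σ3, so a small σ3 means the linear estimate was already close to a valid fundamental matrix.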
the minimum case – 7 point correspondences
one parameter family of solutions
but F1+λF2 not automatically rank 2
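(Figure: along the one-parameter family spanned by F1 and F2, the 7-point solution F7pts is the member whose smallest singular value σ3 is zero.)
the minimum case – impose rank 2
det(F1+λF2) = det(F2) · det(F2⁻¹F1 + λI) = 0 (cubic equation)
(obtain 1 or 3 solutions)
Compute possible λ as the negated eigenvalues of F2⁻¹F1 (only real solutions are potential solutions)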
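A minimal sketch of the 7-point computation. Instead of the eigenvalue formulation, it recovers the cubic det(F1+λF2) by evaluating the determinant at four values of λ and interpolating, which avoids inverting F2; this substitution, the function name and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def seven_point(x1, x2):
    """Rank-2 candidates for F from exactly 7 matches.

    x1, x2: 7x3 arrays of homogeneous points with x2[i]^T F x1[i] = 0.
    Returns a list of 1 or 3 unit-norm matrices (generic case).
    """
    A = np.column_stack([x2[:, [0]] * x1, x2[:, [1]] * x1, x2[:, [2]] * x1])
    Vt = np.linalg.svd(A)[2]
    F1, F2 = Vt[-1].reshape(3, 3), Vt[-2].reshape(3, 3)   # basis of the 2-D null space
    # det(F1 + lam*F2) is a cubic in lam: interpolate it exactly from 4 samples.
    samples = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = [np.linalg.det(F1 + lam * F2) for lam in samples]
    roots = np.roots(np.polyfit(samples, dets, 3))
    real_roots = roots[np.abs(roots.imag) < 1e-8].real    # only real solutions count
    candidates = []
    for lam in real_roots:
        F = F1 + lam * F2
        candidates.append(F / np.linalg.norm(F))
    return candidates

# Synthetic, noise-free data in normalized image coordinates (illustrative values).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-2, 2, 8), rng.uniform(-2, 2, 8), rng.uniform(3, 8, 8)])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
x1 = X / X[:, 2:]
Xc2 = X @ R.T + t
x2 = Xc2 / Xc2[:, 2:]

solutions = seven_point(x1[:7], x2[:7])
# The eighth match was not used for estimation; the true F must also satisfy it.
held_out = [abs(x2[7] @ F @ x1[7]) for F in solutions]
```

Every candidate satisfies the seven input matches by construction, so an extra match (or inlier counting inside RANSAC) is what singles out the correct solution.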
• Linear equations for 5 points
• Linear solution space
• Non-linear constraints
Calibrated case: 5-point relative motion
10 cubic polynomials
scale does not matter, choose
(Nister, CVPR03)
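\[
\begin{bmatrix}
x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
x'_2 x_2 & x'_2 y_2 & x'_2 & y'_2 x_2 & y'_2 y_2 & y'_2 & x_2 & y_2 & 1 \\
x'_3 x_3 & x'_3 y_3 & x'_3 & y'_3 x_3 & y'_3 y_3 & y'_3 & x_3 & y_3 & 1 \\
x'_4 x_4 & x'_4 y_4 & x'_4 & y'_4 x_4 & y'_4 y_4 & y'_4 & x_4 & y_4 & 1 \\
x'_5 x_5 & x'_5 y_5 & x'_5 & y'_5 x_5 & y'_5 y_5 & y'_5 & x_5 & y_5 & 1
\end{bmatrix}
\begin{bmatrix}
E_{11} \\ E_{12} \\ E_{13} \\ E_{21} \\ E_{22} \\ E_{23} \\ E_{31} \\ E_{32} \\ E_{33}
\end{bmatrix}
= 0
\]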
(assumes normalized coordinates)
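The three ingredients of the 5-point method can be summarized as follows. This is a sketch assuming the usual formulation: the basis names X, Y, Z, W for the 4-dimensional null space of the 5x9 system and the normalization w = 1 are assumptions, since the corresponding slide equations did not survive extraction.

```latex
% Linear solution space: 5 equations, 9 unknowns -> 4-dimensional null space
E = xX + yY + zZ + wW \qquad (\text{scale is free, e.g. } w = 1)

% Non-linear constraints on a valid essential matrix
\det E = 0 \qquad \text{(1 cubic polynomial)}

2\,E E^{\top} E - \operatorname{trace}(E E^{\top})\,E = 0 \qquad \text{(9 cubic polynomials)}
```

Together these give the 10 cubic polynomials in x, y, z mentioned above.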
Calibrated case: 5-point relative motion
• Perform Gauss-Jordan elimination on polynomials
(Figure: elimination table with row operations involving −z; in the table, [n] represents a polynomial of degree n in z.)
(Nister, CVPR03)
Automatic computation of F
RANSAC
Step 1. Extract features
Step 2. Compute a set of potential matches
Step 3. do
  Step 3.1 select minimal sample (i.e. 7 or 5 matches) (generate hypothesis)
  Step 3.2 compute solution(s) for F (generate hypothesis)
  Step 3.3 determine inliers (verify hypothesis)
until Γ(#inliers,#samples) > 95%
Step 4. Compute F based on all inliers
Step 5. Look for additional matches
Step 6. Refine F based on all correct matches

Γ is the probability that at least one of the samples drawn so far contains only inliers. Number of samples needed to reach 95% (7-point samples):
#inliers   90%   80%   70%   60%   50%
#samples     5    13    35   106   382
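The sample counts in the table follow from requiring that, with 95% probability, at least one minimal sample is free of outliers. A minimal sketch, assuming 7-point samples and independent draws; the function name is hypothetical.

```python
import math

def ransac_num_samples(inlier_ratio, sample_size=7, confidence=0.95):
    """Smallest N with 1 - (1 - inlier_ratio**sample_size)**N >= confidence."""
    p_all_inliers = inlier_ratio ** sample_size      # one sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

# Reproduce the table for inlier ratios 90% ... 50%.
table = {w: ransac_num_samples(w) for w in (0.9, 0.8, 0.7, 0.6, 0.5)}
for w, n in table.items():
    print(f"{w:.0%} inliers -> {n} samples")
```

Since the inlier ratio is unknown in advance, the loop can re-evaluate this bound after each iteration using the best inlier count found so far. Using the 5-point solver lowers the exponent from 7 to 5 and therefore the number of samples.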
Finding more matches
• restrict search range to neighborhood of epipolar line (e.g. ±1.5 pixels)
• relax disparity restriction (along epipolar line)
Initial structure and motion
Epipolar geometry ↔ Projective calibration
(camera pair compatible with F)
Yields correct projective camera setup (Faugeras ’92, Hartley ’92)
• Obtain structure through triangulation
• Use reprojection error for minimization
• Avoid measurements in projective space
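One standard choice of a camera pair compatible with F is P1 = [I | 0], P2 = [[e’]×F | e’]. The sketch below assumes this particular member of the family of compatible pairs (the slide's exact formula did not survive extraction); the example matrix F and the function names are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Projective camera pair P1 = [I | 0], P2 = [[e']_x F | e'] compatible with F."""
    U = np.linalg.svd(F)[0]
    e_prime = U[:, -1]                       # left null vector: e'^T F = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e_prime) @ F, e_prime.reshape(3, 1)])
    return P1, P2

# Hypothetical rank-2 matrix used as the input F (illustrative values).
v = np.array([1.0, 0.2, -0.3])
M = np.array([[0.9, -0.1, 0.2], [0.1, 1.0, -0.05], [-0.2, 0.05, 0.95]])
F = skew(v) @ M

P1, P2 = cameras_from_F(F)

# Fundamental matrix of the recovered pair: [e']_x P2 P1^+, where P2 P1^+ = P2[:, :3].
F_back = skew(P2[:, 3]) @ P2[:, :3]
F_n = F / np.linalg.norm(F)
F_back_n = F_back / np.linalg.norm(F_back)
mismatch = min(np.linalg.norm(F_n - F_back_n), np.linalg.norm(F_n + F_back_n))
```

The recovered pair reproduces F up to scale and sign, but it differs from the true cameras by an unknown projective transformation, which is why measurements in this frame should be avoided.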
Initial structure and motion (calibrated case)
Essential Matrix:
Essential Matrix decomposition
Recover R and t from E
(two candidate rotations and two signs for t: four-fold ambiguity)
P1 = [I | 0]
P2 = [R | t]
(e.g. see Hartley and Zisserman, Sec.8.6)
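A minimal sketch of the SVD-based decomposition of E into its four candidate (R, t) pairs, following the textbook construction with the matrix W; the function names and the ground-truth motion used for the check are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Four candidate (R, t) pairs with E ~ [t]_x R; t has unit length."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip U or Vt to make both proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                          # left null vector of E
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Hypothetical ground-truth motion used to build E (illustrative values).
a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.2, -0.3])
t_true = t_true / np.linalg.norm(t_true)
E = skew(t_true) @ R_true

candidates = decompose_essential(E)
found = any(np.allclose(R, R_true) and np.allclose(t, t_true) for R, t in candidates)
```

Only one of the four candidates places a triangulated point in front of both cameras, so in practice the ambiguity is resolved by triangulating one or more matches and testing their depths.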
Triangulation
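(Figure: the rays L1 and L2, back-projected from camera centers C1 and C2 through the image points x1 and x2, intersect at the 3D point X.)
Triangulation
- calibration
- correspondences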
Triangulation
• Backprojection
• Triangulation: iterative least-squares
• Maximum Likelihood Triangulation (geometric error)
(Figure: rays L1 and L2 through x1 and x2, meeting at X.)
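A minimal sketch of linear (DLT) triangulation, the usual starting point before iterative or maximum-likelihood refinement. It minimizes an algebraic error, not the geometric reprojection error; the cameras, the test point and the function name are hypothetical.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one point from two views.

    P1, P2: 3x4 camera matrices; x1, x2: (u, v) image coordinates.
    Returns the 3D point as an inhomogeneous 3-vector.
    """
    # Each view contributes two equations: u * p3 - p1 = 0 and v * p3 - p2 = 0.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]          # null vector of the 4x4 system
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera and return (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical calibrated cameras and point (illustrative values).
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[-1.0], [0.1], [0.2]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t])
X_true = np.array([0.3, -0.2, 5.0])

X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free measurements the two rays intersect and `X_est` equals `X_true`; with noise the rays are skew, and the linear estimate is typically refined by minimizing the reprojection error.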
Optimal 3D point in epipolar plane
• Given an epipolar plane, find best 3D point for (m1,m2)
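(Figure: the image points m1 and m2, the epipolar lines l1 and l2 of the chosen epipolar plane, and the closest points m1´ and m2´ on those lines.)
• Select closest points (m1´,m2´) on epipolar lines
• Obtain 3D point through exact triangulation
• Guarantees minimal reprojection error (given this epipolar plane)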
Non-iterative optimal solution
• Reconstruct matches in projective frame by minimizing the reprojection error (3DOF)
• Non-iterative method (Hartley and Sturm, CVIU´97)
  • Determine the epipolar plane for reconstruction (1DOF, polynomial of degree 6)
  • Reconstruct optimal point from selected epipolar plane
  • Note: only works for two views
(Figure: image points m1 and m2 with the epipolar lines l1(α) and l2(α), parameterized by α.)
Sequential Structure and Motion Computation
Initialize Motion (P1,P2 compatible with F or E)
Initialize Structure (minimize reprojection error)
Sequential structure and motion recovery
• Initialize structure and motion from two views
• For each additional view
  • Determine pose
  • Refine and extend structure
• Determine correspondences robustly by jointly estimating matches and epipolar geometry
Determine pose towards existing structure
(Figure: 2D-2D matches between mi and mi+1 in the new view give 2D-3D matches to the existing 3D point M.)
• Compute Pi+1 using robust approach (6-point RANSAC)
• Extend and refine reconstruction
Compute P with 6-point RANSAC
• Generate hypothesis using 6 points (two equations per point)
  (similar DLT algorithm as seen in the 2nd lecture for homographies)
• Planar scenes are degenerate!
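A minimal sketch of the DLT pose (resection) step that produces one hypothesis from 6 or more 2D-3D matches, with two equations per point. The camera, the scene points and the function names are hypothetical, and the RANSAC loop around this step is omitted.

```python
import numpy as np

def resection_dlt(X, x):
    """Estimate a 3x4 camera P from N >= 6 matches (X: Nx3 points, x: Nx2 pixels)."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        # u = (p1 . X) / (p3 . X) and v = (p2 . X) / (p3 . X), cleared of denominators.
        rows.append(np.hstack([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.hstack([np.zeros(4), Xh, -v * Xh]))
    A = np.array(rows)                                  # 2N x 12
    return np.linalg.svd(A)[2][-1].reshape(3, 4)        # null vector, up to scale

def project(P, X):
    """Project Nx3 points with a 3x4 camera to Nx2 pixel coordinates."""
    x = np.column_stack([X, np.ones(len(X))]) @ P.T
    return x[:, :2] / x[:, 2:]

# Hypothetical camera and 6 non-coplanar scene points (illustrative values).
K = np.array([[500.0, 0.0, 350.0], [0.0, 500.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[-1.0], [0.1], [0.2]])
P_true = K @ np.hstack([R, t])
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(-1, 1, 6), rng.uniform(-1, 1, 6), rng.uniform(4, 8, 6)])
x = project(P_true, X)

P_est = resection_dlt(X, x)
reproj_error = np.abs(project(P_est, X) - x).max()     # in pixels
```

If all scene points lie in one plane the linear system loses rank and the null vector is no longer unique, which is the degeneracy noted above.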
Three points perspective pose – p3p (calibrated case)
(Haralick et al., IJCV94)
All techniques yield 4th order polynomial
(Figure labels: 1841, 1903)
Sequential Structure and Motion Computation
Initialize Motion (P1,P2 compatible with F or E)
Initialize Structure (minimize reprojection error)
Extend motion (compute pose through matches seen in 2 or more previous views)
Extend structure (initialize new structure, refine existing structure)
Changchang’s SfM code
for iconic graph
• uses 5-point+RANSAC for 2-view initialization
• uses 3-point+RANSAC for adding views
• performs bundle adjustment
For additional images
• use 3-point+RANSAC pose estimation
http://ccwu.me/vsfm/
Rome on a cloudless day (Frahm et al. ECCV 2010)
GIST & clustering (1h35)
SIFT & Geometric verification (11h36)
SfM & Bundle (8h35)
Dense Reconstruction (1h58)
Some numbers
• 1 PC
• 2.88M images
• 100k clusters
• 22k SfM with 307k images
• 63k 3D models
• Largest model 5700 images
• Total time 23h53
Hierarchical structure and motion recovery
• Compute 2-view
• Compute 3-view
• Stitch 3-view reconstructions
• Merge and refine reconstruction
(Figure labels: F, T, H, PM)
Stitching 3-view reconstructions
Different possibilities
1. Align (P2,P3) with (P’1,P’2)
2. Align X,X’ (and C,C’)
3. Minimize reproj. error
4. MLE (merge)
SfM revisited
Soon available at https://github.com/colmap/colmap
Structure-from-Motion Revisited, Johannes L. Schönberger and Jan-Michael Frahm, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
Next week: Dense Correspondences