

Multistage SFM: Revisiting Incremental Structure from Motion

Rajvi Shah¹, Aditya Deshpande¹,², and P J Narayanan¹

Performance Evaluation

[Figure: Multistage SFM pipeline. 1. Coarse Model Reconstruction (top η% SIFTs): Coarse Reconstruction. 2. Add Cameras (remaining SIFTs): K-cover Model Computation, Mean SIFT Computation, Direct 3D-2D Localization. 3. Add Points (remaining SIFTs): Find Candidate Image Pairs, Guided Feature Matching, Triangulation & Merging. Output: Full Reconstruction.]

Goal: Pose-estimate the cameras left unlocalized in the coarse reconstruction stage.

Cover-set: Compute a subset of 3D points that covers the cameras and compute their mean SIFTs.

Create a Kd-tree of image SIFTs and search the mean SIFTs of 3D points in this tree one by one.

Pose-estimate the camera using 2D-3D matches.
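The cover-set step above could be sketched as a greedy set cover: keep picking the 3D point seen by the most cameras that still need coverage, until every camera sees at least K picked points. This is only a minimal sketch under assumptions. The greedy rule, the `visibility` layout, and the names `greedy_k_cover` and `mean_descriptor` are illustrative and are not taken from the poster.

```python
from typing import Dict, List, Set


def greedy_k_cover(visibility: Dict[int, Set[int]], k: int) -> List[int]:
    """Pick 3D points so that each camera sees at least k picked points.

    `visibility` maps a point id to the set of camera ids observing it
    (hypothetical layout). Greedy heuristic, not guaranteed to be minimal.
    """
    # Every camera starts out needing k more covering points.
    need: Dict[int, int] = {}
    for cams in visibility.values():
        for cam in cams:
            need[cam] = k

    remaining = set(visibility)
    cover: List[int] = []
    while remaining and any(n > 0 for n in need.values()):
        # Gain = number of still-uncovered cameras this point would help.
        gain = lambda p: sum(1 for c in visibility[p] if need[c] > 0)
        best = max(sorted(remaining), key=gain)  # ties go to the lowest id
        if gain(best) == 0:
            break  # no remaining point improves coverage
        cover.append(best)
        remaining.discard(best)
        for cam in visibility[best]:
            if need[cam] > 0:
                need[cam] -= 1
    return cover


def mean_descriptor(descriptors: List[List[float]]) -> List[float]:
    """Average the descriptors of all 2D features in one point's track."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


visibility = {0: {0, 1, 2}, 1: {0, 1}, 2: {2, 3}, 3: {3}}
print(greedy_k_cover(visibility, k=1))  # [0, 2]
```

Only the points returned by the cover need a mean SIFT, which keeps the number of Kd-tree queries per unlocalized image small.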

Fast but contains fewer points (Coarse)

Most cameras are localized (Global).

Stable due to incremental Bundle Adjustment.

Point co-visibility among cameras is known.

Epipolar geometry of the localized cameras is known.

Properties of Coarse Model

Analysis of reconstructed features by scales in models

[Figure: Traditional pipeline for incremental SFM. Match-graph Construction: Extract Features, Feature Matching, Geometric Verification. Incremental Structure from Motion (SFM): 1. SFM for 2 images, 2. Pose-estimate Image, 3. Triangulate Features.]

[Figure: Multistage SFM with embarrassingly parallel point and camera addition stages.]

Sort SIFT features by scale.

Pairwise match the top η% of features by scale.

Reconstruct the coarse model using robust SFM.
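The scale-based selection above amounts to a sort and a split. The sketch below assumes each feature is an `(x, y, scale)` tuple and that the cut-off is rounded up. Both choices, and the name `split_by_scale`, are illustrative assumptions.

```python
import math
from typing import List, Tuple

Feature = Tuple[float, float, float]  # (x, y, scale); hypothetical layout


def split_by_scale(
    features: List[Feature], eta_percent: float
) -> Tuple[List[Feature], List[Feature]]:
    """Return (top eta% of features by scale, remaining features)."""
    ranked = sorted(features, key=lambda f: f[2], reverse=True)
    n_top = math.ceil(len(ranked) * eta_percent / 100.0)
    return ranked[:n_top], ranked[n_top:]


features = [(float(i), float(i), float(i + 1)) for i in range(10)]
top, rest = split_by_scale(features, eta_percent=20)
print([f[2] for f in top])  # [10.0, 9.0]
print(len(rest))            # 8
```

The first list would feed the coarse reconstruction. The second is held back for the later point addition stage.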

Li et al. ECCV’10, Sattler et al. ICCV’11, ECCV’12, Chowdhary et al. ECCV’12

Goal: Triangulate the features of localized images.

Instead of O(n²) pairwise matching, match each image with only k candidate images (O(nk)), using the co-visible 3D points between images.

Use epipolar geometry between the localized cameras for guided matching and form tracks.

Merge the tracks using a connected-components algorithm and triangulate the points.
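Two of the steps above lend themselves to a short sketch: choosing k candidate images per image from co-visible 3D points, and merging pairwise matches into tracks with connected components. The version below is a minimal illustration under assumptions. The data layouts, the tie-breaking rule, and the names `candidate_images` and `merge_tracks` are hypothetical, and union-find is just one common way to compute connected components.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Feature = Tuple[int, int]  # (image id, feature id); hypothetical layout


def candidate_images(points_seen: Dict[int, Set[int]], k: int) -> Dict[int, List[int]]:
    """For each image, return up to k images sharing the most 3D points."""
    images_of_point = defaultdict(set)
    for img, pts in points_seen.items():
        for p in pts:
            images_of_point[p].add(img)

    candidates = {}
    for img, pts in points_seen.items():
        shared = Counter()
        for p in pts:
            for other in images_of_point[p]:
                if other != img:
                    shared[other] += 1
        # Most shared points first; ties broken by the smaller image id.
        ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
        candidates[img] = [other for other, _ in ranked[:k]]
    return candidates


def merge_tracks(matches: List[Tuple[Feature, Feature]]) -> List[Set[Feature]]:
    """Group matched features into tracks via union-find components."""
    parent: Dict[Feature, Feature] = {}

    def find(x: Feature) -> Feature:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


seen = {0: {1, 2, 3}, 1: {2, 3}, 2: {3}, 3: {9}}
print(candidate_images(seen, k=1))  # {0: [1], 1: [0], 2: [0], 3: []}
```

With n images and k candidates each, only about n·k pairs are matched instead of n(n−1)/2. A practical pipeline would likely also discard tracks that contain two features from the same image before triangulating.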

Epipolar geometry guided search for fast feature matching

Shah et al. WACV’15
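As a rough illustration of epipolar-guided search: once two cameras are localized, their fundamental matrix F is known, so a feature in the first image only needs to be compared against features that lie near its epipolar line in the second image. The sketch below shows only that geometric filter. The function names, the pixel threshold, and the brute-force scan are assumptions. The cited work may organize the search differently.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def epipolar_line(F: Matrix3, x: float, y: float) -> Tuple[float, float, float]:
    """Line l = F (x, y, 1)^T in the second image, as coefficients (a, b, c)."""
    p = (x, y, 1.0)
    a, b, c = (sum(F[r][j] * p[j] for j in range(3)) for r in range(3))
    return a, b, c


def candidates_near_line(
    F: Matrix3,
    query: Tuple[float, float],
    keypoints: List[Tuple[float, float]],
    max_dist: float,
) -> List[int]:
    """Indices of second-image keypoints within max_dist pixels of the line."""
    a, b, c = epipolar_line(F, query[0], query[1])
    norm = math.hypot(a, b)
    if norm == 0.0:
        return []  # degenerate line, nothing to search
    return [
        i
        for i, (u, v) in enumerate(keypoints)
        if abs(a * u + b * v + c) / norm <= max_dist
    ]


# F for a pure horizontal translation: epipolar lines are image rows (v = y).
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
keypoints = [(50.0, 20.5), (50.0, 40.0), (3.0, 19.0)]
print(candidates_near_line(F, (10.0, 20.0), keypoints, max_dist=2.0))  # [0, 2]
```

Descriptor comparison would then run only over the returned indices, which is where the saving over unguided matching comes from.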

Dataset           Ours (η = 20%)      Bundler             VisualSFM + PM
                  Cameras   Points    Cameras   Points    Cameras   Points
Pantheon Int.     538       241K      574       241K      466       52K
Pantheon Ext.     780       211K      782       211K      777       117K
St. Peters Int.   926       416K      950       416K      901       105K
St. Peters Ext.   1126      495K      1154      495K      1138      123K

Dataset           VisualSFM    Our Multistage Approach            Bundler
                  with PM      200 cores   8 cores   1 core       1 core
Pantheon Int.     19m          26m         69m       6h 48m       1d 12h
Pantheon Ext.     110m         60m         97m       12h 43m      6d 15h
St. Peters Int.   81m          51m         107m      15h 13m      5d 21h
St. Peters Ext.   -            121m        181m      1d 8h        12d 2h
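To put the runtimes on a common footing, the durations can be converted to minutes and compared. The snippet below does this for the Pantheon Int. row. It assumes the row reads, in order: VisualSFM with PM, the multistage approach on 200, 8 and 1 cores, then Bundler on 1 core. The helper name `to_minutes` is illustrative.

```python
import re

_UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def to_minutes(text: str) -> int:
    """Convert a duration such as '1d 12h' or '69m' to minutes."""
    return sum(
        int(amount) * _UNIT_MINUTES[unit]
        for amount, unit in re.findall(r"(\d+)\s*([dhm])", text)
    )


bundler_1core = to_minutes("1d 12h")  # 2160 minutes
ours_1core = to_minutes("6h 48m")     # 408 minutes
ours_200core = to_minutes("26m")      # 26 minutes

print(round(bundler_1core / ours_1core, 1))    # 5.3
print(round(bundler_1core / ours_200core, 1))  # 83.1
```

Under that reading, the multistage approach is roughly 5× faster than Bundler on one core for this dataset, and roughly 83× faster when spread over 200 cores.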

Input Images

http://cvit.iiit.ac.in/projects/multistagesfm/

This work is supported by a Google PhD Fellowship and the IDH Project of DST.

[Figure: Direct 3D-2D localization. Mean SIFTs of the 3D points in the CGM/cover-set point cloud are searched in a Kd-tree built over the SIFTs of the 2D image features to establish 3D-2D correspondences, from which the camera is pose-estimated.]
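The 3D-2D correspondence search can be sketched as follows: build a Kd-tree over one image's descriptors, query it with each 3D point's mean SIFT, and accept a match only when the nearest neighbour is clearly better than the second nearest. Several things here are assumptions made for illustration: the ratio test and its 0.6 threshold, the exact (non-approximate) tree, and all function names. Real SIFT descriptors are 128-dimensional, where an approximate search is the usual choice. The toy data uses 2D vectors so the result is easy to check by hand.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Node = Optional[tuple]  # (point index, split axis, left subtree, right subtree)


def build_kdtree(points: List[Vector], indices=None, depth: int = 0) -> Node:
    """Build a Kd-tree by median split, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def two_nearest(node: Node, points: List[Vector], query: Vector, best=None):
    """Return up to two (squared distance, index) pairs, nearest first."""
    if best is None:
        best = []
    if node is None:
        return best
    idx, axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(points[idx], query))
    best.append((d2, idx))
    best.sort()
    del best[2:]  # keep only the two closest so far
    diff = query[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    two_nearest(near, points, query, best)
    # Visit the far side only if it could still hold something closer.
    if len(best) < 2 or diff * diff < best[-1][0]:
        two_nearest(far, points, query, best)
    return best


def match_points_to_image(
    mean_sifts: List[Vector], image_sifts: List[Vector], ratio: float = 0.6
) -> List[Tuple[int, int]]:
    """Return (3D point index, 2D feature index) pairs passing the ratio test."""
    tree = build_kdtree(image_sifts)
    matches = []
    for point_id, descriptor in enumerate(mean_sifts):
        nn = two_nearest(tree, image_sifts, descriptor)
        # Distances are squared, so the ratio is squared as well.
        if len(nn) == 2 and nn[0][0] < ratio ** 2 * nn[1][0]:
            matches.append((point_id, nn[0][1]))
    return matches


image_sifts = [[0.0, 0.0], [10.0, 10.0], [10.0, 0.0], [0.0, 10.0]]
mean_sifts = [[0.5, 0.2], [5.0, 5.0], [9.8, 9.9]]
print(match_points_to_image(mean_sifts, image_sifts))  # [(0, 0), (2, 1)]
```

The second query is equally far from every image descriptor, so it is rejected as ambiguous. The surviving 3D-2D pairs would then go to a robust pose solver, which is not shown here.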

¹ CVIT, IIIT Hyderabad, India

Camera Addition (Localization)

Point Addition (Triangulation)

Fraction of connected image pairs at different stages of our pipeline vs. VisualSFM with Preemptive Matching.

Postponing the matching of the remaining (100 − η)% of features until after coarse modeling makes it possible to:

Use co-visibility to select fewer candidate images for matching.

Perform geometry-guided matching that is both faster and produces denser correspondences than zero-knowledge feature matching.

Advantages

² UIUC, Illinois, USA

Goal: To break the sequentiality of incremental SFM for faster reconstruction.

Achieved by first reconstructing a coarse model and enriching it in stages.

The coarse model is reconstructed quickly using only the top η% of image features by scale.

The coarse model is made dense by adding cameras and points in later stages.

More cameras are added to the model using direct 3D-2D localization.

More points are added to the model using geometry-guided matching.

The point and camera addition stages are fast, independent, and parallel.

As a result, our method produces denser point clouds in less time.

Coarse Global Model (CGM)

Colosseum: η = 20%, #Cam: 1657, #Pts: 967K

Pantheon Ext.: η = 20%, #Cam: 780, #Pts: 211K

St. Peters Int.: Coarse Model, #Cam: 800, #Pts: 54K

St. Peters Int.: Full Model, #Cam: 889, #Pts: 420K

4. Bundle Adjust & Repeat from 2.