Unsolved Problems in Optical Flow and Stereo Estimation

Richard Szeliski

Microsoft Research

and

Daniel Scharstein

Middlebury College

This work was supported in part by NSF grants IIS-0413169 and IIS-0917109

http://research.microsoft.com/c/1040

Outline

Prior work: Middlebury benchmarks

Recent work: handling reflections

What are current challenges?

Future evaluation efforts?

Collaborators - Benchmarks

Steve Seitz, U Washington

Brian Curless, U Washington

James Diebel, Stanford

Simon Baker, Microsoft Research

Michael Black, Brown U

JP Lewis, Weta Digital Ltd

Stefan Roth, TU Darmstadt

Heiko Hirschmüller, DLR Germany

Chris Pal, U Rochester

Collaborators – Middlebury students

Anna Blasiak ’07

Padma Ugbabe ’03

Alexander Vandenberg-Rodes

Jiaxin (Lily) Fu ’03

Sarri Al-Nashashibi ’08

Gonzalo Alonso ’06

Jeff Wehrwein ’08

Brad Hiebert-Treuer ’07

Alan Lim ’09

Nera Nesic ’13

Xi Wang ’14

Goal: Extract information from images (both 2D and 3D)

Hard problem:

Noisy data

Lots of it

Need additional assumptions

Computer Vision

Our focus: image matching

Stereo vision

Multi-view stereo

Image motion / optical flow

Applications - Stereo

Video conferencing

Game control

Intelligent cars

Applications – Multiview stereo

3D reconstruction

3D printing

Applications – Optical flow

Video interpolation and compression

Vehicle and people tracking

Stereo vision

Infer 3D structure from 2 (or more) images of a scene

Seems easy for humans…
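
As a minimal sketch of how matching leads to 3D structure: for a rectified pinhole stereo pair (an assumption, not stated on the slide), depth follows from disparity as Z = f · B / d. The function name, focal length, and baseline below are hypothetical illustration values.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (in meters) of a point seen with the given disparity.

    Assumes a rectified stereo pair with identical pinhole cameras:
    Z = f * B / d, with f and d in pixels and B in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800-pixel focal length, 16 cm baseline.
near = depth_from_disparity(64.0, focal_length_px=800.0, baseline_m=0.16)
far = depth_from_disparity(16.0, focal_length_px=800.0, baseline_m=0.16)
```

Larger disparities correspond to closer points, so a fixed matching error costs more depth accuracy for distant surfaces than for nearby ones.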

Why is matching hard?

Untextured areas

Noisy data / aliasing

Depth discontinuities

Occlusions

Reflections / specularities

Different camera responses

Imperfect calibration

…

Datasets with ground truth

Ground truth = true answer

(e.g. true disparities)

GT needed for quantitative analysis of algorithms (benchmarks)

Middlebury benchmarks:

http://vision.middlebury.edu/
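
A minimal sketch of one way ground truth enables quantitative analysis: count the pixels whose estimated disparity is off by more than a threshold. The function name, the 1-pixel default threshold, and the toy values are assumptions for illustration, not the benchmark's actual evaluation code.

```python
def bad_pixel_rate(estimated, ground_truth, threshold=1.0):
    """Fraction of pixels whose disparity error exceeds `threshold`.

    Both inputs are lists of rows. Ground-truth entries of None mark
    pixels with unknown disparity and are skipped.
    """
    bad = 0
    valid = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for est, gt in zip(est_row, gt_row):
            if gt is None:
                continue
            valid += 1
            if abs(est - gt) > threshold:
                bad += 1
    if valid == 0:
        raise ValueError("no pixels with known ground truth")
    return bad / valid


# Toy 2x3 disparity maps (made-up values, for illustration only).
gt = [[10.0, 10.0, 12.0], [10.0, None, 12.0]]
est = [[10.2, 13.0, 12.0], [9.5, 11.0, 14.5]]
rate = bad_pixel_rate(est, gt)  # 2 of the 5 valid pixels are off by more than 1
```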

1. Middlebury Stereo Page

(Scharstein & Szeliski – CVPR 2001, IJCV 2002)

vision.middlebury.edu/stereo

Evaluator with web interface

v.1 by Lily Fu ’03, v.2 by Anna Blasiak ’07

[Figure: left views and ground-truth (GT) disparities]


Currently 135 entries

2. Multiview Stereo Evaluation

(Seitz, Curless, Diebel, Scharstein, Szeliski – CVPR 2006)

vision.middlebury.edu/mview

Create 3D model from 100s of views

One view

GT surface mesh

3. Optical Flow Evaluation

(Baker, Scharstein, Lewis, Roth, Black, Szeliski – ICCV 2007)

vision.middlebury.edu/flow

Input: video sequence

Output: flow vectors

Where do pixels move from frame to frame?
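
Given true motion vectors, an estimated flow field can be scored by its average endpoint error, the mean Euclidean distance between estimated and true vectors. The sketch below assumes flow fields stored as nested lists of (u, v) tuples; the function name and toy values are hypothetical.

```python
import math


def average_endpoint_error(estimated, ground_truth):
    """Mean Euclidean distance between estimated and true flow vectors."""
    total = 0.0
    count = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for (u, v), (u_gt, v_gt) in zip(est_row, gt_row):
            total += math.hypot(u - u_gt, v - v_gt)
            count += 1
    if count == 0:
        raise ValueError("empty flow field")
    return total / count


# Toy 2x2 flow fields (made-up values, for illustration only).
gt_flow = [[(1.0, 0.0), (1.0, 0.0)], [(0.0, 2.0), (0.0, 2.0)]]
est_flow = [[(1.0, 0.0), (4.0, 4.0)], [(0.0, 2.0), (0.0, 1.0)]]
aee = average_endpoint_error(est_flow, gt_flow)
```

A single large outlier, such as the second vector here, dominates the mean, which is one reason benchmarks typically report more than one statistic.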

How to get ground truth?

1. Stereo – true disparities

2. Multiview stereo – true surface mesh

3. Optical flow – true motion vectors

Setup 2005 / 2006

7 views

3 ambient light setups

3 exposures

2005: 9 datasets

2006: 21 datasets

see vision.middlebury.edu/stereo/data

Version 3 – soon?

Current work: new datasets

Specular surfaces

Point-and-shoot cameras

Possibly outdoor scenes

“Space-time stereo” techniques

Stereo video?

Unpublished datasets

Work in progress on specular scenes

Spray-paint the motorcycle after the color photos are acquired, to enable active-lighting ranging

Mobile acquisition system, 2012

DSLR Cameras

Point & Shoot Cameras

Projector

Laptop for Processing

Motorcycle Scene - Original

Motorcycle Scene - Painting

Motorcycle Scene - Painted

Motorcycle Disparity Map

Motorcycle Scene - Original

What can we do about specular scenes?

A1: treat reflections as separate layers

Image-Based Rendering for Scenes with Reflections

Sudipta N. Sinha

Johannes Kopf

Michael Goesele

Daniel Scharstein

Richard Szeliski

2. Multiview stereo: range data

Use laser scanner

Merge 100s of scans

Fill holes

Align with image data

Version 2 – current work

Version 2 – soon?

Have high-quality CT scans

Need better reference views

Need highly accurate camera locations

Include objects from industrial setting

Collaborate with NIST

3. Optical flow: Hidden texture

Can’t use structured light (objects move)

Idea: make pixels “trackable” with

High resolution (downsample by 6)

Hidden fluorescent texture

Very slow motion
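
A sketch of why downsampling helps: motion tracked to the nearest pixel at high resolution becomes sub-pixel accurate once the images and flow vectors are reduced by a factor of 6. The box filter and both function names are assumptions; the slides do not say how the downsampling was done.

```python
def downsample_image(image, factor):
    """Box-filter downsample: average each factor x factor block of pixels."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def scale_flow(flow_vector, factor):
    """Convert a flow vector measured at high resolution to low resolution."""
    u, v = flow_vector
    return (u / factor, v / factor)


# Toy 6x12 image: dark left half, bright right half (made-up values).
high_res = [[0.0] * 6 + [60.0] * 6 for _ in range(6)]
low_res = downsample_image(high_res, 6)
low_res_flow = scale_flow((9.0, -3.0), 6)
```

With a factor of 6, a tracking error of one high-resolution pixel shrinks to 1/6 of a pixel in the downsampled ground truth.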

Value of benchmarks

Enables quantitative comparison

Summarizes state of the art

Stimulates new research

Challenging data “pushes envelope”

Pitfalls

Overfitting to test data

Focus on ranking

Deemphasizes aspects not evaluated

“Rest” after initial “push”

Solutions

Provide separate training data

Provide diverse datasets

Avoid single ranking

Update benchmarks periodically
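
One way to avoid a single ranking is to report each method's rank separately per dataset. This is a minimal sketch with hypothetical method names, dataset names, and error values, assuming every method has a score on every dataset.

```python
def ranks_per_dataset(errors):
    """Map {method: {dataset: error}} to {method: {dataset: rank}}, 1 = best.

    Lower error is better. Ties keep the input order of the methods.
    """
    datasets = sorted({name for scores in errors.values() for name in scores})
    ranks = {method: {} for method in errors}
    for dataset in datasets:
        ordered = sorted(errors, key=lambda method: errors[method][dataset])
        for position, method in enumerate(ordered, start=1):
            ranks[method][dataset] = position
    return ranks


# Made-up error values, for illustration only.
errors = {
    "method_a": {"textured": 2.0, "specular": 9.0},
    "method_b": {"textured": 4.0, "specular": 5.0},
}
ranks = ranks_per_dataset(errors)
```

Here neither method wins everywhere: method_a ranks first on the textured scene and method_b on the specular one, a distinction that a single averaged rank would hide.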

Other uses of GT data

Algorithm design

Evaluate algorithm components

Robust data term

Smoothness priors

Machine learning
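
To make the "robust data term" and "smoothness prior" components concrete, here is a sketch of a generic textbook energy for one scanline: a truncated absolute-difference data term plus a Potts smoothness term. It is not the formulation of any specific paper cited in this talk, and all names and parameter values are assumptions.

```python
def stereo_energy(left, right, disparities, smoothness_weight=1.0, truncation=20.0):
    """Energy of a disparity labeling for one scanline (lower is better).

    Data term: truncated absolute intensity difference between left[x]
    and right[x - d]; matches outside the image get the truncation cost.
    Smoothness term: Potts model, cost 1 per change between neighbors.
    """
    data = 0.0
    for x, d in enumerate(disparities):
        match_x = x - d
        if 0 <= match_x < len(right):
            data += min(abs(left[x] - right[match_x]), truncation)
        else:
            data += truncation
    smooth = sum(1 for a, b in zip(disparities, disparities[1:]) if a != b)
    return data + smoothness_weight * smooth


# Toy scanline: the left view is the right view shifted by one pixel.
right = [10.0, 20.0, 30.0, 40.0, 50.0]
left = [5.0, 10.0, 20.0, 30.0, 40.0]
energy_correct = stereo_energy(left, right, [1, 1, 1, 1, 1])
energy_wrong = stereo_energy(left, right, [0, 0, 0, 0, 0])
```

With ground truth available, each component (the truncation threshold, the smoothness weight) can be tuned or learned and then scored in isolation rather than only through the final result.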

Evaluation of Cost Functions for Stereo Matching (Hirschmüller & Scharstein, CVPR 2007, PAMI 2009)

Learning Conditional Random Fields for Stereo (Scharstein & Pal, CVPR 2007; Pal et al. IJCV 2010)

Moebius – trained on other 5

Moebius – trained on self

Why is matching hard?

Untextured areas

Noisy data / aliasing

Depth discontinuities

Occlusions

Reflections / specularities

Different camera responses

Imperfect calibration

… what about higher-level semantics?

Semantic scene reconstruction

Conclusion

Benchmarks are important, stimulate research

Creating ground-truth data is challenging, fun

Rolling benchmarks

Code archival: source, binaries, and Web services (Web Vision Workshop)
