16-721: learning-based methods in vision staff: instructor: alexei (alyosha) efros (efros@cs), 4207...

16-721: Learning-based Methods in Vision

Staff:
• Instructor: Alexei (Alyosha) Efros (efros@cs), 4207 NSH
• TA: Tomasz Malisiewicz (tomasz@cmu), Smith Hall 236

Web Page:
• http://www.cs.cmu.edu/~efros/courses/LBMV09/


Today

Introduction

Why This Course?

Administrative stuff

Overview of the course

A bit about me

Alexei (Alyosha) Efros

Relatively new faculty (RI/CSD)

Ph.D 2003, from UC Berkeley (signed by Arnie!)

Research Fellow, University of Oxford, ’03-’04

Teaching

The plan is to have fun and learn cool things, both you and me!

Social warning: I don’t see well

Research

Vision, Graphics, Data-driven “stuff”

PhD Thesis on Texture and Action Synthesis

Antonio Criminisi’s son cannot walk but he can fly

Smart Erase button in Microsoft Digital Image Pro:

Why this class?

The Old Days™:

1. Graduate Computer Vision

2. Advanced Machine Perception

Why this class?

The New and Improved Days:

1. Graduate Computer Vision

2. Advanced Machine Perception
• Physics-based Methods in Vision
• Geometry-based Methods in Vision
• Learning-based Methods in Vision

Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

The Hip & Trendy Learning

Learning as Last Resort

Learning as Last Resort

from [Sinha and Adelson 1993]

EXAMPLE: Recovering 3D geometry from a single 2D projection

Infinite number of possible solutions!
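The ambiguity can be seen in a minimal sketch, assuming an idealized pinhole camera with the image plane at the focal length. The function name `project`, the ray direction, and the depths are illustrative choices, not taken from [Sinha and Adelson 1993].

```python
def project(point_3d, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) to image coordinates (u, v)."""
    x, y, z = point_3d
    return (focal_length * x / z, focal_length * y / z)

# Hypothetical viewing ray through the camera center (assumed values).
ray_direction = (0.5, -0.25, 1.0)

# Slide a 3D point along that ray: the depth changes, the image point does not.
depths = [1.0, 2.0, 10.0, 100.0]
points = [tuple(d * c for c in ray_direction) for d in depths]
images = [project(p) for p in points]

for p, uv in zip(points, images):
    print(f"3D point {p} -> image point {uv}")
```

Every depth along the ray produces the same image point, so the image alone cannot say which 3D point was seen. Some extra source of information, such as prior experience with similar scenes, has to pick one.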

Learning-based Methods in Vision

This class is about trying to solve problems that do not have a solution!
• Don’t tell your mathematician friends!

This will be done using Data:
• E.g. what happened before is likely to happen again
• Google Intelligence (GI): The AI for the post-modern world!
• Note: this is not quite statistics

Why is this even worthwhile?
• Even a decade ago at ICCV99 Faugeras claimed it wasn’t!
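One way to read "what happened before is likely to happen again" is nearest-neighbor lookup: answer a new query by copying the answer attached to the most similar example seen so far. The sketch below is only an illustration of that idea; the mean-color features, the labels, and the function name `nearest_neighbor_label` are all made up for the example.

```python
import math

def nearest_neighbor_label(query, examples, labels):
    """Return the label of the stored example closest to the query (Euclidean distance)."""
    distances = [math.dist(query, example) for example in examples]
    best_index = distances.index(min(distances))
    return labels[best_index]

# Hypothetical "past data": mean RGB color of image patches and what they showed.
examples = [
    (0.9, 0.9, 1.0),  # bright, bluish patch
    (0.1, 0.6, 0.1),  # green patch
    (0.5, 0.5, 0.5),  # gray patch
]
labels = ["sky", "grass", "road"]

# A new patch gets the label of whatever looked most like it before.
prediction = nearest_neighbor_label((0.8, 0.85, 0.95), examples, labels)
print(prediction)
```

Nothing here models why a patch is sky. The prediction is only as good as the stored data, which is why the amount of data matters so much in this setting.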

The Vision Story Begins…

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.”

-- David Marr, Vision (1982)

Vision: a split personality

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.”

Answer #1: pixel of brightness 243 at position (124,54)

…and depth .7 meters

Answer #2: looks like bottom edge of whiteboard showing at the top of the image

Which do we want?

Is the difference just a matter of scale?

depth map

Measurement vs. Perception

Brightness: Measurement vs. Perception

Brightness: Measurement vs. Perception

Proof!

Lengths: Measurement vs. Perception

http://www.michaelbach.de/ot/sze_muelue/index.html

Müller-Lyer Illusion

http://www.michaelbach.de/ot/sze_muelue/index.html

Vision as Measurement Device

Real-time stereo on Mars

Structure from Motion

Physics-based Vision

Virtualized Reality

…but why do Learning for Vision?

“What if I don’t care about this wishy-washy human perception stuff? I just want to make my robot go!”

Small Reason:
• For measurement, other sensors are often better (in the DARPA Grand Challenge, vision was barely used!)
• For navigation, you still need to learn!

Big Reason:

The goals of computer vision (what + where) are in terms of what humans care about.

So what do humans care about?

slide by Fei-Fei, Fergus & Torralba

http://people.w3.org/rishida/photos/html/slides/0311-beijing1_031111_035240+8_beijing_e031124.jpg.html

Verification: is that a bus?


http://people.w3.org/rishida/photos/html/slides/0311-beijing1_031111_035240+8_beijing_e031124.jpg.html

Detection: are there cars?


Identification: is that a picture of Mao?


Object categorization

sky

building

flag

wall

banner

bus

cars

bus

face

street lamp


Scene and context categorization

• outdoor

• city

• traffic

• …


Rough 3D layout, depth ordering


Challenges 1: viewpoint variation

Michelangelo, 1475-1564

slide by Fei-Fei, Fergus & Torralba

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957

slide by Fei-Fei, Fergus & Torralba

Challenges 4: scale


Challenges 5: deformation

Xu, Beihong 1943

slide by Fei-Fei, Fergus & Torralba

Challenges 6: background clutter

Klimt, 1913

slide by Fei-Fei, Fergus & Torralba

Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba

Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba

Challenges 9: the world behind the image

In this course, we will:

Take a few baby steps…

Goals

Read some interesting papers together
• Learn something new: both you and me!

Get up to speed on a big chunk of vision research
• understand 70% of CVPR papers!

Use learning-based vision in your own work

Try your hand at a large vision project

Learn how to speak

Learn how to think critically about papers

Course Organization

Requirements:

1. Class Participation (33%)
• Keep annotated bibliography
• Post on the Class Blog before each class
• Ask questions / debate / fight / be involved!

2. Two Projects (66%)
• Analysis Project
  • Implement and evaluate a paper and present it in class
  • Must talk to me AT LEAST 2 weeks beforehand!
• Synthesis Project
  • Can be done solo or in groups of 2
  • Regular meetings
  • Must use lots of data

Class Participation

Keep annotated bibliography of papers you read (always a good idea!). The format is up to you. At least, it needs to have:
• Summary of key points
• A few interesting insights, “aha moments”, keen observations, etc.
• Weaknesses of approach. Unanswered questions. Areas of further investigation, improvement.

Before each class:
• Submit your summary for current paper(s) in hard copy (printout/xerox)
• Submit a comment on the Class Blog
  • ask a question, answer a question, post your thoughts, praise, criticism, start a discussion, etc.

Analysis Project

1. Pick a paper / set of papers from the list

2. Understand it as if you were the author
• Re-implement it
• If there is code, understand the code completely
• Run it on the same data (you can contact authors for data and even code sometimes)

3. Understand it better than the author
• Run it on LOTS of new data (e.g. LabelMe dataset, Flickr dataset, etc.)
• Figure out how it succeeds, how it fails, where it fails, and, most importantly, WHY it fails
• Look at which parts of the code do the real work, and which parts are just window-dressing
• Maybe suggest directions for improvement.

4. Prepare an amazing 1hr presentation
• Discuss with me twice: once when you start the project, and once 3 days before the presentation

Synthesis Project

Can grow out of analysis project, or your own research

But it needs to use large amounts of data!

1-2 people per project.

Project proposals in a few weeks.

Project presentations at the end of semester.

Results presented as a CVPR-format paper.

Hopefully, a few papers may be submitted to conferences.

End of Semester Awards

We will vote for:
• Best Analysis Project
• Best Synthesis Project
• Best Blog Comment

Prize: dinner in a French restaurant in Paris (transportation not included!) or some other worthy prizes
