visual object recognition

33
Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe & Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 Kristen Grauman Department of Computer Sciences University of Texas in Austin

Upload: fern

Post on 22-Jan-2016

38 views

Category:

Documents


0 download

DESCRIPTION

Visual Object Recognition. Bastian Leibe & Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008. Kristen Grauman Department of Computer Sciences University of Texas in Austin. Detection via classification: Main idea. Consider all subwindows in an image - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Visual Object Recognition

Bastian Leibe &

Computer Vision LaboratoryETH Zurich

Chicago, 14.07.2008

Kristen Grauman

Department of Computer SciencesUniversity of Texas in Austin

Page 2: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

2K. Grauman, B. Leibe

Detection via classification: Main idea

• Consider all subwindows in an image Sample at multiple scales and positions

• Make a decision per window: “Does this contain object category X or not?”

• In this section, we’ll focus specifically on methods using a global representation (i.e., not part-based, not local features).

Page 3: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Feature extraction: global appearance

Feature extraction

Simple holistic descriptions of image content

grayscale / color histogram vector of pixel intensities

K. Grauman, B. Leibe

Page 4: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Feature extraction: global appearance

• Pixel-based representations sensitive to small shifts

• Color or grayscale-based appearance description can be sensitive to illumination and intra-class appearance variation

K. Grauman, B. Leibe

Cartoon example: an albino koala

Page 5: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Gradient-based representations

• Consider edges, contours, and (oriented) intensity gradients

K. Grauman, B. Leibe

Page 6: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Gradient-based representations:Rectangular features

Compute differences between sums of pixels in rectangles

Captures contrast in adjacent spatial regions

Similar to Haar wavelets, efficient to computeViola & Jones, CVPR 2001

K. Grauman, B. Leibe

Page 7: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

• How to compute a decision for each subwindow?

Image feature

Classifier construction

K. Grauman, B. Leibe

Page 8: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Boosting

• Build a strong classifier by combining number of “weak classifiers”, which need only be better than chance

• Sequential learning process: at each iteration, add a weak classifier

• Flexible to choice of weak learner including fast simple classifiers that alone may be

inaccurate

• We’ll look at Freund & Schapire’s AdaBoost algorithm

Easy to implement Base learning algorithm for Viola-Jones face detector

8K. Grauman, B. Leibe

Page 9: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

AdaBoost: Intuition

9K. Grauman, B. Leibe

Figure adapted from Freund and Schapire

Consider a 2-d feature space with positive and negative examples.

Each weak classifier splits the training examples with at least 50% accuracy.

Examples misclassified by a previous weak learner are given more emphasis at future rounds.

Page 10: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

AdaBoost: Intuition

10K. Grauman, B. Leibe

Page 11: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

AdaBoost: Intuition

11K. Grauman, B. Leibe

Final classifier is combination of the weak classifiers

Page 12: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

AdaBoost AlgorithmStart with uniform weights on training examples

Evaluate weighted error for each feature, pick best.Incorrectly classified -> more weight

Correctly classified -> less weight

Final classifier is combination of the weak ones, weighted according to error they had.

Freund & Schapire 1995

{x1,…xn}

Page 13: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Cascading classifiers for detection

For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g.,

Filter for promising regions with an initial inexpensive classifier

Build a chain of classifiers, choosing cheap ones with low false negative rates early in the chain

13K. Grauman, B. Leibe

Fleuret & Geman, IJCV 2001Rowley et al., PAMI 1998Viola & Jones, CVPR 2001

Figure from Viola & Jones CVPR 2001

Page 14: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Example: Face detection

• Frontal faces are a good example of a class where global appearance models + a sliding window detection approach fit well:

Regular 2D structure Center of face almost shaped like a

“patch”/window

• Now we’ll take AdaBoost and see how the Viola-Jones face detector works

14K. Grauman, B. Leibe

Page 15: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Feature extraction

15K. Grauman, B. Leibe

Feature output is difference between adjacent regions

Viola & Jones, CVPR 2001

Efficiently computable with integral image: any sum can be computed in constant time

Avoid scaling images scale features directly for same cost

“Rectangular” filters

Value at (x,y) is sum of pixels above and to the left of (x,y)

Integral image

Page 16: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Large library of filters

Considering all possible filter parameters: position, scale, and type:

180,000+ possible features associated with each 24 x 24 window

Use AdaBoost both to select the informative features and to form the classifier

Viola & Jones, CVPR 2001

Page 17: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Viola-Jones Face Detector: Summary

• Train with 5K positives, 350M negatives• Real-time detector using 38 layer cascade• 6061 features in final layer• [Implementation available in OpenCV:

http://www.intel.com/technology/computing/opencv/]17

K. Grauman, B. Leibe

Faces

Non-faces

Train cascade of classifiers with

AdaBoost

Selected features, thresholds, and

weights

New image

Apply to

each

subw

indow

Page 18: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Viola-Jones Face Detector: Results

18K. Grauman, B. Leibe

First two features selected

Page 19: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Viola-Jones Face Detector: Results

Page 20: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Viola-Jones Face Detector: Results

Page 21: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Viola-Jones Face Detector: Results

Page 22: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Profile Features

Detecting profile faces requires training separate detector with profile examples.

Page 23: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Paul Viola, ICCV tutorial

Viola-Jones Face Detector: Results

Page 24: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

24K. Grauman, B. Leibe

Everingham, M., Sivic, J. and Zisserman, A."Hello! My name is... Buffy" - Automatic naming of characters in TV video,BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html

Example application

Frontal faces detected and then tracked, character names inferred with alignment of script and subtitles.

Page 25: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Highlights

• Sliding window detection and global appearance descriptors:

Simple detection protocol to implement Good feature choices critical Past successes for certain classes

25K. Grauman, B. Leibe

Page 26: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Limitations

• High computational complexity For example: 250,000 locations x 30 orientations x 4

scales = 30,000,000 evaluations! If training binary detectors independently, means

cost increases linearly with number of classes

• With so many windows, false positive rate better be low

26K. Grauman, B. Leibe

Page 27: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Limitations (continued)

• Not all objects are “box” shaped

27K. Grauman, B. Leibe

Page 28: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Limitations (continued)

• Non-rigid, deformable objects not captured well with representations assuming a fixed 2d structure; or must assume fixed viewpoint

• Objects with less-regular textures not captured well with holistic appearance-based descriptions

28K. Grauman, B. Leibe

Page 29: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Limitations (continued)

• If considering windows in isolation, context is lost

29K. Grauman, B. LeibeFigure credit: Derek Hoiem

Sliding window Detector’s view

Page 30: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Limitations (continued)

• In practice, often entails large, cropped training set (expensive)

• Requiring good match to a global appearance description can lead to sensitivity to partial occlusions

30K. Grauman, B. LeibeImage credit: Adam, Rivlin, & Shimshoni

Page 31: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Gradient-based representations

• Consider edges, contours, and (oriented) intensity gradients

• Summarize local distribution of gradients with histogram

Locally orderless: offers invariance to small shifts and rotations

Contrast-normalization: try to correct for variable illumination

K. Grauman, B. Leibe

Page 32: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Gradient-based representations:Histograms of oriented gradients (HoG)

Dalal & Triggs, CVPR 2005

Map each grid cell in the input window to a histogram counting the gradients per orientation.

Code available: http://pascal.inrialpes.fr/soft/olt/

K. Grauman, B. Leibe

Page 33: Visual Object Recognition

Perc

ep

tual a

nd

Sen

sory

Au

gm

en

ted

Com

pu

ting

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Vis

ual O

bje

ct R

ecog

nitio

n T

uto

rial

Pedestrian detection• Detecting upright, walking humans also possible

using sliding window’s appearance/texture; e.g.,

K. Grauman, B. Leibe

SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]

Space-time rectangle features [Viola, Jones & Snow, ICCV 2003]

SVM with HoGs [Dalal & Triggs, CVPR 2005]