lecture 1 object detection - class

Lecture 1 Object Detection Bill Triggs Laboratoire Jean Kuntzmann, Grenoble, France [email protected] International Computer Vision Summer School Sicily July 2008


Post on 23-May-2022


Page 1: Lecture 1 Object Detection - CLASS

Lecture 1

Object Detection

Bill Triggs

Laboratoire Jean Kuntzmann, Grenoble, France

[email protected]

International Computer Vision Summer School, Sicily

July 2008

Page 2: Lecture 1 Object Detection - CLASS

What do we need to build a good object detector?

● There may be lighting variations, changes in appearance, complex backgrounds
– We need robust visual features

● Instances may have variable geometry or internal degrees of freedom
– Orientation, 3D pose, body pose, facial expression
– We need a flexible recognition method

● Instances may occur anywhere in the image and at any scale
– We need a good search control strategy...

Page 3: Lecture 1 Object Detection - CLASS

What do we need to build a good object detector?

● There may be overlapping instances or detections
– We need a detection postprocessing strategy

● The method is likely to be based on learning and will need to be validated
– We need labelled training and validation sets

● Computational cost or embeddability may be an issue
– We need to review the whole system for efficiency

Page 4: Lecture 1 Object Detection - CLASS

A Naive Image Scanning Detector – Template Matching

Match window against a rigid template, e.g. by correlation

Scan image at all scales and locations

Object detections

Detection Phase

Scale-space pyramid

Detection window

Return above-threshold matches as detections
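
The naive scanning detector above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names (`ncc`, `downscale`, `scan`), the scale step of 1.25, the threshold, and the nearest-neighbour pyramid are all assumptions chosen for brevity.

```python
import numpy as np

def ncc(window, template):
    """Normalised cross-correlation between two equally sized patches."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return float((w * t).sum() / denom) if denom > 0 else 0.0

def downscale(image, factor):
    """Crude nearest-neighbour downscaling (stand-in for a proper pyramid)."""
    h, w = image.shape
    nh, nw = int(h / factor), int(w / factor)
    rows = (np.arange(nh) * factor).astype(int)
    cols = (np.arange(nw) * factor).astype(int)
    return image[np.ix_(rows, cols)]

def scan(image, template, threshold=0.9, scale_step=1.25, stride=1):
    """Return (row, col, scale, score) for every above-threshold window."""
    image = np.asarray(image, dtype=float)
    th, tw = template.shape
    detections = []
    scale = 1.0
    level = image
    while level.shape[0] >= th and level.shape[1] >= tw:
        for r in range(0, level.shape[0] - th + 1, stride):
            for c in range(0, level.shape[1] - tw + 1, stride):
                score = ncc(level[r:r + th, c:c + tw], template)
                if score >= threshold:
                    # Map the window origin back to original image coordinates.
                    detections.append((r * scale, c * scale, scale, score))
        scale *= scale_step
        level = downscale(image, scale)
    return detections
```

Note that nothing here merges the several near-duplicate windows that fire around a true match, which is one of the weaknesses the next slide lists.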

Page 5: Lecture 1 Object Detection - CLASS

Problems with this approach

• It is photometrically too rigid to resist changes in lighting and appearance variations

• It is geometrically too rigid to resist shape variations

• It does not have a strategy for overlapping detections

Page 6: Lecture 1 Object Detection - CLASS

Anatomy of a Modern Object Detector

• Strong image preprocessing and feature normalization for resistance to illumination changes

• Local rectification and pooling for resistance to small shape variations

• Overcomplete feature set for rich description

• Machine learning based decision rule to capture statistics and variability of real application

• Postprocessing to fuse multiple detections

Page 7: Lecture 1 Object Detection - CLASS

Image Scanning Detectors

Fuse multiple detections in 3-D position & scale space

Extract features over windows

Scan image(s) at all scales and locations

Object detections with bounding boxes

Detection Phase

Scale-space pyramid

Detection window

Run window classifier at all locations

Page 8: Lecture 1 Object Detection - CLASS

Image Preprocessing

● Preprocessing is often neglected but it can make a huge difference in performance

● One example of a preprocessing chain:

input image → strong gamma compression → centre-surround filter → robust local contrast normalization → highlight suppression
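
A rough sketch of such a chain follows. The parameter values (`gamma`, the two Gaussian widths, `alpha`, `tau`) and the use of a difference of Gaussians as the centre-surround filter are assumptions for illustration. The contrast step below uses global image statistics, which is a simplification of the "local" normalization named on the slide.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge replication."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def preprocess(img, gamma=0.2, sigma_inner=1.0, sigma_outer=2.0, alpha=0.1, tau=10.0):
    img = np.asarray(img, dtype=float)
    # 1. Strong gamma compression.
    out = np.power(np.maximum(img, 0.0), gamma)
    # 2. Centre-surround filter (difference of Gaussians, an assumed choice).
    out = gaussian_blur(out, sigma_inner) - gaussian_blur(out, sigma_outer)
    # 3. Robust contrast normalisation in two passes (global statistics here).
    out = out / (np.mean(np.abs(out) ** alpha) ** (1.0 / alpha) + 1e-12)
    out = out / (np.mean(np.minimum(tau, np.abs(out)) ** alpha) ** (1.0 / alpha) + 1e-12)
    # 4. Highlight suppression: squash remaining extreme values into (-tau, tau).
    return tau * np.tanh(out / tau)
```

Because the gamma step turns a multiplicative illumination change into a constant factor and the normalisation divides that factor out, the output is essentially unchanged when the input image is uniformly brightened.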

Page 9: Lecture 1 Object Detection - CLASS

Performance Improvements from Preprocessing

Face Recognition Grand Challenge 1.0.4 Dataset, various features, baseline LDA classifier

Page 10: Lecture 1 Object Detection - CLASS

Local Binary Pattern Features

● Descriptors based on local thresholding or ranking of pixel or edge intensities are very resistant to illumination changes

● Local Binary Patterns
– threshold ring of pixels at value of central pixel
– locally histogram resulting binary codes
– currently one of the best descriptors for face recognition
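
The two LBP steps (thresholding a ring at the central value, then histogramming the codes locally) can be sketched as below. This assumes the basic 8-neighbour, radius-1 ring and non-overlapping square cells; the function names and the cell size are illustrative choices, not taken from the lecture.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP code for every interior pixel of a 2-D array."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= centre).astype(np.int64) << bit
    return codes

def lbp_histogram(img, cell=8):
    """Concatenate 256-bin LBP histograms over non-overlapping cells."""
    codes = lbp_codes(img)
    hists = []
    for r in range(0, codes.shape[0] - cell + 1, cell):
        for c in range(0, codes.shape[1] - cell + 1, cell):
            block = codes[r:r + cell, c:c + cell].ravel()
            hists.append(np.bincount(block, minlength=256))
    return np.concatenate(hists)
```

Since each code depends only on the ordering of pixel values, any monotonically increasing change of intensities leaves the codes untouched, which is the source of the illumination resistance.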

Page 11: Lecture 1 Object Detection - CLASS

Detectors using Local Filters

● Convolution filters inspired by V1 simple cell responses, multiscale image representations
– Gaussian derivatives, Gabor filters, log-polar Gabor filters, steerable filters, Haar wavelets
– use a number of orientations (4-12)
– output is typically squared or rectified before use

2nd & 3rd order Gaussian derivative, scaled Gaussian derivative and log-polar Gabor filters

2nd order steerable filter and its frequency response

Haar wavelets

Page 12: Lecture 1 Object Detection - CLASS

Haar Wavelet / SVM Human Detector

[Papageorgiou & Poggio, 1998]

Training: training set (2k positive / 10k negative) → Haar wavelet descriptors (1326-D descriptor) → support vector machine

Testing: test image → multi-scale search → test descriptors → support vector machine → results

Page 13: Lecture 1 Object Detection - CLASS

Which Descriptors are Important?

32x32 descriptors (HVX) 16x16 descriptors (HVX)

Mean response difference between positive & negative training examples

Essentially just a coarse-scale human silhouette template!

Page 14: Lecture 1 Object Detection - CLASS

Some Detection Results

Page 15: Lecture 1 Object Detection - CLASS

Detectors using Edge / Gradient Orientation Histograms

● Divide local region into spatial cells

● Calculate orientation of image gradient at each pixel

● Pool quantized orientations over each cell
– descriptor contains an orientation histogram for each cell
– weight votes by gradient magnitude

● Can also use edge orientations from a discrete edge detector

● Basis of the popular SIFT, HOG, Generalized Shape Context methods

orientation voting and pooling into spatial cells
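
The voting and pooling steps can be sketched as follows. It assumes unsigned orientations, hard binning without interpolation, a centred [-1 0 1] derivative, and non-overlapping square cells; the name `cell_orientation_histograms` and the default sizes are illustrative only.

```python
import numpy as np

def cell_orientation_histograms(img, cell=8, bins=9):
    """Magnitude-weighted orientation histogram for each spatial cell."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Centred [-1 0 1] derivatives on interior pixels.
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantised into `bins` bins.
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    n_r, n_c = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_r, n_c, bins))
    for r in range(n_r):
        for c in range(n_c):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    return hist
```

Practical descriptors such as SIFT and HOG additionally interpolate votes between neighbouring bins and cells, which this sketch omits.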

Page 16: Lecture 1 Object Detection - CLASS

C.f. Shape context
– pool counts of edge pixels into log-polar spatial bins
– centre descriptor on regularly spaced / all edge pixels

Page 17: Lecture 1 Object Detection - CLASS

Histogram of Oriented Gradient (HOG) Person Detector

● This simple detector is still one of the best generic human detectors

● It is a good illustration of
– the power that modern features and training methods have given to basic template matching

– the need for good engineering and attention to detail

N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005

Page 18: Lecture 1 Object Detection - CLASS

Feature Extraction

Input image → Normalise gamma → Compute gradients → Weighted vote in spatial & orientation cells → Contrast normalise over overlapping spatial cells → Collect HOGs over detection window → Linear SVM

Feature vector f = [ ..., ..., ...]

Figure labels: Detection window, Cell, Block, Overlap of Blocks

Page 19: Lecture 1 Object Detection - CLASS

Overview of Learning Phase

Learning phase:

Input: Annotations on training images → Create fixed-resolution normalised training image data set → Encode images into feature spaces → Learn binary classifier → Resample negative training images to create hard examples → Encode images into feature spaces → Learn binary classifier → Object/Non-object decision

Re-training reduces false positives by an order of magnitude!
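
The retraining loop (train, scan negative images for false positives, add them as hard examples, train again) can be sketched as below. To keep the example self-contained, a ridge-regression linear classifier stands in for the SVM; that substitution, the function names, and the zero decision threshold are assumptions, not the lecture's setup.

```python
import numpy as np

def train_linear(X, y, lam=1e-2):
    """Ridge-regression linear classifier (an assumed stand-in for the SVM)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def scores(w, X):
    """Signed classifier outputs; positive means 'object'."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def train_with_hard_negatives(pos, neg, neg_pool, rounds=1):
    pos, neg, neg_pool = (np.asarray(a, dtype=float) for a in (pos, neg, neg_pool))

    def fit(neg_set):
        X = np.vstack([pos, neg_set])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg_set))])
        return train_linear(X, y)

    w = fit(neg)                                   # initial classifier
    for _ in range(rounds):
        hard = neg_pool[scores(w, neg_pool) > 0]   # false positives on negatives
        if len(hard) == 0:
            break
        neg = np.vstack([neg, hard])               # augment the negative set
        w = fit(neg)                               # retrain
    return w
```

In a real detector `neg_pool` would be the descriptors of all windows scanned from person-free images, so the pool is far larger than the initial random sample of negatives.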

Page 20: Lecture 1 Object Detection - CLASS

HOG Descriptors

Parameters
– Gradient scale
– Orientation bins
– Percentage of block overlap

Schemes
– RGB or Lab, colour/gray-space
– Block normalisation: L2-norm, $v \leftarrow v / \sqrt{\|v\|_2^2 + \epsilon}$, or L1-norm, $v \leftarrow v / (\|v\|_1 + \epsilon)$

Figure labels: Cell, Block; R-HOG/SIFT; C-HOG with center bin
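
The two block normalisation schemes, L2-norm $v \leftarrow v / \sqrt{\|v\|_2^2 + \epsilon}$ and L1-norm $v \leftarrow v / (\|v\|_1 + \epsilon)$, are small enough to write out directly. The function names and the default value of `eps` are assumptions for illustration.

```python
import numpy as np

def l2_normalise(v, eps=1e-3):
    """L2-norm block normalisation: v / sqrt(||v||_2^2 + eps)."""
    v = np.asarray(v, dtype=float)
    return v / np.sqrt(np.sum(v * v) + eps)

def l1_normalise(v, eps=1e-3):
    """L1-norm block normalisation: v / (||v||_1 + eps)."""
    v = np.asarray(v, dtype=float)
    return v / (np.sum(np.abs(v)) + eps)
```

Here `v` would be the concatenated cell histograms of one block. The small constant keeps the division stable for nearly empty blocks, at the cost of slightly under-normalising low-contrast regions.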

Page 21: Lecture 1 Object Detection - CLASS

Evaluation Data Sets

Database                  Split    Positive windows               Negative data
MIT pedestrian database   Train    507                            unavailable
MIT pedestrian database   Test     200                            unavailable
MIT pedestrian database   Overall  709 annotations + reflections
INRIA person database     Train    1208                           1218 negative images
INRIA person database     Test     566                            453 negative images
INRIA person database     Overall  1774 annotations + reflections

Page 22: Lecture 1 Object Detection - CLASS

Performance on MIT Dataset

● R-HOG and C-HOG give near perfect separation on MIT database

● Both have 1-2 orders of magnitude fewer false positives than wavelets and similar descriptors

Page 23: Lecture 1 Object Detection - CLASS

Performance on INRIA Database

Page 24: Lecture 1 Object Detection - CLASS

Influence of Parameters

Panels: Gradient smoothing, σ; Orientation bins, β

● Reducing gradient scale from 3 to 0 decreases false positives by 10 times

● Increasing orientation bins from 4 to 9 decreases false positives by 10 times

Page 25: Lecture 1 Object Detection - CLASS

Influence of Parameters

Panels: Normalisation method; Block overlap

● Strong local normalisation is essential

● Overlapping blocks improve performance, but descriptor size increases

Page 26: Lecture 1 Object Detection - CLASS

Influence of Block and Cell Size

● Trade off between need for local spatial invariance and need for finer spatial resolution


Page 27: Lecture 1 Object Detection - CLASS

Which Cues are Important?

Figure panels: Input example, Average gradients, Weighted pos wts, Weighted neg wts, Outside-in weights

● Most important cues are head, shoulder, leg silhouettes

● Vertical gradients inside a person are counted as negative

● Overlapping blocks just outside the contour are most important

Page 28: Lecture 1 Object Detection - CLASS

Merging Overlapping Detections

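Robust mode detection (mean shift)

Pipeline: Multi-scale dense scan of detection window → Clip detection score (threshold, bias) → Robust mode detection in (x, y, s) space, with s in log scale → Final detections

$f(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\|(\mathbf{x} - \mathbf{x}_i)/\boldsymbol{\sigma}_i\|^2 / 2\right)$

$\boldsymbol{\sigma}_i = [\exp(s_i)\,\sigma_x,\ \exp(s_i)\,\sigma_y,\ \sigma_s]$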
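
A minimal sketch of weighted mean shift for fusing detections is given below. It treats each detection as a point [x, y, s] with s the log scale, weights it by its clipped classifier score, and uses a per-detection diagonal bandwidth that grows with exp(s) in x and y. The update rule is the fixed-point iteration for a weighted sum of Gaussian kernels. The function name, default bandwidths, and the simple mode-merging test are assumptions.

```python
import numpy as np

def fuse_detections(dets, det_scores, sigma_x=4.0, sigma_y=8.0, sigma_s=0.5,
                    tol=1e-6, max_iter=200, merge_dist=0.5):
    """Return the distinct modes of the weighted kernel density of detections."""
    pts = np.asarray(dets, dtype=float)               # rows are [x, y, log-scale]
    w = np.maximum(np.asarray(det_scores, dtype=float), 0.0)  # hard-clip scores at 0
    keep = w > 0
    pts, w = pts[keep], w[keep]
    scale = np.exp(pts[:, 2])
    # Per-detection bandwidths: spatial smoothing grows with detection scale.
    sig = np.column_stack([scale * sigma_x, scale * sigma_y,
                           np.full(len(pts), sigma_s)])
    inv_var = 1.0 / sig ** 2
    base = np.array([sigma_x, sigma_y, sigma_s])
    modes = []
    for start in pts:
        y = start.copy()
        for _ in range(max_iter):
            d = (y - pts) / sig
            k = w * np.exp(-0.5 * np.sum(d * d, axis=1))
            # Fixed-point update: kernel- and bandwidth-weighted mean of the points.
            y_new = (k[:, None] * inv_var * pts).sum(axis=0) / \
                    (k[:, None] * inv_var).sum(axis=0)
            converged = np.linalg.norm(y_new - y) < tol
            y = y_new
            if converged:
                break
        # Keep the mode only if it is not a duplicate of one already found.
        if all(np.linalg.norm((y - m) / base) > merge_dist for m in modes):
            modes.append(y)
    return modes
```

Each returned mode is one fused detection; its bounding box follows from the mode's position and exp(s).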

Page 29: Lecture 1 Object Detection - CLASS

Influence of Mean Shift Kernel

Spatial smoothing aspect ratio as per window shape, smallest sigma approx. equal to stride/cell size

Relatively independent of scale smoothing, sigma equal to 0.4 to 0.7 octaves gives good results

Page 30: Lecture 1 Object Detection - CLASS

Influence of Other Parameters

Different mappings Effect of scale-ratio

Hard clipping of SVM scores gives better results than simple probabilistic mapping of the scores

Fine scale sampling improves recall

Page 31: Lecture 1 Object Detection - CLASS

Results Using Static HOG

No temporal smoothing of detections

Page 32: Lecture 1 Object Detection - CLASS

Conclusions for Static HOG Human Detector

● Fine grained features improve performance
– Rectify fine gradients then pool spatially

● No gradient smoothing, [1 0 -1] derivative mask

● Orientation voting into fine bins

● Spatial voting into coarser bins
– Use gradient magnitude (no thresholding)
– Strong local normalization
– Use overlapping blocks
– Robust non-maximum suppression

● Fine scale sampling, hard clipping & anisotropic kernel

Human detection rate of 90% at 10^-4 false positives per window

Slower than integral images of Viola & Jones, 2001

Page 33: Lecture 1 Object Detection - CLASS

Applications to Other Classes

M. Everingham et al. The 2005 PASCAL Visual Object Classes Challenge. Proceedings of the PASCAL Challenge Workshop, 2006.

Page 34: Lecture 1 Object Detection - CLASS

Parameter Settings

● Most HOG parameters are stable across different classes

● Parameters that change
– Gamma compression
– Normalisation methods
– Signed/un-signed gradients

Page 35: Lecture 1 Object Detection - CLASS

Results from Pascal VOC 2006

Class      Cambridge  ENSMP  HOG    Laptev (HOG + AdaBoost)  TUD    TKK
Car        0.254      0.398  0.444  -                        -      0.222
Dog        0.118      -      -      -                        -      0.113
Cow        0.149      0.159  0.212  0.224                    -      0.252
Sheep      0.131      -      0.251  -                        -      0.227
Person     0.030      -      0.164  0.114                    0.074  0.039
Bus        0.138      -      0.117  -                        -      0.169
Bicycle    0.249      -      0.414  0.440                    -      0.303
Motorbike  0.178      -      0.390  0.318                    0.153  0.265
Horse      0.091      -      -      0.140                    -      0.137
Cat        0.151      -      -      -                        -      0.160

HOG outperformed other methods for 4 out of 10 classes.

Its AdaBoost variant outperformed other methods for 2 out of 10 classes.

Page 36: Lecture 1 Object Detection - CLASS

Finding People in Videos

● Motivation
– Human motion is very characteristic

● Requirements
– Must work for moving camera and background
– Robust coding of relative motion of human parts

● Method
– Use differential flow for resistance to camera motion
– HOG like spatial histogramming for robust coding of relative motion

Page 37: Lecture 1 Object Detection - CLASS

Motion HOG Processing Chain

Input image + Consecutive image → Normalise gamma & colour → Compute optical flow → Compute differential flow → Accumulate votes for differential flow orientation over spatial cells → Normalise contrast within overlapping blocks of cells → Collect HOGs for all blocks over detection window

Figure panels: Flow field, Magnitude of flow, Differential flow X, Differential flow Y

Figure labels: Detection windows, Cell, Block, Overlap of Blocks

Page 38: Lecture 1 Object Detection - CLASS

Overview of Feature Extraction

Appearance Channel: Input image → Static HOG Encoding

Motion Channel: Input image + Consecutive image(s) → Motion HOG Encoding

Both channels → Collect HOGs over detection window → Linear SVM → Object/Non-object decision

Data Set

Set     Source                 Positive windows
Train   5 DVDs, 182 shots      5562
Test 1  Same 5 DVDs, 50 shots  1704
Test 2  6 new DVDs, 128 shots  2700

Page 39: Lecture 1 Object Detection - CLASS

Motion Boundary Histograms

Figure panels: First frame, Second frame, Estd. flow, Flow mag., x-flow diff, y-flow diff, Avg. x-flow diff, Avg. y-flow diff

Treat x, y-flow components as independent images

Take their local gradients separately, and compute HOGs as in static images

Flow discontinuities follow occlusion boundaries, so this encodes depth and motion boundaries

Page 40: Lecture 1 Object Detection - CLASS

Internal Motion Histograms

● Alternatively, we can use orientations of flow differences not boundaries

● This captures relative motions of body parts

● We tested several different coding schemes based on finite spatial (inter-part) displacements

Page 41: Lecture 1 Object Detection - CLASS

IMH Encoding Schemes

● Simple difference
– Take x, y differentials of flow vector images [Ix, Iy]
– Variants may use larger spatial displacements while differencing, e.g. [1 0 0 0 -1]

● Center cell difference

● Wavelet-style cell differences

[Figure: cell weight masks (+1, -1, -2) for the center cell and wavelet-style differences]

Page 42: Lecture 1 Object Detection - CLASS

Flow Methods

● Proesmans' flow [Proesmans et al., ECCV 1994]
– 15 seconds per frame

● Our flow method
– Multi-scale pyramid based method, no regularization
– Brightness constancy based damped least squares solution on 5×5 window: $[x, y]^T = (A^T A + \beta I)^{-1} A^T b$
– 1 second per frame

● MPEG-4 based block matching
– Runs in real-time

Figure panels: Input image, Proesmans' flow, Our multi-scale flow
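
The damped least squares step $(A^T A + \beta I)^{-1} A^T b$ can be sketched for a single window as follows. The slide does not define A and b, so this assumes the usual brightness-constancy setup: each row of A holds the spatial derivatives (Ix, Iy) at one pixel of the 5×5 window and b holds the negated temporal derivatives. The function name and the default β are illustrative.

```python
import numpy as np

def window_flow(ix, iy, it, beta=1e-2):
    """Damped least squares flow estimate for one window.

    ix, iy, it: arrays of spatial and temporal image derivatives over the
    window (for example 5x5). Returns the 2-vector of flow for that window.
    """
    A = np.column_stack([np.ravel(ix), np.ravel(iy)])   # one row per pixel
    b = -np.ravel(it)                                   # from Ix*u + Iy*v + It = 0
    # The beta*I term damps the solution where the window has little texture.
    return np.linalg.solve(A.T @ A + beta * np.eye(2), A.T @ b)
```

In a textureless window A^T A is close to singular and the damping pulls the estimate towards zero flow instead of producing a large unstable value.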

Page 43: Lecture 1 Object Detection - CLASS

Performance Comparison

Only motion information Appearance + motion

With motion only, MBH scheme on Proesmans’ flow works best

Combined with appearance, centre difference IMH performs best

Page 44: Lecture 1 Object Detection - CLASS

Trained on Static & Flow

Tested on flow only Tested on appearance + flow

Adding static images during test reduces performance margin

No deterioration in performance on static images

Page 45: Lecture 1 Object Detection - CLASS

Motion HOG Video

No temporal smoothing, each pair of frames treated independently

Page 46: Lecture 1 Object Detection - CLASS

AdaBoost Cascade Face Detector

● A computationally efficient architecture that rapidly rejects unpromising windows
– A chain of classifiers that each reject some fraction of the negative training samples while keeping almost all positive ones

● Each classifier is an AdaBoost ensemble of rectangular Haar-like features sampled from a large pool

[Viola & Jones, 2001]

Rectangular Haar features and the first two features chosen by AdaBoost
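
The two ingredients, rectangle features computed from an integral image and a cascade that rejects early, can be sketched as below. The representation of a stage as a list of (feature index, threshold, polarity, weight) decision stumps plus a stage threshold is an assumption made for this illustration, as are all function names.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row and column."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h-by-w rectangle with top-left corner (r, c): four lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

def cascade_accepts(stages, features):
    """Run a window's feature values through the cascade, rejecting early."""
    for weak_learners, stage_threshold in stages:
        score = 0.0
        for idx, thr, polarity, alpha in weak_learners:
            # Each weak learner is a decision stump on one feature value.
            if polarity * features[idx] < polarity * thr:
                score += alpha
        if score < stage_threshold:
            return False   # most negative windows stop at an early stage
    return True
```

Because most windows are rejected by the first one or two cheap stages, the average cost per window is far below that of evaluating the full ensemble.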

Page 47: Lecture 1 Object Detection - CLASS

Dynamic Pedestrian Detection

Viola, Jones and Snow, ICCV 2003

Similar to the above face detector but also includes motion derivative filters

Page 48: Lecture 1 Object Detection - CLASS

Convolutional Neural Nets

● A series of banks of convolution filters that alternately analyse the output images of the previous bank (“simple cells”) and spatially pool the resulting rectified responses (“complex cells”)

● Trained by gradient descent on large training sets

AT&T system – reads ~10% of U.S. cheques

[LeCun 1992-8]

Page 49: Lecture 1 Object Detection - CLASS

Rotation Invariant Neural Net Face Detector

Learn rectifier network for rotations, then upright face detector

[Rowley et al., 1998]

Page 50: Lecture 1 Object Detection - CLASS

Convolutional Net Multipose Face Detector

● Net is trained to produce zero for a non-face, a unit-vector encoding the facial pose for a face

● At run time must run a descent search to find best putative pose for observed image, then check whether “face” is likely given this

[Osadchy, 2007]

Page 51: Lecture 1 Object Detection - CLASS

Video

Page 52: Lecture 1 Object Detection - CLASS

Exemplar based Pedestrian Detector

● Build model by clustering training examples hierarchically

● At run-time, use similarity tree to find similar examples quickly

[D.Gavrila, ICPR'98]

Page 53: Lecture 1 Object Detection - CLASS

Distance Transform based Edge Template Matching

[Gavrila, Philomin, ICCV'99]

For best results, use DT over orientated edges

Page 54: Lecture 1 Object Detection - CLASS

Learning to Detect Object Contours by Cue Combination

Brightness, colour & texture gradient, combined with boosted logistic regression

[Martin et al., PAMI'04]

Page 55: Lecture 1 Object Detection - CLASS

Capturing Local Statistics

● Many approaches capture local image content using statistics or distributions of primitive descriptors over local image regions
– e.g. tile local region with small cells, find statistics in each cell
– captures local context, increases robustness to spatial displacements

● Capture distributions using mixture models, histograms of quantized descriptor values

● Capture statistics using moments, pairwise correlations...

Page 56: Lecture 1 Object Detection - CLASS

Wavelet Histogram Face Detector

● Crudely quantize wavelet responses (3-5 levels)

● Partition wavelets into groups of 5-8 with strong mutual information

● For each group build histogram of log P(object)/P(non-object)

● Final classifier is naïve Bayes combination of histogram lookups

● Learn frontal and profile face detectors and combine outputs

Figure panels: wavelets with strong MI with indicated one, and a chosen coefficient pair; some detections; green regions strongly support face, red regions support non-face

[Schneiderman, IJCV'02]
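
The histogram lookup and naïve Bayes combination can be sketched as follows. Each group of quantized wavelet coefficients is assumed to have already been packed into a single integer code; the table learning with additive smoothing, the function names, and the zero decision threshold are illustrative assumptions.

```python
import math

def learn_llr_table(obj_codes, bg_codes, n_codes, smoothing=1.0):
    """Per-code table of log P(code | object) / P(code | non-object)."""
    table = []
    for code in range(n_codes):
        p_obj = (obj_codes.count(code) + smoothing) / (len(obj_codes) + smoothing * n_codes)
        p_bg = (bg_codes.count(code) + smoothing) / (len(bg_codes) + smoothing * n_codes)
        table.append(math.log(p_obj / p_bg))
    return table

def classify(tables, codes, threshold=0.0):
    """Naive Bayes: sum one table lookup per coefficient group."""
    score = sum(table[code] for table, code in zip(tables, codes))
    return score > threshold, score
```

At detection time the classifier is therefore just a handful of table lookups and additions per window, which is what makes the approach practical for dense scanning.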

Page 57: Lecture 1 Object Detection - CLASS

Learning Based Feature Detectors

● Many kinds of local cues are informative, but responses are typically strongly correlated

● Naïve Bayes feature combination doesn't work well, but we can learn to combine cues to produce a stronger detector

● e.g. Maximum Entropy learning of distribution, ML of presence decisions

Page 58: Lecture 1 Object Detection - CLASS

Maximum Entropy Learning

● Models joint distribution by matching predicted and empirical 1D projections (e.g. histograms of linear filter responses)

[Sidenbladh & Black]

Page 59: Lecture 1 Object Detection - CLASS

Maximum Entropy Learning

Page 60: Lecture 1 Object Detection - CLASS

Local Descriptor Methods

● Represent image as a set of descriptors over local image regions (patches)

● Patches contain a lot of information about image content

● Locality reduces interference from

– occlusion & clutter

– lighting variations (local normalization)

– global effects of changes in form or viewpoint

● But it fragments the scene – global form is harder to see

Page 61: Lecture 1 Object Detection - CLASS

“Texton” / “Bag of Features” Image Classification

● Classify images by their distributions of local patch appearances
– Sample patches densely, randomly, at salient interest points...
– Characterize appearance using any local descriptor (e.g. SIFT)
– Characterize descriptor set or distribution by vector quantizing descriptors against a large dictionary of patches and histogramming results
– Learn classification rules for classes of images using ML over the BoF histograms

● Inter-patch relationships and global image structure are ignored
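
The vector quantization and histogramming step can be sketched as below. It assumes a dictionary of codewords is already available (for example from clustering training descriptors) and uses nearest-neighbour assignment under Euclidean distance; the function name and the normalisation to unit sum are illustrative choices.

```python
import numpy as np

def bof_histogram(descriptors, dictionary):
    """Quantize each descriptor to its nearest codeword and histogram the result."""
    d = np.asarray(descriptors, dtype=float)   # shape (n_patches, dim)
    c = np.asarray(dictionary, dtype=float)    # shape (n_words, dim)
    # Squared Euclidean distance from every descriptor to every codeword.
    dist2 = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = dist2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(c)).astype(float)
    return hist / max(hist.sum(), 1.0)          # normalise to unit sum
```

The resulting fixed-length histogram is what the image classifier sees; patch positions are discarded at this point, which is why global structure is ignored.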

Page 62: Lecture 1 Object Detection - CLASS

Extremely Randomized Clustering Forests

● Instead of vector quantization, quantize against an ensemble of discriminatively trained random decision trees

● Each leaf of each tree has a separate bin

● Then learn linear SVM classifier over these bins

● Fast and works very well

Page 63: Lecture 1 Object Detection - CLASS

Object Localization in Bag of Features Models

● BoF models work surprisingly well for content based image classification because certain patches are very characteristic of certain object classes
– e.g. this can be seen in the linear SVM weights

● We can use these to approximately localize the object
– iterate updating the location mask and using it to remake histogram

Figure: Bicycle localization with randomized forest features

Page 64: Lecture 1 Object Detection - CLASS

Local Feature Based Pedestrian Detector

Combines

● bottom-up local cues from bag of interest point recognition

● probabilistic top-down segmentation for good handling of occlusions

(Leibe & Schiele, CVPR'05)

Page 65: Lecture 1 Object Detection - CLASS

Implicit Shape Model - Leibe and Schiele, 2003

Pipeline: Interest Points → Matched Codebook Entries → Probabilistic Voting → Voting Space (continuous) → Backprojection of Maxima → Backprojected Hypotheses → Segmentation → Refined Hypotheses (uniform sampling)

Leibe and Schiele, 2003, 2005

Page 66: Lecture 1 Object Detection - CLASS

Learning Surface Orientation & Type

● Learning based features (detectors) for vertical, horizontal left/right/centre facing and “porous” vs. solid surfaces
– logistic AdaBoosted decision trees over a large set of local cues

Page 67: Lecture 1 Object Detection - CLASS

Using Geometric Context to Aid Detection

● Making sense of city scenes by combining surface orientation cues, object detector responses, horizon estimates

Image

Figure panels: P(object), P(surfaces), P(viewpoint), P(object | surfaces, viewpoint)

[Hoiem, CVPR'06]

Page 68: Lecture 1 Object Detection - CLASS

Image Parsing

● Attempts to synthesize entire scenes from component models using multilevel MCMC sampling

– faces, letters, background...

[Zhu et al, 2003...]

Page 69: Lecture 1 Object Detection - CLASS

The End