ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s – early...
Coffer Illusion
Supervised learning
f(x) = y
(y: output label; f: prediction function; x: image feature)
Training: Given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
Testing: Apply f to an unseen test example x and output the predicted value y = f(x) to classify x.
Slide credit: L. Lazebnik
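A minimal sketch of the training/testing split described above, assuming a linear prediction function trained with the perceptron rule. The slide does not name a learning algorithm, so the perceptron, the toy 2-D features, and the labels +1/-1 are illustrative assumptions.

```python
# Hypothetical example: a perceptron stands in for "estimate f by minimizing
# the prediction error on the training set". Features and labels are toy data.

def predict(w, b, x):
    """Prediction function f: sign of a linear score on a 2-D feature vector."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

def training_error(w, b, examples):
    """Fraction of training examples that f misclassifies."""
    return sum(predict(w, b, x) != y for x, y in examples) / len(examples)

def train_perceptron(examples, max_epochs=100):
    """Update (w, b) on every mistake until the training set is classified correctly."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if predict(w, b, x) != y:
                w[0] += y * x[0]
                w[1] += y * x[1]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

# Training: labeled examples (x_i, y_i).
train = [((2.0, 1.5), 1), ((1.0, 2.5), 1), ((3.0, 3.0), 1), ((2.5, 0.5), 1),
         ((-1.0, -2.0), -1), ((-2.5, -1.0), -1), ((-0.5, -3.0), -1), ((-3.0, -2.5), -1)]
w, b = train_perceptron(train)

# Testing: apply f to an unseen example.
x_test = (1.8, 2.1)
y_pred = predict(w, b, x_test)
```

Zero training error does not guarantee low test error; the bias-variance slide later in the deck addresses that gap.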
Image Categorization
Training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier
Classifiers
Training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier
Learning a classifier
Given a set of features with corresponding labels, learn a function to predict the labels from the features.
[Figure: scatter plot of training points in the (x1, x2) feature plane]
+ = Data point from class 1
o = Data point from class 2
Each data point has a feature vector (x1, x2).
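One simple way to predict labels from such 2-D feature vectors is a nearest-neighbor rule. This is only a sketch: the slide does not prescribe a classifier, and the coordinates below are made-up stand-ins for the "+" and "o" points in the figure.

```python
import math

def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label

# Hypothetical training set: feature vector (x1, x2) with its class symbol.
train = [((1.0, 3.0), "+"), ((1.5, 3.5), "+"), ((2.0, 2.8), "+"),
         ((3.0, 1.0), "o"), ((3.5, 0.8), "o"), ((2.8, 1.5), "o")]

label_a = nearest_neighbor_label(train, (1.2, 3.1))  # query near the "+" cluster
label_b = nearest_neighbor_label(train, (3.2, 1.1))  # query near the "o" cluster
```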
Image Categorization
Training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier
Testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., Outdoor)
Example: Scene Categorization
• Is this a kitchen?
Bias-Variance Trade-off
Models with too few parameters are inaccurate because of a large bias.
• Not enough flexibility!
Models with too many parameters are inaccurate because of a large variance.
• Too much sensitivity to the sample.
Bias: error in model assumptions; how much the average model over all training sets differs from the true model.
Variance: how much models estimated from different training sets differ from each other.
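The two definitions can be estimated numerically by training the same kind of model on many independently drawn training sets. The sketch below assumes k-nearest-neighbor regression as a stand-in for model flexibility (k equal to the training-set size behaves like a one-parameter constant model, k = 1 memorizes every sample). The target function, noise level, and sample sizes are hypothetical choices, not taken from the slides.

```python
import math
import random

def true_fn(x):
    """Hypothetical 'true model' that generates the data."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=20, noise=0.3):
    """Draw n noisy samples (x, y) from the true model."""
    return [(x, true_fn(x) + rng.gauss(0.0, noise))
            for x in (rng.random() for _ in range(n))]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance(k, n_sets=200, seed=0):
    """Estimate squared bias and variance of k-NN regression over many training sets."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(21)]
    preds = []
    for _ in range(n_sets):
        train = sample_training_set(rng)
        preds.append([knn_predict(train, x, k) for x in test_xs])
    bias_sq = variance = 0.0
    for j, x in enumerate(test_xs):
        column = [p[j] for p in preds]
        mean = sum(column) / n_sets                  # the "average model" at x
        bias_sq += (mean - true_fn(x)) ** 2          # distance to the true model
        variance += sum((v - mean) ** 2 for v in column) / n_sets
    return bias_sq / len(test_xs), variance / len(test_xs)

rigid = bias_variance(k=20)     # too little flexibility: expect large bias
flexible = bias_variance(k=1)   # too much sensitivity: expect large variance
```

Comparing `rigid` and `flexible` as (bias², variance) pairs should reproduce the trade-off stated on the slide.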
Recognition: Overview and History
Slides from James Hays, Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce
How many visual object categories are there?
Biederman 1987
OBJECTS
– ANIMALS
  – VERTEBRATE
    – MAMMALS: TAPIR, BOAR
    – BIRDS: GROUSE
  – …
– PLANTS
– INANIMATE
  – NATURAL
  – MAN-MADE: CAMERA
Specific recognition tasks
Svetlana Lazebnik
Scene categorization or classification
• outdoor/indoor
• city/forest/factory/etc.
Svetlana Lazebnik
Image annotation / tagging / attributes
• street
• people
• building
• mountain
• tourism
• cloudy
• brick
• …
Svetlana Lazebnik
Image parsing / semantic segmentation
mountain
building
tree
banner
market
people
street lamp
sky
building
Svetlana Lazebnik
Object detection
• find pedestrians
Svetlana Lazebnik
Scene understanding?
Svetlana Lazebnik
Category vs. instance recognition
Category:
– Find all the people
– Find all the buildings
– Often within a single image
– Often ‘sliding window’
Instance:
– Is this face James?
– Find this specific famous building
– Often within a database of images
Scene recognition dataset
Instance or category?
Recognition is all about modeling variability
Variability:
• Camera position
• Illumination
• Pose/shape parameters
• Within-class variations?
Svetlana Lazebnik
Within-class variations
Svetlana Lazebnik
Recognition is all about modeling variability
Variability:
• Camera position
• Illumination
• Pose/shape parameters
• Within-class variation
High-dimensional space
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
Svetlana Lazebnik
No digital cameras!
Slow compute!
Variability: Camera position
Illumination
θ
Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)
Shape is known
Svetlana Lazebnik
Alignment
• Alignment: fitting a model to a transformation between pairs of features (matches) in two images
Find the transformation T that minimizes
$$\sum_i \mathrm{residual}\big(T(x_i),\, x_i'\big)$$
where x_i and x_i' are matched features in the two images.
Svetlana Lazebnik
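A small worked instance of the alignment objective, under assumptions the slide does not make: T is restricted to a 2-D similarity transform (rotation, uniform scale, translation) and the residual is the squared Euclidean distance. Points are written as complex numbers so that T(z) = a·z + b, which gives a closed-form least-squares solution. The point coordinates and the names `fit_similarity` and `total_residual` are illustrative.

```python
import cmath

def fit_similarity(points, matches):
    """Least-squares fit of T(z) = a*z + b, where each point is x + 1j*y.

    `a` encodes rotation and scale, `b` encodes translation."""
    n = len(points)
    mean_p = sum(points) / n
    mean_m = sum(matches) / n
    # Closed form for complex linear least squares on centered coordinates.
    num = sum((p - mean_p).conjugate() * (m - mean_m) for p, m in zip(points, matches))
    den = sum(abs(p - mean_p) ** 2 for p in points)
    a = num / den
    b = mean_m - a * mean_p
    return a, b

def total_residual(a, b, points, matches):
    """Sum over matches of the squared distance between T(x_i) and x_i'."""
    return sum(abs(a * p + b - m) ** 2 for p, m in zip(points, matches))

# Hypothetical matches: a unit square rotated by 30 degrees, scaled by 2, then shifted.
points = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
a_true = 2 * cmath.exp(1j * cmath.pi / 6)
b_true = 3 - 1j
matches = [a_true * p + b_true for p in points]

a, b = fit_similarity(points, matches)
```

With noisy or partly wrong matches, the same fit is usually wrapped in a robust estimation loop; that is outside this sketch.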
Recognition as an alignment problem:
Block world
J. Mundy, Object Recognition in the Geometric Era: a Retrospective, 2006
L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
ACRONYM (Brooks and Binford, 1981)
Representing and recognizing object categories is harder...
Binford (1971), Nevatia & Binford (1972), Marr & Nishihara (1978)
General shape primitives?
Zisserman et al. (1995)
Generalized cylinders
Ponce et al. (1989)
Forsyth (2000)
Svetlana Lazebnik
Recognition by components
Primitives (geons) Objects
http://en.wikipedia.org/wiki/Recognition_by_Components_Theory
Biederman (1987)
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
Svetlana Lazebnik
No digital cameras!
Slow compute!
Slow compute!
Empirical models of image variability
Appearance-based techniques
Turk & Pentland (1991); Murase & Nayar (1995); etc.
Svetlana Lazebnik
Known target image
Eigenfaces (Turk & Pentland, 1991)
Svetlana Lazebnik
Color Histograms
Swain and Ballard, Color Indexing, IJCV 1991.
Svetlana Lazebnik
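A rough sketch of color-histogram matching in the spirit of Swain and Ballard's histogram intersection: quantize each pixel's color into a coarse bin, normalize the counts, and score two images by summing the bin-wise minima. The bin count, the toy pixel lists, and the function names are assumptions made for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over quantized (r, g, b) bins; channel values are 0..255."""
    counts = Counter(
        (r * bins_per_channel // 256, g * bins_per_channel // 256, b * bins_per_channel // 256)
        for r, g, b in pixels
    )
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

# Hypothetical images given as flat pixel lists.
mostly_red = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
similar = [(240, 20, 5)] * 5 + [(5, 5, 240)] * 3
all_green = [(10, 250, 10)] * 8

score_similar = histogram_intersection(color_histogram(mostly_red), color_histogram(similar))
score_different = histogram_intersection(color_histogram(mostly_red), color_histogram(all_green))
```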
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• 1990s – present: sliding window approaches
Svetlana Lazebnik
No digital cameras!
Slow compute!
Slow compute!
Sliding window approaches
• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Agarwal and Roth, 2002
• Poggio et al. 1993
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
Svetlana Lazebnik
No digital cameras!
Slow compute!
Slow compute!
Variability: Camera position
Illumination
θ
Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)
Shape is partially known
Svetlana Lazebnik
Local features for object instance recognition
D. Lowe (1999, 2004)
Large-scale image search
Combining local features, indexing, and spatial constraints
Philbin et al. ‘07
Image credit: K. Grauman and B. Leibe
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
Parts-and-shape models
• Model:
– Object as a set of parts
– Relative locations between parts
– Appearance of part
Figure from [Fischler & Elschlager 73]
Constellation models
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features (next!)
Svetlana Lazebnik
No digital cameras!
Slow compute!
Slow compute!
Early GPU compute.
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features (next!)
• Present trends: combined local and global methods, context, deep learning
Svetlana Lazebnik
No digital cameras!
Slow compute!
Slow compute!
Early GPU compute.
GPU/cloud compute.
Recognition Issues
How to summarize the content of an entire image? How to gauge overall similarity?
How large should the vocabulary be? How to perform quantization efficiently?
How to score the retrieval results?
How might we add more spatial verification?
Kristen Grauman
Bag-of-features models
Object → Bag of ‘words’
Svetlana Lazebnik
Origin 1: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
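As a quick illustration of an orderless representation, the sketch below counts how often each dictionary word occurs in a document and ignores word order entirely. The dictionary and the two sentences are made up for the example.

```python
import re
from collections import Counter

def bag_of_words(text, dictionary):
    """Return the frequency of each dictionary word in `text`, in dictionary order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in dictionary)
    return [counts[w] for w in dictionary]

# Hypothetical dictionary and documents.
dictionary = ["freedom", "nation", "people", "tax"]
doc_a = "The people of this nation value freedom. Freedom for all people!"
doc_b = "Freedom! People, people: freedom, nation."

vec_a = bag_of_words(doc_a, dictionary)
vec_b = bag_of_words(doc_b, dictionary)
```

Both documents map to the same vector even though their word order differs, which is exactly the information a bag-of-words model discards.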
US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/
Origin 2: Texture recognition
• Characterized by repetition of basic elements or textons
• For stochastic textures, the identity of textons matters, not their spatial arrangement
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Universal texton dictionary
histogram
Objects as texture
• All of these are treated as being the same
• No distinction between foreground and background: scene recognition?
Svetlana Lazebnik
Bag-of-features steps
1. Feature extraction
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
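The four steps above can be sketched end to end on toy data. Several choices here are assumptions rather than anything the slides specify: patches come from a regular non-overlapping grid, the descriptor is simply the flattened raw patch, and the vocabulary is learned with plain k-means. All function names and the tiny 4×4 "images" are hypothetical.

```python
import random

def extract_patches(image, size=2):
    """Step 1: cut a grayscale image (list of rows) into size x size grid patches;
    each descriptor is the flattened patch."""
    h, w = len(image), len(image[0])
    return [tuple(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, d):
    """Index of the center closest to descriptor d."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], d))

def kmeans(descriptors, k, iters=20, seed=0):
    """Step 2: learn a visual vocabulary of k cluster centers (Lloyd's k-means)."""
    centers = random.Random(seed).sample(descriptors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def bag_of_features(image, vocabulary):
    """Steps 3-4: quantize each descriptor to its nearest visual word and count words."""
    hist = [0] * len(vocabulary)
    for d in extract_patches(image):
        hist[nearest(vocabulary, d)] += 1
    return hist

# Toy images: 0 = dark, 255 = bright.
image_a = [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
image_b = [[255, 255, 0, 0], [255, 255, 0, 0], [0, 0, 255, 255], [0, 0, 255, 255]]
image_c = [[0] * 4 for _ in range(4)]

vocabulary = kmeans(extract_patches(image_a) + extract_patches(image_c), k=2)
hist_a = bag_of_features(image_a, vocabulary)
hist_b = bag_of_features(image_b, vocabulary)
hist_c = bag_of_features(image_c, vocabulary)
```

`image_a` and `image_b` contain the same patches in different positions, so their histograms match; `image_c` puts all of its patches into a single visual word.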
1. Feature extraction
• Regular grid or interest regions
1. Feature extraction
Detect patches → Extract patch → Compute descriptor
Slide credit: Josef Sivic
2. Learning the visual vocabulary
Clustering
Slide credit: Josef Sivic
3. Quantize features using the visual vocabulary
Clustering → Visual vocabulary
Slide credit: Josef Sivic
Visual words
Bag of visual words histograms
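The three steps above (extract descriptors, cluster them into a vocabulary, count words per image) can be sketched as follows. This is a minimal illustration, not the code behind the slides: the 2-D toy "descriptors", the plain Lloyd's k-means, and every function name here are assumptions made for the example. Real descriptors such as SIFT are 128-dimensional.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, d):
    """Index of the visual word (cluster center) closest to descriptor d."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], d))

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the vocabulary)."""
    rng = random.Random(seed)
    centers = rng.sample(descriptors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        # Recompute each center as the mean of its members; keep it if empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def bow_histogram(descriptors, centers):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1
    return hist

# Toy 2-D "descriptors" scattered around three hypothetical patch types.
rng = random.Random(1)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
training = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in blobs for _ in range(30)]
vocabulary = kmeans(training, k=3)

# Quantize one image's descriptors and summarize them as a histogram.
image_descriptors = [(0.2, -0.1), (9.8, 0.3), (10.1, -0.4), (0.1, 9.9)]
histogram = bow_histogram(image_descriptors, vocabulary)
```

The histogram has one bin per visual word, so every image gets a vector of the same length no matter how many patches were detected in it.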
![Page 74: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/74.jpg)
Example real codebook
…
Source: B. Leibe
Appearance codebook
![Page 75: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/75.jpg)
Bags of features for action recognition
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
Space-time interest points
![Page 76: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/76.jpg)
Visual words/bags of words
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides fixed dimensional vector representation for sets
+ very good results in practice
- background and foreground mixed when bag covers whole image -> is it really instance recognition?
- optimal vocabulary formation remains unclear
- basic model ignores geometry – must verify afterwards, or encode via features
Kristen Grauman
![Page 77: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/77.jpg)
But what about layout?
All of these images have the same color histogram.
How to extend bag of words?
![Page 78: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/78.jpg)
Spatial pyramid
Compute histogram in each spatial bin
![Page 79: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/79.jpg)
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
level 0
Lazebnik, Schmid & Ponce (CVPR 2006)
![Page 80: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/80.jpg)
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
level 0 level 1
Lazebnik, Schmid & Ponce (CVPR 2006)
![Page 81: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/81.jpg)
Spatial pyramid representation
level 0 level 1 level 2
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
Lazebnik, Schmid & Ponce (CVPR 2006)
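A rough sketch of the representation: at level l the image is split into a 2^l x 2^l grid, a visual-word histogram is computed in each cell, and all histograms are concatenated. The function name and the (x, y, word_id) feature format are assumptions for illustration. The per-level weights used in the pyramid match kernel of Lazebnik et al. are deliberately left out.

```python
def spatial_pyramid(features, width, height, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids.

    features: list of (x, y, word_id) with 0 <= x < width, 0 <= y < height.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level  # cells per side at this level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1
        for h in hists:
            pyramid.extend(h)
    return pyramid

# Two visual words placed in the four corners of a 100x100 image.
features = [(10, 10, 0), (90, 10, 1), (10, 90, 1), (90, 90, 0)]
descriptor = spatial_pyramid(features, width=100, height=100, vocab_size=2)
```

Level 0 is the ordinary bag of words (here `[2, 2]`). The finer levels record which quadrant each word fell in, which is exactly the layout information a single global histogram discards.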
![Page 82: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/82.jpg)
Scene category dataset
Multi-class classification results
(100 training images per class)
![Page 83: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/83.jpg)
Recognition Issues
How to summarize the content of an entire image? How to gauge overall similarity?
How large should the vocabulary be? How to perform quantization efficiently?
How to score the retrieval results?
How might we add more spatial verification?
Kristen Grauman
![Page 84: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/84.jpg)
Comparing bags of words
Compute cosine similarity (normalized dot product) between their occurrence counts, then rank the database images and pick the largest: a nearest-neighbor search for similar images.
d_j = [1 8 1 4]
q = [5 1 1 0]
for vocabulary of V words
Kristen Grauman
Query
Database image
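As a worked example, take the two count vectors on the slide, read here as d_j = [1 8 1 4] and q = [5 1 1 0] for a vocabulary of V = 4 words. The helper name below is an assumption. The formula is the usual cosine similarity: the dot product divided by the product of the two vector norms.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag is treated as similar to nothing
    return dot / (norm_a * norm_b)

d_j = [1, 8, 1, 4]
q = [5, 1, 1, 0]

# dot = 5 + 8 + 1 + 0 = 14, norms = sqrt(82) and sqrt(27)
score = cosine_similarity(d_j, q)  # 14 / sqrt(82 * 27), roughly 0.298

# Doubling every count (e.g. a larger image with twice the features)
# leaves the similarity unchanged.
same_direction = cosine_similarity(q, [2 * x for x in q])
```

Because the norms cancel the overall number of features, an image is not penalized or favored just for containing more patches. Higher scores mean more similar images, so candidates are ranked from largest to smallest.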
![Page 85: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/85.jpg)
Comparing bags of words
Why might we use cosine similarity here?
What ‘intuitive’ effect does this provide?
d_j = [1 8 1 4]
q = [5 1 1 0]
for vocabulary of V words
Kristen Grauman
Query
Database image
![Page 86: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/86.jpg)
How can we quickly find images in a large database that match a given image region?
Instance recognition
![Page 87: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/87.jpg)
Simple idea
See how many keypoints are close to keypoints in each other image
Lots of matches
Few or no matches
But this will be really, really slow!
![Page 88: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/88.jpg)
Fast lookup: inverted index
• For text documents, an efficient way to find all pages on which a word occurs is to use an index…
• We want to find all images in which a feature occurs.
Kristen Grauman
![Page 89: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/89.jpg)
Build Inverted Index from Database
Kristen Grauman
![Page 90: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/90.jpg)
Query Inverted Index
Kristen Grauman
Candidate matches
![Page 91: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/91.jpg)
Query Inverted Index
Kristen Grauman
Candidate matches
1. Extract words in query
2. Inverted file index to find relevant frames
3. Compare/sort word counts
![Page 92: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/92.jpg)
Inverted index
Key requirement: sparsity.
If most images contain most words, then we’re no better off than exhaustive search.
– Exhaustive search would mean comparing the visual word distribution of a query versus every page.
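A minimal sketch of building and querying an inverted index, assuming each database image has already been reduced to a list of visual word ids. The image names, word ids, and function names are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(database):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for w in words:
            index[w].add(image_id)
    return index

def query_index(index, query_words):
    """Count shared distinct words per candidate image, best first."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

database = {
    "img1": [7, 91, 91, 203],
    "img2": [5, 7, 12],
    "img3": [300, 301],
}
index = build_inverted_index(database)

# Only images sharing at least one word with the query are ever touched.
candidates = query_index(index, [91, 7, 400])
```

Here `img3` shares no word with the query, so it is never visited. That saving is what the sparsity requirement protects: if every word pointed to nearly every image, each lookup would return almost the whole database.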
![Page 93: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/93.jpg)
Recognition Issues
How to summarize the content of an entire image?
And gauge overall similarity?
How large should the vocabulary be? How to
perform quantization (clustering) efficiently?
How to score the retrieval results?
How might we add more spatial verification?
Kristen Grauman
Following slides by David Nister (CVPR 2006)
![Page 94: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/94.jpg)
Visual vocabularies: Issues
• How to choose vocabulary size?
– Too small: visual words not representative of all patches
– Too large: quantization artifacts, overfitting
• Computational efficiency
– Vocabulary trees (Nister & Stewenius, 2006)
![Page 95: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/95.jpg)
Training the vocabulary tree
![Page 96: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/96.jpg)
Training the vocabulary tree
![Page 97: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/97.jpg)
Training the vocabulary tree
![Page 98: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/98.jpg)
Training the vocabulary tree
![Page 99: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/99.jpg)
Training the vocabulary tree
![Page 100: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/100.jpg)
Training the vocabulary tree
![Page 101: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/101.jpg)
![Page 102: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/102.jpg)
![Page 103: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/103.jpg)
![Page 104: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/104.jpg)
![Page 105: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/105.jpg)
![Page 106: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/106.jpg)
![Page 107: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/107.jpg)
![Page 108: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/108.jpg)
![Page 109: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/109.jpg)
![Page 110: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/110.jpg)
![Page 111: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/111.jpg)
![Page 112: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/112.jpg)
Vocabulary tree built recursively
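The recursive construction can be sketched as hierarchical k-means: cluster the descriptors into `branch` groups, then cluster each group again until a fixed depth is reached. This is an illustrative sketch under simplifying assumptions (2-D toy descriptors, a basic k-means, dictionary nodes), not the implementation of Nister and Stewenius.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(centers[c], p))
        groups[j].append(p)
    return groups

def kmeans(points, k, iters=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, assign(points, centers)

def build_tree(points, branch, depth):
    """Recursively split points with k-means; leaves act as visual words."""
    node = {"center": None, "children": []}
    if depth == 0 or len(points) < branch:
        return node
    centers, groups = kmeans(points, branch)
    for c, g in zip(centers, groups):
        child = build_tree(g, branch, depth - 1) if g else {"children": []}
        child["center"] = c
        node["children"].append(child)
    return node

def lookup(tree, descriptor):
    """Descend to a leaf; the path of child indices identifies the word."""
    path, node = [], tree
    while node["children"]:
        kids = node["children"]
        j = min(range(len(kids)),
                key=lambda c: sq_dist(kids[c]["center"], descriptor))
        path.append(j)
        node = kids[j]
    return tuple(path)

rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_tree(points, branch=3, depth=2)  # up to 3**2 = 9 leaf words
word = lookup(tree, (0.5, 0.5))
```

Quantizing a descriptor costs about `branch * depth` distance computations, compared with `branch ** depth` for a flat vocabulary of the same size. That gap is what makes very large vocabularies practical.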
![Page 113: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/113.jpg)
Each leaf has inverted index
![Page 114: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/114.jpg)
![Page 115: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/115.jpg)
![Page 116: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/116.jpg)
Inverted index built.
![Page 117: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/117.jpg)
Query image
![Page 118: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/118.jpg)
Vocabulary size
Recognition with 6347 images
Nister & Stewenius, CVPR 2006
Influence on performance, sparsity
Branching
factors
Kristen Grauman
![Page 119: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/119.jpg)
Higher branch factor works better (but slower)
![Page 120: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/120.jpg)
110,000,000 images in 5.8 seconds (2006)
Slide: David Nister
On a 50k image index
![Page 121: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/121.jpg)
Slide David Nister
![Page 122: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/122.jpg)
Slide David Nister
![Page 123: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/123.jpg)
David Nister
![Page 124: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/124.jpg)
Recognition Issues
How to summarize the content of an entire image?
And gauge overall similarity?
How large should the vocabulary be? How to
perform quantization efficiently?
How to score the retrieval results?
How might we add more spatial verification?
Kristen Grauman
![Page 125: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/125.jpg)
Precision and Recall
By Walber - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=36926283
True positive (tp) – correct attribution
True negative (tn) – correct rejection
False positive (fp) – incorrect attribution
False negative (fn) – incorrect rejection
Precision = #relevant returned / #returned = tp / (tp + fp)
Recall = #relevant returned / #total relevant = tp / (tp + fn)
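In terms of the counts defined above, precision is tp / (tp + fp) and recall is tp / (tp + fn). A small sketch with made-up counts (the function name is an assumption):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical retrieval: 4 images returned, 3 of them relevant,
# and 2 relevant images missed.
precision, recall = precision_recall(tp=3, fp=1, fn=2)
```

True negatives do not appear in either formula, which is convenient for retrieval, where almost every database image is a correct rejection.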
![Page 126: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/126.jpg)
Scoring retrieval quality
[Plot: precision (vertical axis, 0 to 1) against recall (horizontal axis, 0 to 1)]
Query
Database size: 10 images
Relevant (total): 5 images
Results (ordered):
precision = #relevant / #returned
recall = #relevant / #total relevant
[Ondrej Chum]
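The precision-recall curve is traced by walking down the ranked results and recomputing both numbers after each returned image. The ordering below is hypothetical (the slide's actual result list is an image and is not recoverable); it only matches the stated sizes of 10 database images with 5 relevant. Average precision is added here as a common single-number summary and is not stated on the slide.

```python
def precision_recall_curve(ranked_relevance, total_relevant):
    """ranked_relevance[i] is True if the i-th returned result is relevant."""
    points, hits = [], 0
    for returned, is_relevant in enumerate(ranked_relevance, start=1):
        hits += is_relevant
        points.append((hits / returned, hits / total_relevant))
    return points  # list of (precision, recall) pairs

def average_precision(ranked_relevance, total_relevant):
    """Mean of the precision values measured at each relevant result."""
    hits, total = 0, 0.0
    for returned, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / returned
    return total / total_relevant

# Hypothetical ordering: relevant images at ranks 1, 3, 4, 6 and 9.
ranking = [True, False, True, True, False, True, False, False, True, False]
curve = precision_recall_curve(ranking, total_relevant=5)
ap = average_precision(ranking, total_relevant=5)
```

Recall never decreases along the list, while precision drops each time an irrelevant image is returned. A ranking that places all relevant images first would keep precision at 1 until recall reaches 1.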
![Page 127: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/127.jpg)
China is forecasting a trade surplus of $90bn
(£51bn) to $100bn this year, a threefold
increase on 2004's $32bn. The Commerce
Ministry said the surplus would be created by
a predicted 30% jump in exports to $750bn,
compared with a 18% rise in imports to
$660bn. The figures are likely to further
annoy the US, which has long argued that
China's exports are unfairly helped by a
deliberately undervalued yuan. Beijing
agrees the surplus is too high, but says the
yuan is only one factor. Bank of China
governor Zhou Xiaochuan said the country
also needed to do more to boost domestic
demand so more goods stayed within the
country. China increased the value of the
yuan against the dollar by 2.1% in July and
permitted it to trade within a narrow band, but
the US wants the yuan to be allowed to trade
freely. However, Beijing has made it clear that
it will take its time and tread carefully before
allowing the yuan to rise further in value.
China, trade,
surplus, commerce,
exports, imports, US,
yuan, bank, domestic,
foreign, increase,
trade, value
What else can we borrow from
text retrieval?
![Page 128: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/128.jpg)
tf-idf weighting
• Term frequency – inverse document frequency
• Describe image by frequency of each word within it,
downweight words that appear often in the database
• (Standard weighting for text retrieval)
t_i = (n_id / n_d) · log(N / n_i)
where:
n_id = number of occurrences of word i in document d
n_d = number of words in document d
N = total number of documents in database
n_i = number of documents word i occurs in, in whole database
Kristen Grauman
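A sketch of the weighting, assuming the standard tf-idf form t_i = (n_id / n_d) · log(N / n_i) built from the four quantities labeled on the slide. The natural logarithm, the toy database, and the function name are assumptions for illustration.

```python
import math

def tfidf(doc_counts, doc_freq, num_docs):
    """doc_counts: {word: n_id}; doc_freq: {word: n_i}; num_docs: N."""
    n_d = sum(doc_counts.values())  # number of words in this document
    return {w: (n_id / n_d) * math.log(num_docs / doc_freq[w])
            for w, n_id in doc_counts.items()}

# Toy database of three images described by visual-word counts.
database = {
    "img1": {"w1": 3, "w2": 1},
    "img2": {"w1": 2, "w3": 2},
    "img3": {"w1": 1, "w4": 3},
}
N = len(database)

# n_i: number of images in which each word occurs at least once.
doc_freq = {}
for counts in database.values():
    for w in counts:
        doc_freq[w] = doc_freq.get(w, 0) + 1

weights = tfidf(database["img1"], doc_freq, N)
```

Word `w1` occurs in every image, so log(N / n_i) = log(1) = 0 and its weight vanishes. The rarer `w2` keeps a positive weight even though it occurs only once in `img1`.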
![Page 129: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/129.jpg)
Example query: golf green
Results:
- How can the grass on the greens at a golf course be so perfect?
- For example, a skilled golfer expects to reach the green on a par-four hole in ...
- Manufactures and sells synthetic golf putting greens and mats.
Irrelevant result can cause a “topic drift”:
- Volkswagen Golf, 1999, Green, 2000cc, petrol, manual, hatchback, 94000miles, 2.0 GTi, 2 Registered Keepers, HPI Checked, Air-Conditioning, Front and Rear Parking Sensors, ABS, Alarm, Alloy
[Ondrej Chum]
Query expansion
Use good retrieved results as new inputs.
Increase recall possibly at the expense of precision.
Good new queries: grass, golf course, par-four hole, putting, etc.
Bad new queries: petrol, hatchback, ABS, etc.
![Page 130: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/130.jpg)
Query expansion
…
Query image
Results
New query
Spatial verification
New results
Chum, Philbin, Sivic, Isard, Zisserman: Total Recall…, ICCV 2007 Ondrej Chum
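One simple way to form the new query is to average the original bag-of-words histogram with the histograms of the results that passed spatial verification. The sketch below assumes that averaging scheme and a hypothetical verification outcome; the slide itself only shows the loop of results, spatial verification, new query, new results.

```python
def expand_query(query_hist, result_hists, verified):
    """Average the query histogram with histograms of verified results."""
    keep = [h for h, ok in zip(result_hists, verified) if ok]
    pool = [query_hist] + keep
    return [sum(col) / len(pool) for col in zip(*pool)]

query = [2, 0, 1, 0]
results = [[1, 1, 1, 0], [0, 0, 0, 5], [3, 1, 1, 0]]
verified = [True, False, True]  # hypothetical spatial-verification outcome

new_query = expand_query(query, results, verified)
```

The second result, which shares no words with the query and fails verification, is excluded, so its dominant fourth word does not leak into the new query. Skipping the verification step is what would let an irrelevant result cause topic drift.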
![Page 131: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/131.jpg)
Recognition Issues
How to summarize the content of an entire image?
And gauge overall similarity?
How large should the vocabulary be? How to
perform quantization efficiently?
How to score the retrieval results?
How might we add more spatial verification?
Kristen Grauman
![Page 132: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/132.jpg)
Can we be more accurate?
So far, we treat each image as containing a “bag of words”, with no spatial information
[Figure: images annotated with visual-word labels (a, e, f, h, z) in different spatial arrangements]
Which matches better?
Real objects have
consistent geometry
![Page 133: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/133.jpg)
Multi-view matching
vs
…
?
Matching two given views for depth
Search for a matching view for recognition
Kristen Grauman
![Page 134: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/134.jpg)
Spatial Verification
Both image pairs have many visual words in common.
Slide credit: Ondrej Chum
Pair 1: Query, DB image with high BoW similarity
Pair 2: Query, DB image with high BoW similarity
![Page 135: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/135.jpg)
Only some of the matches are mutually consistent with real-world geometry imaged by a camera.
Ondrej Chum
Spatial Verification
Pair 1: Query, DB image with high BoW similarity
Pair 2: Query, DB image with high BoW similarity
![Page 136: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/136.jpg)
Spatial Verification: two basic strategies
• RANSAC
– Typically sort by BoW similarity as initial filter
– Verify by checking support (inliers) for possible transformations
• e.g., “success” if we find a transformation with > N inlier correspondences
• Generalized Hough Transform
– Let each matched feature cast a vote on location, scale, orientation of the model object
– Verify parameters with enough votes
Kristen Grauman
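The Generalized Hough strategy can be sketched as follows: every matched feature pair implies one pose (translation, scale, rotation) for the model object, and each implied pose votes into a coarse bin. The keypoint format (x, y, scale, angle in degrees), the bin sizes, and the function name are assumptions chosen for this example.

```python
import math
from collections import Counter

def hough_votes(matches, loc_bin=20.0, angle_bin=30.0):
    """Each match votes for an (x, y, log-scale, rotation) bin of the pose.

    matches: list of (model_kp, image_kp), each kp = (x, y, scale, angle_deg).
    """
    votes = Counter()
    for (mx, my, ms, ma), (ix, iy, isc, ia) in matches:
        scale = isc / ms
        rot = (ia - ma) % 360.0
        c, s = math.cos(math.radians(rot)), math.sin(math.radians(rot))
        # Where the model origin would land in the image under this pose.
        ox = ix - scale * (c * mx - s * my)
        oy = iy - scale * (s * mx + c * my)
        key = (math.floor(ox / loc_bin), math.floor(oy / loc_bin),
               round(math.log2(scale)), math.floor(rot / angle_bin))
        votes[key] += 1
    return votes

matches = [
    # Three matches consistent with a pure shift of (100, 50).
    ((0, 0, 1, 0), (100, 50, 1, 0)),
    ((10, 0, 1, 0), (110, 50, 1, 0)),
    ((0, 10, 1, 0), (100, 60, 1, 0)),
    # One mismatch implying a different scale and rotation.
    ((5, 5, 1, 0), (300, 200, 2, 90)),
]
votes = hough_votes(matches)
best_pose, support = votes.most_common(1)[0]
```

The three consistent matches pile up in one bin while the mismatch lands alone in another. Verification then amounts to checking whether the best bin has enough votes.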
![Page 137: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/137.jpg)
RANSAC verification
Fails to meet threshold
on # inliers! Good!
No verification
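A minimal RANSAC-style check, restricted to a 2-D translation so that a single match is enough to hypothesize the model. Real systems typically fit affine or homography transformations. The tolerance, the threshold N, the keypoint coordinates, and the function names are all assumptions for illustration.

```python
import random

def ransac_translation(matches, tol=3.0, iters=100, seed=0):
    """Fit a 2-D translation to point matches; return (model, inliers).

    matches: list of ((qx, qy), (dx, dy)) query/database keypoint pairs.
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iters):
        (qx, qy), (dx, dy) = rng.choice(matches)  # minimal sample: one match
        t = (dx - qx, dy - qy)
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - t[0]) <= tol
                   and abs(m[1][1] - m[0][1] - t[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

def spatially_verified(matches, n):
    """'Success' if some transformation has more than n inliers."""
    _, inliers = ransac_translation(matches)
    return len(inliers) > n

# Six matches agree on a shift of about (40, -10); two are mismatches.
good_pair = [((10, 10), (50, 0)), ((20, 15), (60, 5)), ((30, 40), (71, 30)),
             ((5, 60), (45, 51)), ((70, 20), (110, 9)), ((80, 80), (120, 70)),
             ((15, 15), (200, 200)), ((60, 60), (3, 7))]
# Four matches that share words but agree on no common shift.
bad_pair = [((10, 10), (100, 5)), ((20, 15), (7, 90)),
            ((30, 40), (55, 41)), ((5, 60), (80, 200))]

accept_good = spatially_verified(good_pair, n=4)
accept_bad = spatially_verified(bad_pair, n=4)
```

Both pairs could have a high bag-of-words similarity, but only the first has enough geometrically consistent matches to pass the inlier threshold.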
![Page 138: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/138.jpg)
Recognition via alignment
Pros:
– Effective for reliable features within clutter
– Great for matching specific instances
Cons:
– Expensive post-process (how long for proj3?!)
– Not suited for category recognition
Kristen Grauman
![Page 139: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/139.jpg)
Summary
• Bag of words: quantize feature space into discrete visual words
– Summarize image by distribution of words
• Inverted index: visual word index for faster query time
• Evaluation:
• Additional spatial verification alignment:
– Robust fitting : RANSAC, Generalized Hough Transform
– We will do this in detail later on in the course
Kristen Grauman
![Page 140: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/140.jpg)
Lessons from a decade later
For Category recognition (project 3)
– Bag of Feature models remained the state of the art until Deep Learning.
– Spatial layout either isn't that important or it's too difficult to encode.
– Quantization error is, in fact, the bigger problem. Advanced feature encoding methods address this.
– Bag of feature models are nearly obsolete. At best they seem to be inspiring tweaks to deep models e.g., NetVLAD.
James Hays
![Page 141: ICCV 2005 Beijing, Short Course, Oct 15 · History of ideas in recognition • 1960s –early 1990s: the geometric era • 1990s: appearance-based models • Mid-1990s: sliding window](https://reader035.vdocuments.site/reader035/viewer/2022070906/5f7acc07e210c76495246f02/html5/thumbnails/141.jpg)
Lessons from a decade later
For instance retrieval (this lecture):
– deep learning is taking over.
– learn better local features (replace SIFT) e.g., MatchNet 2015
– learn better image embeddings (replace visual word histograms) e.g., Vo and Hays 2016.
– learn spatial verification e.g., DeTone, Malisiewicz, and Rabinovich 2016.
– learn a monolithic deep network to recognize all locations e.g., Google’s PlaNet 2016.
James Hays