Lecture 11
Hierarchies
6.870 Object Recognition and Scene Understanding http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Next week
Alec Rivers
Scene Understanding Based on Object Relationships
Gokberk Cinbis
Category Level 3D Object Detection Using View-Invariant Representations
Hueihan Jhuang and Sharat Chikkerur
Video shot boundary detection using GIST representation
Jenny Yuen
Semiautomatic alignment of text and images
Nathaniel R Twarog
A Filtering Approach to Image Segmentation: Perceptual Grouping in Feature Space
Nicolas Pinto
Evaluating dense feature descriptor and multi-kernel learning for face detection/recognition
Tilke Judd and Vladimir Bychkovsky
Identify the same people in different photographs from the same event
Tom Kollar
Context-based object priors for scene understanding
Tom Ouyang
Hand-Drawn Sketch Recognition, A Vision-Based Approach
Papers due this Friday (5pm): send PDF by email
Hierarchies vs. holistic features
Although we have seen some “successful” holistic methods.
Hierarchies, compositionality and reusable parts
Compositionality refers to our evident ability to construct hierarchical representations, whereby constituents are used and reused in an essentially infinite variety of relational compositions.
Assumption (Bienenstock, Geman): what is learnable is what is representable as a hierarchy of more-or-less simple composition rules.
Bienenstock, Geman. Compositionality in neural systems.
Hierarchies vs. holistic features
Feature hierarchies are often inspired by the structure of the primate visual system, which has been shown to use a hierarchy of features of increasing complexity, from simple local features in the primary visual cortex, to complex shapes and object views in higher cortical areas.
S. Ullman et al.
Diagram of the visual system
Felleman and Van Essen, 1991
Modified by T. Serre from Ungerleider and Haxby, and then shamelessly copied by me.
IT readout
Slide by Serre
Identifying natural images from human brain activity
Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352-355.
Voxel Activity Model
Goal: to predict the image seen by the observer out of a large collection of possible images, and to do this for new images. This requires predicting fMRI activity for unseen images.
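The identification step described in the goal can be sketched as follows. This is only an illustration under stated assumptions: the per-voxel model is taken to be a linear map from hypothetical image features (`features`, `weights` are made-up names and random data), and the match between predicted and measured activity is scored by correlation. The slide itself does not specify either choice.

```python
import numpy as np

def identify_image(measured, predicted):
    """Pick the candidate image whose predicted voxel pattern best matches.

    measured:  (n_voxels,) measured activity for the image that was viewed
    predicted: (n_images, n_voxels) model predictions, one row per candidate
    Returns the index of the best candidate and all correlation scores.
    """
    m = measured - measured.mean()
    p = predicted - predicted.mean(axis=1, keepdims=True)
    # Pearson correlation between the measurement and each predicted pattern
    corr = (p @ m) / (np.linalg.norm(p, axis=1) * np.linalg.norm(m) + 1e-12)
    return int(np.argmax(corr)), corr

# Synthetic stand-in data (assumption: linear voxel model on image features)
rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 50, 20, 100
features = rng.normal(size=(n_images, n_features))  # hypothetical image features
weights = rng.normal(size=(n_features, n_voxels))   # hypothetical fitted voxel model
predicted = features @ weights                      # predicted activity per candidate

true_index = 7
measured = predicted[true_index] + 0.5 * rng.normal(size=n_voxels)  # noisy "scan"
best, corr = identify_image(measured, predicted)
```

Because the candidates only need predicted activity, none of them has to appear in the data used to fit the voxel model, which is what allows identification of new images.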
Performance
D. Marr
Neocognitron
Fukushima (1980). Hierarchical multilayered neural network
S-cells work as feature-extracting cells. They resemble simple cells of the primary visual cortex in their response.
C-cells, which resemble complex cells in the visual cortex, are inserted in the network to allow for positional errors in the features of the stimulus. The input connections of C-cells, which come from S-cells of the preceding layer, are fixed and invariable. Each C-cell receives excitatory input connections from a group of S-cells that extract the same feature, but from slightly different positions. The C-cell responds if at least one of these S-cells yields an output.
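A minimal sketch of the S-cell / C-cell pairing described above, assuming binary S-cell outputs, a single hand-made template, and non-overlapping pooling regions. The function names, the threshold, and the pooling size are illustrative choices, not the Neocognitron's actual parameters.

```python
import numpy as np

def s_cell_layer(image, template, threshold):
    """S-cells: match one template at every position; fire (1.0) on a strong match."""
    h, w = template.shape
    H, W = image.shape
    match = np.zeros((H - h + 1, W - w + 1))
    for i in range(match.shape[0]):
        for j in range(match.shape[1]):
            match[i, j] = np.sum(image[i:i + h, j:j + w] * template)
    return (match >= threshold).astype(float)

def c_cell_layer(s_map, pool):
    """C-cells: fire if at least one S-cell in a pool x pool neighborhood fires."""
    H, W = s_map.shape
    out = np.zeros((H // pool, W // pool))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = s_map[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool].max()
    return out

# A short vertical bar, shown at two positions one pixel apart
template = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
img_a = np.zeros((8, 8)); img_a[2:5, 3] = 1.0
img_b = np.zeros((8, 8)); img_b[2:5, 4] = 1.0

s_a, s_b = s_cell_layer(img_a, template, 3), s_cell_layer(img_b, template, 3)
c_a, c_b = c_cell_layer(s_a, 2), c_cell_layer(s_b, 2)
```

The S-cell maps differ between the two images because the bar moved, while the C-cell maps are identical: the fixed pooling connections absorb the one-pixel positional error.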
Neocognitron
Learning is done greedily for each layer
Convolutional Neural Network
The output neurons share all the intermediate levels
LeCun et al., 1998
Hierarchical models of object recognition in cortex
Hierarchical extension of the classical paradigm of building complex cells from simple cells. Uses the same notation as Fukushima: “S” units perform template matching (solid lines) and “C” units perform a non-linear “MAX” operation (dashed lines).
Riesenhuber, M. and Poggio, T. 99
Slide by T. Serre
Learning a Compositional Hierarchy of Object Structure
Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
The architecture
Parts model
Learned parts
Learning a Compositional Hierarchy of Object Structure
Figure: hierarchical library learned across Layers 1 to 4 for car, motorcycle, dog, and person; learned hierarchical vocabulary for L1 – L3 and example detections.
• Hierarchical compositional architecture
• Features are shared at each layer
• Learning is done on natural images
• Indexing and matching detection scheme
Learning a Compositional Hierarchy of Object Structure
• Hierarchical compositional architecture
• Features are shared at each layer
• Learning is done on natural images
• Biologically plausible?
• Learns T- and L-junctions, different curvatures, and features that gradually increase in complexity
Hierarchical Topic Models
Latent Dirichlet Allocation (LDA)
Blei, Ng, & Jordan, JMLR 2003
Figure: LDA graphical model with nodes z and x and plate labels J, N, K, annotated with Pr(topic | doc) and Pr(word | topic).
“bag of features” models:
Object Recognition (Sivic et al., ICCV 2005)
Scene Recognition (Fei-Fei et al., CVPR 2005)
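The two distributions named on the LDA slide, Pr(topic | doc) and Pr(word | topic), combine into a bag-of-features model as in the sketch below. It is a simplification: the Dirichlet priors of full LDA are left out, both distributions are fixed by hand, and the function names and toy numbers are assumptions for illustration. Here z is the hidden topic of each feature and x is the observed visual word.

```python
import numpy as np

def word_distribution(topic_given_doc, word_given_topic):
    """Pr(word | doc) = sum over topics k of Pr(word | topic k) * Pr(topic k | doc)."""
    return topic_given_doc @ word_given_topic

def sample_document(topic_given_doc, word_given_topic, n_words, rng):
    """Draw a bag of visual words: a topic z for each word, then a word x from that topic."""
    n_topics, vocab_size = word_given_topic.shape
    z = rng.choice(n_topics, size=n_words, p=topic_given_doc)
    x = np.array([rng.choice(vocab_size, p=word_given_topic[k]) for k in z])
    return z, x

# Toy numbers: 2 topics over a vocabulary of 4 visual words
topic_given_doc = np.array([0.7, 0.3])
word_given_topic = np.array([[0.5, 0.5, 0.0, 0.0],
                             [0.0, 0.0, 0.5, 0.5]])

p_word = word_distribution(topic_given_doc, word_given_topic)
z, x = sample_document(topic_given_doc, word_given_topic, 200, np.random.default_rng(0))
```

With these toy numbers `p_word` comes out as [0.35, 0.35, 0.15, 0.15]. The order of the sampled words carries no information, which is the "bag" assumption shared by the object and scene recognition models cited above.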
HDP Object Model
• We learn the number of parts.
• Each object uses a different number of parts.
• The model assumes a known number of object categories.
Parts are distributions over appearances and locations
Sudderth et al. IJCV 2008
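One way to see how the number of parts can be learned rather than fixed is the Chinese restaurant process, a standard sequential view of a Dirichlet process prior. The sketch below shows only that prior in isolation. It is not the HDP object model of Sudderth et al.: the sharing of parts across categories and the appearance and location distributions are omitted, and `alpha` and the function name are assumptions for illustration.

```python
import numpy as np

def crp_assignments(n_items, alpha, rng):
    """Assign features to parts one at a time under a Chinese restaurant process.

    Each feature joins an existing part with probability proportional to that
    part's current size, or opens a new part with probability proportional to alpha.
    """
    counts = []   # counts[k] = number of features currently assigned to part k
    labels = []
    for _ in range(n_items):
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)    # open a new part
        else:
            counts[k] += 1      # reuse an existing part
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(100, alpha=1.0, rng=np.random.default_rng(0))
n_parts = len(counts)   # not chosen in advance; it emerges from the draws
```

The number of parts is an outcome of the sampling rather than an input, and a larger `alpha` tends to produce more parts.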