Recursive Compositional Learning in Computer Vision
TRANSCRIPT
Recursive Composition in Computer Vision
Leo Zhu, CSAIL MIT
Joint work with Chen, Yuille, Freeman and Torralba
Ideas behind Recursive Composition
  How to deal with image complexity
  A general framework for different vision tasks
  Rich representation and tractable computation
  Pattern Theory. Grenander 94
  Compositionality. Geman 02, 06
  Stochastic Grammar. Zhu and Mumford 06
Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization
  Learning
    Supervised Parameter Estimation
    Unsupervised Recursive Dictionary Learning
  RCM-1: Deformable Object
  RCM-2: Articulated Object
  RCM-3: Scene (Entire Image)
Model Deformable Object
  Flat MRF
    Nodes: object parts
    Edges: spatial relations
  Limitations:
    Short-range interaction
    Sparse
Recursive Composition
Recursive Compositional Models: RCM-1
RCM-1: the Recursive Formula
  x: image
  y: (position, scale, orientation)
  graph = (nodes, edges)
  a: index of node
  b: child of a
  f: appearance potentials on node a
  g: potentials on edges (a, b)
  Recursion
    Vertical independence
    Self-similarity
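The formula itself did not survive in this transcript. One way to write a recursion of this kind, using only the symbols defined on the slide (x, y, a, b, f, g), is sketched here. The score symbol E, the child-set notation Ch(a), and the bold y_a (the states of node a together with all of its descendants) are assumptions introduced for illustration, so this may differ from the exact expression shown in the talk.

```latex
E_a(x, \mathbf{y}_a) \;=\; f(x, y_a) \;+\; \sum_{b \in \mathrm{Ch}(a)} \Big[\, g(y_a, y_b) \;+\; E_b(x, \mathbf{y}_b) \,\Big]
```

For a leaf node the sum is empty, so its score reduces to the appearance term f(x, y_a). The same rule is applied at every level, which is one reading of "self-similarity"; subtrees interact only through their shared parent, which is one reading of "vertical independence".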
Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization
  Learning
    Supervised Parameter Estimation
    Unsupervised Recursive Dictionary Learning
Polynomial-time Inference
  Inference task:
  Recursive Optimization:
    Recursion
  Polynomial-time Complexity:
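The equations for this slide are missing from the transcript. As a rough illustration of recursive optimization on a tree-structured model, here is a minimal max-sum dynamic programming sketch. The function name `recursive_optimize`, the dictionary-based tree, and the toy potentials are all hypothetical; the pairwise form of g follows the slide's "potentials on edges (a, b)" and is not necessarily the exact model used in the talk.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable


def recursive_optimize(
    root: str,
    children: Dict[str, List[str]],
    states: Dict[str, Sequence[State]],
    f: Callable[[str, State], float],
    g: Callable[[str, State, str, State], float],
) -> Tuple[float, Dict[str, State]]:
    """Max-sum dynamic programming on a tree; returns (best score, best states)."""
    best: Dict[str, Dict[State, float]] = {}            # best[a][y_a]: best score of a's subtree
    choice: Dict[Tuple[str, State, str], State] = {}    # best child state for each parent state

    def bottom_up(a: str) -> None:
        for b in children.get(a, []):
            bottom_up(b)
        best[a] = {}
        for ya in states[a]:
            total = f(a, ya)
            for b in children.get(a, []):
                # Each child subtree is optimized independently given the parent state.
                yb = max(states[b], key=lambda s: g(a, ya, b, s) + best[b][s])
                choice[(a, ya, b)] = yb
                total += g(a, ya, b, yb) + best[b][yb]
            best[a][ya] = total

    def top_down(a: str, ya: State, out: Dict[str, State]) -> None:
        out[a] = ya
        for b in children.get(a, []):
            top_down(b, choice[(a, ya, b)], out)

    bottom_up(root)
    y_root = max(states[root], key=lambda s: best[root][s])
    assignment: Dict[str, State] = {}
    top_down(root, y_root, assignment)
    return best[root][y_root], assignment


# Toy example (hypothetical): three nodes, each choosing a 1-D position in {0, 1, 2}.
children = {"root": ["left", "right"]}
states = {name: [0, 1, 2] for name in ("root", "left", "right")}
unary = {"root": [0.0, 1.0, 0.0], "left": [2.0, 0.0, 0.0], "right": [0.0, 0.0, 2.0]}
score, assignment = recursive_optimize(
    "root",
    children,
    states,
    f=lambda a, y: unary[a][y],
    g=lambda a, ya, b, yb: -abs(ya - yb),  # penalize parent-child displacement
)
```

In this sketch each edge is visited once and compares every parent state with every child state, so the cost grows as (number of edges) x (states per node) squared, which is polynomial rather than exponential in the number of nodes.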
Supervised Learning
  Perceptron algorithm (MLE, max-margin SVM)
  Parameter estimation needs fast inference.
  Collins 02. Taskar et al. 04
Supervised Learning by Perceptron Algorithm
  Goal:
  Input: a set of training images with ground truth. Initialize the parameter vector.
  Training algorithm (Collins 02):
    Loop over training samples: i = 1 to N
      Step 1: find the best y using inference:
      Step 2: update the parameters:
    End of loop.
  Inference is critical for learning.
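The two steps of the training loop can be sketched with the standard structured perceptron update of Collins 02: predict with the current parameters, then move the parameters toward the ground-truth features and away from the predicted ones. Everything named here (`perceptron_train`, `phi`, the brute-force candidate list, the toy data) is a hypothetical stand-in; in the talk, Step 1 would be carried out by recursive optimization rather than by enumerating candidates.

```python
from typing import Callable, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def perceptron_train(
    data: Sequence[Tuple[float, int]],
    candidates: Sequence[int],
    phi: Callable[[float, int], List[float]],
    dim: int,
    epochs: int = 5,
) -> List[float]:
    """Structured perceptron: w <- w + phi(x, y_true) - phi(x, y_hat) on mistakes."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y_true in data:
            # Step 1: inference under the current parameters (brute force in this sketch).
            y_hat = max(candidates, key=lambda y: dot(w, phi(x, y)))
            # Step 2: update only when the prediction differs from the ground truth.
            if y_hat != y_true:
                good, bad = phi(x, y_true), phi(x, y_hat)
                w = [wi + gi - bi for wi, gi, bi in zip(w, good, bad)]
    return w


def phi(x: float, y: int) -> List[float]:
    """Toy joint feature map: place the input in the slot of the chosen label."""
    return [x, 0.0] if y == 0 else [0.0, x]


# Toy data (hypothetical): negative inputs have label 0, positive inputs label 1.
data = [(-1.0, 0), (-2.0, 0), (1.0, 1), (2.0, 1)]
w = perceptron_train(data, candidates=[0, 1], phi=phi, dim=2)
predictions = [max([0, 1], key=lambda y: dot(w, phi(x, y))) for x, _ in data]
```

Because Step 1 runs once per training sample per pass, the cost of training is dominated by the cost of inference, which is why fast inference matters for learning.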
Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object
RCM-1: Multi-level Potentials
  Potentials for appearance
    [Gabor, Edge, ]
  Potentials for shape: triplet descriptors
    (position, scale, orientation)
The Inference Results after Supervised Learning
Segmentation Results
Evaluations: Segmentation and Parsing
  Segmentation (accuracy of pixel labeling)
    The proportion of correct pixel labels (object or non-object)
  Parsing (average position error of matching)
    The average distance between the positions of the leaf nodes in the ground truth and those estimated in the parse tree

  Methods            Testing  Segmentation  Parsing  Speed
  RCM-1              228      94.7          16       23s
  Ren (Berkeley)     172      91
  Winn (LOCUS)       200      93
  Levin and Weiss    N/A      95
  Kumar (OBJ CUT)    5        96
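The two evaluation measures defined above can be computed as in the following minimal sketch. The function names, the nested-list mask format, and the (x, y) tuple format for leaf-node positions are assumptions made for illustration; the transcript does not say how the original evaluation was implemented.

```python
import math
from typing import List, Sequence, Tuple


def segmentation_accuracy(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Proportion of pixels whose object/non-object label matches the ground truth."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total


def average_position_error(
    pred_leaves: Sequence[Tuple[float, float]],
    true_leaves: Sequence[Tuple[float, float]],
) -> float:
    """Mean Euclidean distance between matched leaf-node positions."""
    distances = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred_leaves, true_leaves)
    ]
    return sum(distances) / len(distances)


# Toy inputs (hypothetical): a 2x3 mask and two leaf nodes.
pred_mask: List[List[int]] = [[1, 1, 0], [0, 1, 0]]
true_mask: List[List[int]] = [[1, 1, 0], [0, 0, 0]]
accuracy = segmentation_accuracy(pred_mask, true_mask)    # 5 of 6 pixels agree
error = average_position_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```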
Performance Contribution of Multi-level Object Parts
  Multi-level precision-recall curves quantify the recognition performance of object parts.
  High-level regularity (more parts) helps recognition by removing ambiguity.
Recursive Composition
  Modeling (Representation)
    Recursive Compositional Models (RCMs)
  Inference (Computing)
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
    Unsupervised Recursive Learning
  RCM-1: Deformable Object
Unsupervised Learning
  Task: given 10 training images, with no labeling, no alignment, and highly ambiguous features:
    Induce the structure (nodes and edges)
    Estimate the parameters
  Combinatorial explosion problem
  Correspondence is unknown
Recursive Dictionary Learning
  Multi-level dictionary (layer-wise greedy)
  Bottom-up and top-down recursive procedure
  Three principles:
    Recursive Composition
    Suspicious Coincidence
    Competitive Exclusion
  Barlow 94.
Recursion
  10 images for training
Bottom-up Learning
  Composition
  Clustering
  Suspicious Coincidence
  Competitive Exclusion
The Dictionary: From Generic Parts to Object Structures
  Unified representation (RCMs) and learning
  Bridge the gap between generic features and specific object structures
Dictionary Size, Part Sharing and Computational Complexity

  Level  Composition  Clusters   Suspicious Coincidence  Competitive Exclusion  Seconds
  0      4, 1 (only two values survive for this row; their columns are not recoverable)
  1      167,431      14,684     262                     48                     117
  2      2,034,851    741,662    995                     116                    254
  3      2,135,467    1,012,777  305                     53                     99
  4      236,955      72,620     30                      2                      9
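The table shows how sharply the suspicious-coincidence step prunes candidate compositions. The transcript does not give the criterion used, so the following is only a generic sketch of the idea usually attributed to Barlow: keep a pair of parts when they co-occur noticeably more often than independence would predict. The function name `suspicious_pairs`, the `ratio` threshold, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple


def suspicious_pairs(
    images: Sequence[Sequence[str]], ratio: float = 1.2
) -> Dict[Tuple[str, str], float]:
    """Keep part pairs whose co-occurrence count exceeds `ratio` times chance level."""
    n = len(images)
    single: Counter = Counter()
    pair: Counter = Counter()
    for parts in images:
        unique = sorted(set(parts))             # count each part once per image
        single.update(unique)
        pair.update(combinations(unique, 2))    # unordered pairs in sorted order
    kept: Dict[Tuple[str, str], float] = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n    # expected count if a and b were independent
        if observed > ratio * expected:
            kept[(a, b)] = observed / expected
    return kept


# Toy data (hypothetical): each inner list names the parts detected in one image.
images = [["A", "B"], ["A", "B"], ["C"], ["A", "C"]]
kept = suspicious_pairs(images)
```

Here A and B appear together in 2 of 4 images against an expected 1.5, so the pair is kept; A and C appear together once against an expected 1.5, so that pair is dropped.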
  More Sharing
Top-down Refinement
  Fill in missing parts
  Examine every node from top to bottom
Evaluations of Unsupervised Learning

  Methods       Testing  Segmentation  Parsing  Speed
  Unsupervised  316      93.3                   17s
  Supervised    228      94.7          16       23s
Scale up the System: Issue I
  More classes/viewpoints -> more training/detection cost
Scale up the System: Issue II
  Not enough data for rare viewpoints/classes
Our Strategy
  Joint multi-class multi-view learning
  Appearance sharing
  Part sharing
Joint Multi-Class Multi-View Learning
  120 templates: 5 viewpoints & 26 classes
Different Viewpoints Share the Same Appearance
Different Classes Share Common Parts
Compact Hierarchical Dictionary
Dense Part Sharing at Low Levels: Layer-2
Less Part Sharing: Layer-3
Sparse Part Sharing at High Levels: Layer-4
Re-usable Parts: All Layers
The more classes/viewpoints, the more part sharing
Multi-View Single Class Performance
Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object
  RCM-2: Articulated Object
RCM-2 for Articulated Object: Horses
  y = (switch, position, scale, orientation)
  Composition
  Switch
  Multiple poses
RCM-2 for Human Body
Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object
  RCM-2: Articulated Object
  RCM-3: Scene (Entire Image)
Image Scene Parsing
  Task: image segmentation and labeling
Scene Modeling: RCM-3
  Flat MRF: object labeling (recognition only)
    Lack of long-range interactions
    Lack of region-level properties
    High-order potentials -> heavy computation
  Joint segmentation-recognition template
  Geman and Geman 84. L. Zhu et al. NIPS 08
Segmentation and Recognition Template
  (segmentation, object) pair: the chicken-and-egg problem of segmentation and recognition
  Multi-level low-dimensional abstraction
  Global: gist of scene, object layout
  Local: concurrent shape and appearance
  Coarse to fine
RCM-3 for Scene Parsing
  y = (segmentation, object)
  f: appearance likelihood
  g: object layout prior
  Terms: homogeneity; layer-wise consistency; object texture, color; object co-occurrence; segmentation prior
  Recursion
  Example labels: Horse, Grass
RCM-3: Inference and Learning
  State space: C = 21 classes; D = 30 templates; K = 3 classes per template
  Inference (recursive optimization):
  Supervised learning (perceptron)
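To get a feel for the size of this state space, the short calculation below counts the states of a single node. It assumes that a node's state is one of D templates together with an independent choice among C classes for each of the K regions of that template; the slide does not spell this out, so the counting rule and the variable names are assumptions.

```python
# Values from the slide.
C = 21  # object classes
D = 30  # segmentation templates
K = 3   # classes (regions) per template

# Assumed counting rule: pick a template, then label each of its K regions.
states_per_node = D * C ** K
print(states_per_node)
```

Under this assumption each node has 30 x 21^3 = 277,830 possible states, which gives a sense of why efficient inference matters for this model.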
Evaluations of RCM-3
  Implementation Details

  Dataset  Classes  Size  Training Size  Training Time  Testing Time
  MSRC     21       591   45%            55h            30s

  Comparisons

  Measure  TextonBoost (Shotton et al. 04)  PLSA-MRF (Verbeek and Triggs)  AutoContext (Tu 08)  Classifier only  RCM-3
  Average  57.7                             64                             68                   67.2             74.5
  Global   72.2, 69 (Classifier)            73.5                           77.7                 75.9             81.4
Unified RCMs: Object vs. Scene
  RCM-1, RCM-2, RCM-3
  Triplets of Parts; Triplets of Segments
  Boundary only; Region + Boundary
Conclusions
  Principle: Recursive Composition
    Composition -> complexity decomposition
    Recursion -> universal rules (self-similarity)
    Recursion and Composition -> sparseness
  One formula for different tasks.
  Key: the representation of visual patterns, i.e. y.
    Low dimension, simple potentials
  Scaling up: practical image understanding system
References
  Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
  Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
  Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
  Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
  Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
  Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR Graph for Object Detection, Segmentation and Parsing. NIPS 2007.
  Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.
Backup Slides
Rapid Inference and Supervised Learning
  Polynomial-time inference:
    Recursion
  Supervised learning
    Perceptron algorithm (MLE, max-margin SVM)
    Parameter estimation needs fast inference.
  Collins 02. Taskar et al. 04
Recursive Dictionary Learning
  Task: find a small dictionary D (sparse coding).
  Multi-level dictionary (layer-wise greedy)
  Bottom-up and top-down recursive procedure
  Recursion
  Barlow 94.
Template Matching