leo zhu csail mit joint work with chen, yuille, freeman and torralba 1

Recursive Compositional Learning in Computer Vision

Recursive Composition in Computer Vision
Leo Zhu, CSAIL MIT
Joint work with Chen, Yuille, Freeman and Torralba

Ideas behind Recursive Composition
  How to deal with image complexity
  A general framework for different vision tasks
  Rich representation and tractable computation

Pattern Theory. Grenander 94
Compositionality. Geman 02, 06
Stochastic Grammar. Zhu and Mumford 06

Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization
  Learning
    Supervised Parameter Estimation
    Unsupervised Recursive Dictionary Learning
  RCM-1: Deformable Object
  RCM-2: Articulated Object
  RCM-3: Scene (Entire Image)

Model Deformable Object
Flat MRF
  Nodes: object parts
  Edges: spatial relations
  Limitations:
    Short range interaction
    Sparse

Recursive Composition

Recursive Compositional Models: RCM-1

RCM-1: the Recursive Formula
  x: image
  y: (position, scale, orientation)
  graph = (nodes, edges)
  a: index of node
  b: child of a
  f: appearances on node a
  g: potentials on edges (a,b)

Recursion

x: image
y: (position, scale, orientation)
Vertical independence
Self-similarity
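One way to write a recursive score from the symbols x, y, a, b, f and g is sketched here. It assumes a tree-structured graph and pairwise edge potentials, and it introduces the symbol ch(a) for the set of children of node a; none of these assumptions is confirmed by the slides, so this is an illustrative form rather than the talk's formula.

```latex
E(x, y_a) \;=\; f(x, y_a) \;+\; \sum_{b \in ch(a)} \Bigl[\, g(y_a, y_b) + E(x, y_b) \,\Bigr]
```

In this reading, the score of a node is its own appearance term plus, for each child, the edge potential and the child's score, and the same expression is reused unchanged at every level of the hierarchy.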

Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization
  Learning
    Supervised Parameter Estimation
    Unsupervised Recursive Dictionary Learning

Polynomial-time Inference
Inference task:

Recursive Optimization:

Recursion

Polynomial-time Complexity:
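A minimal sketch of recursive optimization on a tree follows, assuming pairwise edge potentials and a small discrete set of candidate states per node. The `Node` class, `best_table`, `infer`, and the toy potentials `f` and `g` are hypothetical names for illustration, not the talk's implementation. Each node keeps, for every one of its states, the best score of the subtree below it, so a tree with n nodes and S states per node costs on the order of n * S^2 potential evaluations in this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    states: list                       # candidate values of y for this node
    children: list = field(default_factory=list)


def best_table(node, f, g):
    """Map each state of `node` to (best subtree score, chosen states below it)."""
    child_tables = [best_table(child, f, g) for child in node.children]
    table = {}
    for s in node.states:
        total = f(node, s)             # appearance term on this node
        assignment = {node.name: s}
        for ctab in child_tables:
            # Best child state given the parent state s: edge term + child subtree.
            t_best = max(ctab, key=lambda t: g(s, t) + ctab[t][0])
            total += g(s, t_best) + ctab[t_best][0]
            assignment.update(ctab[t_best][1])
        table[s] = (total, assignment)
    return table


def infer(root, f, g):
    """Return (score, assignment) maximizing the total score over the tree."""
    table = best_table(root, f, g)
    best_state = max(table, key=lambda s: table[s][0])
    return table[best_state]


# Toy example: 1-D positions, each part prefers an observed location (assumed data).
observed = {"object": 1, "left": 0, "right": 2}
f = lambda node, s: -abs(s - observed[node.name])   # appearance: stay near observation
g = lambda s, t: -0.5 * abs(s - t)                   # edge: parent and child stay close

root = Node("object", [0, 1, 2],
            [Node("left", [0, 1, 2]), Node("right", [0, 1, 2])])
score, assignment = infer(root, f, g)
```

The maximization over child states is done once per parent state and reused, which is what keeps the cost polynomial instead of enumerating every joint assignment.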

Supervised Learning
  Supervised learning
    Perceptron algorithm (MLE, max margin SVM)
    Parameter estimation needs fast inference.
  Collins 02. Taskar et al. 04

Supervised learning by Perceptron Algorithm
Goal:

Input: a set of training images with ground truth. Initialize parameter vector.
Training algorithm (Collins 02):
Loop over training samples: i = 1 to N
Step 1: find the best using inference:

Step 2: Update the parameters:

End of Loop.
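The loop above can be sketched as a structured perceptron in the style of Collins (2002). This is a minimal sketch under stated assumptions: `train_perceptron`, `candidates`, and `phi` are hypothetical names, the toy data is invented for illustration, and Step 1 uses brute-force search over a tiny candidate set where the talk relies on recursive optimization.

```python
def train_perceptron(samples, candidates, phi, dim, epochs=5):
    """Structured perceptron sketch.

    samples:    list of (x, y_true) pairs
    candidates: hypothetical helper returning the possible outputs y for x
    phi:        hypothetical joint feature map, phi(x, y) -> list of floats
    """
    w = [0.0] * dim

    def score(x, y):
        return sum(wi * fi for wi, fi in zip(w, phi(x, y)))

    for _ in range(epochs):
        for x, y_true in samples:
            # Step 1: inference with the current parameters (brute force here).
            y_hat = max(candidates(x), key=lambda y: score(x, y))
            # Step 2: move w toward the ground truth and away from the prediction.
            if y_hat != y_true:
                good, bad = phi(x, y_true), phi(x, y_hat)
                w = [wi + gi - bi for wi, gi, bi in zip(w, good, bad)]
    return w


# Toy problem (assumed): the label is 1 when x is positive, 0 otherwise.
def phi(x, y):
    return [x * y, float(y)]


def candidates(x):
    return (0, 1)


samples = [(2.0, 1), (-1.0, 0), (3.0, 1), (-2.0, 0)]
w = train_perceptron(samples, candidates, phi, dim=2)
predictions = [max(candidates(x), key=lambda y: w[0] * x * y + w[1] * y)
               for x, _ in samples]
```

Every pass over the data calls inference once per training sample, so the cost of training is dominated by the cost of Step 1.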


Inference is critical for learning

where

Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object

RCM-1: Multi-level Potentials
Potentials for appearance
[Gabor, Edge, ]

RCM-1: Multi-level Potentials
Potentials for shape: triplet descriptors

(position, scale, orientation)

The Inference Results after Supervised Learning

Segmentation Results

Evaluations: Segmentation and Parsing
Segmentation (Accuracy of pixel labeling)
  The proportion of the correct pixel labels (object or non-object)
Parsing (Average Position Error of matching)
  The average distance between the positions of leaf nodes of the ground truth and those estimated in the parse tree

Methods             Testing   Segmentation (%)   Parsing   Speed
RCM-1               228       94.7               16        23s
Ren (Berkeley)      172       91
Winn (LOCUS)        200       93
Levin and Weiss     N/A       95
Kumar (OBJ CUT)     5         96
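The two measures defined above can be computed as in the following sketch. The function names and the toy masks and points are assumptions for illustration; masks are taken to be equally sized grids of 0/1 labels, and leaf positions are taken to be matched in order.

```python
import math


def pixel_accuracy(pred_mask, true_mask):
    """Proportion of pixels whose object / non-object label matches the ground truth."""
    total = 0
    correct = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            total += 1
            correct += int(p == t)
    return correct / total


def average_position_error(pred_points, true_points):
    """Mean Euclidean distance between estimated and ground-truth leaf positions."""
    distances = [math.dist(p, t) for p, t in zip(pred_points, true_points)]
    return sum(distances) / len(distances)


# Toy data (assumed): a 2x2 mask with one wrong pixel, and two leaf nodes.
acc = pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])
err = average_position_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

With these toy inputs three of four pixels agree, and the two leaf nodes are off by 0 and 5 units respectively.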

Performance Contribution of Multi-level Object Parts
Multi-level Precision-Recall curves quantify the recognition performance of object parts. High-level regularity (more parts) helps recognition by removing ambiguity.

Recursive Composition
  Modeling: (Representation)
    Recursive Compositional Models (RCMs)
  Inference: (Computing)
    Recursive Optimization (Polynomial-time)
  Learning:
    Supervised Parameter Estimation
    Unsupervised Recursive Learning
  RCM-1: deformable object

Unsupervised Learning
Task: given 10 training images, no labeling, no alignment, highly ambiguous features.
  Induce the structure (nodes and edges)
  Estimate the parameters.

Combinatorial Explosion problem
Correspondence is unknown

Recursive Dictionary Learning
  Multi-level dictionary (layer-wise greedy)
  Bottom-Up and Top-Down recursive procedure
  Three Principles:
    Recursive Composition
    Suspicious Coincidence
    Competitive Exclusion
Barlow 94.
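The slides do not state the exact test used for suspicious coincidence. As a hedged illustration of the idea in Barlow's sense (two events co-occurring more often than independence would predict), the sketch below keeps pairs of parts whose joint frequency exceeds a multiple of the product of their marginal frequencies. The function name `suspicious_pairs`, the `ratio` threshold, and the toy detections are all assumptions.

```python
from collections import Counter
from itertools import combinations


def suspicious_pairs(images, ratio=1.2):
    """Keep part pairs with P(a, b) > ratio * P(a) * P(b).

    images: list of sets, each holding the part labels detected in one image.
    Returns {(a, b): P(a, b) / (P(a) * P(b))} for the pairs that pass.
    """
    n = len(images)
    single = Counter()
    pair = Counter()
    for parts in images:
        single.update(parts)
        pair.update(combinations(sorted(parts), 2))

    kept = {}
    for (a, b), count in pair.items():
        p_ab = count / n
        p_a = single[a] / n
        p_b = single[b] / n
        if p_ab > ratio * p_a * p_b:
            kept[(a, b)] = p_ab / (p_a * p_b)
    return kept


# Toy detections (assumed): A and B tend to appear together, A and C do not.
images = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}, {"C"}, {"A", "C"}]
kept = suspicious_pairs(images)
```

Here the pair (A, B) passes because it occurs in half of the images while independence would predict a third, and the pair (A, C) is discarded.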

Recursion
10 images for training

Bottom-up Learning

Composition
Clustering

Suspicious Coincidence

Competitive Exclusion

The Dictionary: From Generic Parts to Object Structures
  Unified representation (RCMs) and learning
  Bridge the gap between the generic features and specific object structures

Dictionary Size, Part Sharing and Computational Complexity

Level   Composition   Clusters    Suspicious Coincidence   Competitive Exclusion   Seconds
0                                                          4                       1
1       167,431       14,684      262                      48                      117
2       2,034,851     741,662     995                      116                     254
3       2,135,467     1,012,777   305                      53                      99
4       236,955       72,620      30                       2                       9

More Sharing

Top-down refinement
  Fill in missing parts
  Examine every node from top to bottom

Evaluations of Unsupervised Learning

Methods        Testing   Segmentation (%)   Parsing   Speed
Unsupervised   316       93.3                         17s
Supervised     228       94.7               16        23s

Scale up the System: Issue I
  More classes/viewpoints -> more training/detection cost

Scale up the System: Issue II
  Not enough data for rare viewpoints/classes

Our Strategy
  Joint multi-class multi-view learning
  Appearance sharing
  Part sharing

Joint Multi-Class Multi-View Learning
  120 templates: 5 viewpoints & 26 classes

Different Viewpoints Share the Same Appearance

Different Classes Share Common Parts

Compact Hierarchical Dictionary

Dense Part Sharing at Low Levels: Layer-2

Less Part Sharing: Layer-3

Sparse Part Sharing at High Levels: Layer-4

Re-usable Parts: All Layers

The more classes/viewpoints, the more part sharing

Multi-View Single Class Performance

Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object
  RCM-2: Articulated Object

RCM-2 for Articulated Object: Horses
y = (switch, position, scale, orientation)
Composition
Switch multiple poses

RCM-2 for Human Body

Recursive Composition
  Representation
    Recursive Compositional Models (RCMs)
  Inference
    Recursive Optimization (Polynomial-time)
  Learning
    Supervised Parameter Estimation
  RCM-1: Deformable Object
  RCM-2: Articulated Object
  RCM-3: Scene (Entire Image)

Image Scene Parsing
Task: Image Segmentation and Labeling

Scene Modeling: RCM-3
Geman and Geman 84. L Zhu et al. NIPS 08
Flat MRF: object labeling (recognition only).
  Lack of long-range interactions.
  Lack of region-level properties.
  High-order potentials -> heavy computation

Scene Modeling: RCM-3
Geman and Geman 84. L Zhu et al. NIPS 08
Flat MRF: object labeling (recognition only).
Joint segmentation-recognition template

Segmentation and Recognition Template
(segmentation, object) pair: chicken-and-egg of segmentation and recognition.
Multi-level low-dimensional abstraction
Global: gist of scene
object layout
Local: concurrent shape and appearance
coarse to fine

RCM-3 for Scene Parsing
y = (segmentation, object)
f: appearance likelihood
g: object layout prior
homogeneity
layer-wise consistency
object texture color
object co-occurrence
segmentation prior
Recursion

Horse
Grass

RCM-3: Inference and Learning
State space: C=21 classes; D=30 templates; K=3 classes per template
Inference (recursive optimization):

Supervised learning (perceptron)

Evaluations of RCM-3

Implementation Details

Dataset   Classes   Size   Training Size   Training Time   Testing Time
MSRC      21        591    45%             55h             30s

Comparisons

Method                             Average   Global
TextonBoost (Shotton et al. 04)    57.7      72.2, 69 (Classifier)
PLSA-MRF (Verbeek and Triggs)      64        73.5
AutoContext (Tu 08)                68        77.7
Classifier only                    67.2      75.9
RCM-3                              74.5      81.4

Unified RCMs: Object vs. Scene

RCM-1 RCM-2 RCM-3 Triplets of Parts Triplets of Segments Boundary only Region + Boundary

Conclusions
Principle: Recursive Composition
  Composition -> complexity decomposition
  Recursion -> universal rules (self-similarity)
  Recursion and Composition -> sparseness
One formula for different tasks.
  Key: the representation of visual patterns, i.e. y.
  Low dimension, simple potentials
Scaling up: practical Image Understanding System

References
Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR graph for Object Detection, Segmentation and Parsing. NIPS 2007.
Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.

Backup Slides

Rapid Inference and Supervised Learning
Polynomial-time inference:

Supervised learning
  Perceptron algorithm (MLE, max margin SVM)
  Parameter estimation needs fast inference.

Recursion

Collins 02. Taskar et al. 04

Recursive Dictionary Learning
Task: find a small dictionary D (sparse coding).
  Multi-level dictionary (layer-wise greedy)
  Bottom-Up and Top-Down recursive procedure

Barlow 94.

Recursion
Template Matching
