convolutional-recursive deep learning for 3d object...

36
Convolutional-Recursive Deep Learning for 3D Object Classification Iro Armeni, Manik Dhar Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, Andrew Y. Ng NIPS 2012

Upload: others

Post on 19-Mar-2020

29 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Convolutional-Recursive Deep

Learning for 3D Object Classification

Iro Armeni, Manik Dhar

Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, Andrew Y. NgNIPS 2012

Page 2: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● Hand-designed features

● Useful extra information in the depth modality

● Affordable sensing for RGB-D (Kinect)

● Many applications (robotics)

Motivation

Page 3: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Previous Work: Designed Features

[img] Grauman et al., The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV’05Bo et al., Depth kernel descriptors for object recognition, IROS’11Bay et al., Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, 110(3), ‘08

● SURF, SIFT

● Kernel Descriptors

● Pyramid Matching Kernel

Page 4: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Previous Work: Learned Features

[left] Bo et al., Unsupervised Feature Learning for RGB-D Based Object Recognition, ISER’12[right] Blum et al., A Learned Feature Descriptor for Object Recognition in RGB-D Data, ICRA’12

Page 5: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Limitations of previous work

● Difficult to extend to new modalities

● Capture limited features

● May not use the full image

● Need additional input

Page 6: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Convolutional - Recursive Neural Networks1. Unsupervised

filter learning

2. Single-layer CNN 3. Multi-random RNNs

4. Softmax Classification

Page 7: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● New RNN-based architecture to combine low level CNN features (fixed tree

structures, multiple RNNs)

● RNNs with random weights produce high quality features

● No additional input channels (surface normals)

● Fast approach

● State of the art results on detecting household objects.

Contributions

Page 8: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Convolutional - Recursive Neural Networks

Model architecture details

Page 9: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Learning the CNN filters

[img] Tao Wan, David J. Wu, Adam Coates, Andrew Y. Ng, End-to-End Text Recognition with Convolutional Neural Networks, ICPR 2012

Page 10: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

CNN filters

Page 11: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

A single CNN layer

[img] K. Jarrett and K. Kavukcuoglu and M. Ranzato and Y. LeCun. What is the Best Multi-Stage Architecture for Object Recognition? In ICCV. IEEE, 2009

Page 12: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Recursive Neural Networks (RNN)

x1 x2 x3 x4

p2

Input

p1Layer 1

p3Output

Wx

1x

2p = fx5 x6

non-linearitye.g. tanh

● Structured Prediction: learn compositional features and part interactions

● Hierarchical Representation

children

parent

Page 13: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Recursive Neural Networks (RNN)

output of CNNKxrxr 3D matrix

r=4, b=22 2

2

2

1 11

1

1st layer(r/b)2 parents

(r/b)2 = 4

2nd layer(r/b)2 parents

(r/b)2 = 1

Example of RNN architecture

Page 14: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Recursive Neural Networks (RNN)

● Learn hierarchical feature representations by applying the same NN recursively in a tree structure

● Object classification + CNN allow to use fixed tree structure

● Generalized RNN architecture to merge blocks of adjacent vectors, not only pairs

Page 15: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

N RNNs N R

NNs

K K

K

N RNNs

KN

Back-prop vs Multi-Random RNNs

Back propagation is costly (very slow).

x5 x6

x1 x2 x3 x4

p3

x5 x6

x1 x2 x3 x4

p3

RGB Depth

Alternatively use multiple RNNs, with randomly initialized weights W.

Concatenate all RNN outputs to a N K-dimensional vector (input to softmax).

Page 16: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Experiments

Object Category Recognition

Page 17: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Large-scale hierarchical multi-view RGB-D object dataset*

● 51 object categories (house/office)

● Total of 300 instances

● Multi-view RGB-D images

● 3 angles, ~600 images per instance

● Total of 207,920 images

* Lai et al., A Large-Scale Hierarchical Multi-View RGB-D Object Dataset, ICRA’11

RGB object examples

Page 18: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● Subsample images per instance to ⅕ (120 images)

● Same setup as in Lai et al.* - 10 random splits

● Test: 1 instance per category (120 images)

● Training: remaining images (~34,000)

● Resize: 148x148

Experimental Setup: General

* Lai et al., A Large-Scale Hierarchical Multi-View RGB-D Object Dataset, ICRA’11

Page 19: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● K-means on 500,000 image patches (random sampling from each split’s training set)

● # of clusters/filters: 128

● Size of patches: 9x9x3 (RGB), 9x9 (Depth)

● Pre-processing: Individual normalization (-mean, /std) & ZCA whitening

Experimental Setup: Filters

Page 20: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● Filter bank size: 128

● Filter WxH: 9x9

● Average Pooling: size 10, stride 5

● Output: 128x27x27

Experimental Setup: CNN layer

Page 21: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

● Block size (children): 3x3

● (Single) Tree layers: 128x27x27, 128x9x9, 128x3x3, 128x1x1

● # of RNNs: 128

Final concatenated output (RGB-D): 2x1282 = 32768

Experimental Setup: Multi RNNs

Page 22: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Results: Comparison to other methods

[2] Lai et al., A Large-Scale Hierarchical Multi-View RGB-D Object Dataset, ICRA’11[5] Bo et al., Depth Kernel descriptors for object recognition, IROS’11[6] Blum et al., A Learned Feature Descriptor for Object Recognition in RGB-D Data, ICRA’12[7] Bo et al., Unsupervised Feature Learning for RGB-D Based Object Recognition, ISER’12

Various hand-crafted features

Learned features sparsely applied

additional input,larger feature vector (x5) 0.7% better

Page 23: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Model Analysis: CNN-RNN vs 2-layer CNN

[17] Jarrett et al., What is the Best Multi-Stage Architecture for Object Recognition?, ICCV’09

1. Comparison with 2-layer CNN as in [17]

2. Comparison with 2-layer CNN as in [17] using k-means pre-trained filters

RNN* outperforms and is 4x faster

*only RGB

Page 24: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Model Analysis: RNN vs TNN

TNN: Tree structured NN with untied weights(same weights within a layer, different among layers)

It has more parameters but worst performance.

Tying the weights is beneficial. *only RGB

Page 25: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Model Analysis: Multi-random RNNs vs single tRNN

tRNN: Trained RNN with careful parameter tuning

128 random RNNs perform 2% better and faster.

*only RGB

Page 26: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Model Analysis: how many random RNNs?

Page 27: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Model Analysis: RGB, Depth or RGB+D?

Classifier benefits from RGB and Depth combination

Page 28: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Error Analysis

Confusion Matrix

GT: y-axisPredictions: x-axis

Page 29: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Error Analysis

water bottle

shampoo bottle

garlic

mushroom

pitcher

cap

kleenex box

white cap

Page 30: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

- New model combining CNN + RNN

- Learning a lower-dimensional feature vector with high representational

power

- RNNs: hierarchical representation of the structure (spatial interactions)

- No back-propagation

- Applicability on Depth image domain

- Combination of modalities into one feature vector (complementing features)

- Use only of raw data

- Parallelization & High speed

Conclusion

Page 31: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Advances in RGB-D object recognition

Page 32: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Convolutional Fisher Kernels for RGB-D Object Recognition

Cheng et al., Convolutional Fisher Kernels for RGB-D Object Recognition, 3DV’15

Page 33: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Cheng et al., Convolutional Fisher Kernels for RGB-D Object Recognition, 3DV’15

Convolutional Fisher Kernels for RGB-D Object Recognition

86.8%

78.9%

80.8%85.8%

86.8%

91.2%

~5% increase

Page 34: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Eitel et al., Multimodal Deep Learning for Robust RGB-D Object Recognition, IROS’15

Multimodal Deep Learning for Robust RGB-D Object Recognition

Page 35: Convolutional-Recursive Deep Learning for 3D Object ...web.stanford.edu/class/cs331b/2016/presentations/paper3.pdf · Convolutional-Recursive Deep Learning for 3D Object Classification

Eitel et al., Multimodal Deep Learning for Robust RGB-D Object Recognition, IROS’15

Multimodal Deep Learning for Robust RGB-D Object Recognition

78.9 80.8

~5% increase