
Page 1:

An Analysis of Single-Layer Networks in Unsupervised Feature Learning
Adam Coates, Honglak Lee and Andrew Y. Ng

AISTATS 2011

The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization

Adam Coates and Andrew Y. Ng
ICML 2011

Presented by: Mingyuan Zhou
Duke University, ECE

June 17, 2011

Page 2:

Outline

• Introduction

• Unsupervised feature learning

• Parameter setting

• Experiments on CIFAR, NORB and STL

• Conclusions

An Analysis of Single-Layer Networks in Unsupervised Feature Learning
Adam Coates, Honglak Lee and Andrew Y. Ng

AISTATS 2011

Page 3:

Training/testing pipeline

• Feature learning:
– Extract random patches from unlabeled training images
– Apply a pre-processing stage to the patches
– Learn a feature-mapping using an unsupervised learning algorithm

• Feature extraction and classification:
– Extract features from equally spaced sub-patches covering the input image
– Pool features together over regions of the input image to reduce the number of feature values
– Train a linear classifier to predict the labels given the feature vectors
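
A minimal sketch of the feature-extraction side of this pipeline, assuming NumPy; the function names, the choice of four pooling quadrants, and the abstract encoding step are illustrative rather than taken from the papers:

```python
import numpy as np

def extract_patches(image, w, s):
    """Collect every w x w sub-patch of a square image at stride s, flattened
    into rows; with s = 1 all overlapping patches are used."""
    n = image.shape[0]
    patches = []
    for i in range(0, n - w + 1, s):
        for j in range(0, n - w + 1, s):
            patches.append(image[i:i + w, j:j + w].reshape(-1))
    return np.asarray(patches)

def pool_quadrants(codes, rows, cols):
    """Sum the K-dimensional codes over the four quadrants of the grid of patch
    locations, giving a single 4K-dimensional image representation.
    codes: (rows * cols, K) array of encoded patches in row-major order."""
    fmap = codes.reshape(rows, cols, -1)
    r2, c2 = rows // 2, cols // 2
    quads = [fmap[:r2, :c2], fmap[:r2, c2:], fmap[r2:, :c2], fmap[r2:, c2:]]
    return np.concatenate([q.sum(axis=(0, 1)) for q in quads])
```

Each patch is first pre-processed and encoded into a K-dimensional code by whichever unsupervised algorithm was learned (next slides); the pooled vectors are then fed to a linear classifier, e.g. scikit-learn's LinearSVC.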

Page 4:

Feature learning

• Pre-processing of patches:
– Mean subtraction and scale normalization
– Whitening

• Unsupervised learning:
– Sparse auto-encoder
– Sparse restricted Boltzmann machine
– K-means clustering

Hard-assignment: f_k(x) = 1 if k = argmin_j ||c^(j) - x||_2^2, and 0 otherwise

Soft-assignment ("triangle" K-means): f_k(x) = max(0, mu(z) - z_k), where z_k = ||x - c^(k)||_2 and mu(z) is the mean of the elements of z
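
A minimal NumPy sketch of this pre-processing and of the two K-means encodings above; the epsilon constants are illustrative, and `centroids` is assumed to come from running K-means on the pre-processed patches:

```python
import numpy as np

def preprocess(X, eps_norm=10.0, eps_zca=0.1):
    """Per-patch mean subtraction and scale normalization, then ZCA whitening.
    X: (num_patches, dim) array of flattened patches."""
    X = X - X.mean(axis=1, keepdims=True)
    X = X / np.sqrt(X.var(axis=1, keepdims=True) + eps_norm)
    X = X - X.mean(axis=0)                               # center before whitening
    d, V = np.linalg.eigh(np.cov(X, rowvar=False))
    W = V @ np.diag(1.0 / np.sqrt(d + eps_zca)) @ V.T    # ZCA whitening matrix
    return X @ W

def hard_assignment(X, centroids):
    """f_k(x) = 1 for the nearest centroid and 0 otherwise (one-hot codes)."""
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    F = np.zeros_like(z)
    F[np.arange(X.shape[0]), z.argmin(axis=1)] = 1.0
    return F

def soft_assignment(X, centroids):
    """'Triangle' encoding: f_k(x) = max(0, mean(z) - z_k), z_k = ||x - c_k||."""
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)
```

The soft version zeroes out features whose centroid distance is above average (roughly half of them) while keeping a graded response for the closest centroids.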

Page 5:

Feature learning

• Unsupervised learning:
– Sparse auto-encoder
– Sparse restricted Boltzmann machine
– K-means clustering
– Gaussian mixture model (GMM)

Page 6:

Feature extraction and classification

Page 7:

Experiments and analysis

• Model parameters:
– Whitening?
– Number of features K
– Stride s (all the overlapping patches are used when s = 1)
– Receptive field (patch) size w
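
As a concrete illustration of how stride and receptive field interact (the specific numbers are only an example): for an n x n image, features are computed at ((n - w)/s + 1)^2 locations, so a 32x32 CIFAR-10 image with w = 6 and s = 1 yields 27 x 27 = 729 K-dimensional codes before pooling, while s = 2 reduces this to 14 x 14 = 196.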

Pages 8-11: Experiments and analysis (figures)

Page 12:

Conclusions

Mean subtraction, scale normalization and whitening
+ Large K
+ Small s
+ Right patch size w
+ Simple feature learning algorithm (soft K-means)
= State-of-the-art results on CIFAR-10 and NORB

Page 13:

Outline

• Motivations and contributions
• Review of dictionary learning algorithms
• Review of sparse coding algorithms
• Experiments on CIFAR, NORB and Caltech101
• Conclusions

The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization

Adam Coates and Andrew Y. Ng
ICML 2011

Page 14:

Main contributions

Page 15:

Dictionary learning algorithms

• Sparse coding (SC)

• Orthogonal matching pursuit (OMP-k)

• Sparse RBMs and sparse auto-encoders (RBM, SAE)

• Randomly sampled patches (RP)

• Random weights (R)
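
The last two entries require no training at all; a minimal sketch of how such dictionaries can be built (NumPy; normalizing the columns to unit norm is an assumption here, not something stated on the slide):

```python
import numpy as np

def dictionary_random_patches(patches, k, seed=0):
    """RP: use k randomly sampled (pre-processed) patches as the basis vectors."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(patches.shape[0], size=k, replace=False)
    D = patches[idx].T                       # columns are basis vectors
    return D / np.linalg.norm(D, axis=0)

def dictionary_random_weights(dim, k, seed=0):
    """R: fill the dictionary with random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((dim, k))
    return D / np.linalg.norm(D, axis=0)
```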

Page 16:

Sparse coding algorithms

• Sparse coding (SC)

• OMP-k

• Soft threshold (T)

• “Natural” encoding
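
Of these encoders, the soft threshold is the simplest to write down; a minimal sketch (the threshold alpha is a tunable constant, and 0.25 is only a placeholder value):

```python
import numpy as np

def encode_soft_threshold(X, D, alpha=0.25):
    """Soft threshold encoder (T): f(x) = max(0, D^T x - alpha).
    X: (num_inputs, dim); D: (dim, k) dictionary with unit-norm columns."""
    return np.maximum(0.0, X @ D - alpha)
```

The dictionary D can come from any of the learning algorithms on the previous slide; decoupling the training of D from the choice of encoder is exactly the comparison the paper carries out.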

Pages 17-21: Experimental results (figures)
Page 22:

Comments on dictionary learning

• The results have shown that the main advantage of sparse coding is as an encoder, and that the choice of basis functions has little effect on performance.

• The main value of the dictionary is to provide a highly overcomplete basis on which to project the data before applying an encoder; the exact structure of these basis functions is less critical than the choice of encoding.

• All that appears necessary is to choose the basis to roughly tile the space of the input data. This increases the chances that a few basis vectors will be near any given input, yielding large activations that are useful later for identifying the location of the input on the data manifold.

• This explains why vector quantization is quite capable of competing with more complex algorithms: it simply ensures that there is at least one dictionary entry near any densely populated areas of the input space. We expect that learning is more crucial if we use small dictionaries, since we would then need to be more careful to pick basis functions that span the space of inputs equitably.

Page 23:

Conclusions

• The main power of sparse coding is not that it learns better basis functions. In fact, we discovered that any reasonable tiling of the input space (including randomly chosen input patches) is sufficient to obtain high performance on any of the three very different recognition problems that we tested.

• Instead, the main strength of sparse coding appears to arise from its non-linear encoding scheme, which was almost universally effective in our experiments—even with no training at all.

• Indeed, it was difficult to beat this encoding on the Caltech 101 dataset. In many cases, however, it was possible to do nearly as well using only a soft threshold function, provided there is sufficient labeled data.

• Overall, we conclude that most of the performance obtained in our results is a function of the choice of architecture and encoding, suggesting that these are key areas for further study and improvements.