CS 2750: Machine Learning
Dimensionality Reduction
Prof. Adriana Kovashka
University of Pittsburgh
January 19, 2017
Plan for today
• Dimensionality reduction – motivation
• Principal Component Analysis (PCA)
• Applications of PCA
• Other methods for dimensionality reduction
Why reduce dimensionality?
• Data may intrinsically live in a lower-dim space
• Too many features and too few data
• Lower computational expense (memory, train/test time)
• Want to visualize the data in a lower-dim space
• Want to use data of different dimensionality
Goal
• Input: Data in a high-dim feature space
• Output: Projection of same data into a lower-dim space
• F: high-dim X → low-dim X
Goal
Slide credit: Erik Sudderth
Some criteria for success
• Find a projection where the data has:
– Low reconstruction error
– High variance of the data
See hand-written notes for how we find the optimal projection
Slide credit: Subhransu Maji
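As a rough illustration of these two criteria (not the course's derivation; see the hand-written notes and the demo scripts below), here is a minimal MATLAB sketch of PCA on an N x D data matrix X. The names X and K, and the choice K = 2, are placeholders:
mu = mean(X, 1);                               % mean of each feature (1 x D)
Xc = X - repmat(mu, size(X, 1), 1);            % center the data
C = (Xc' * Xc) / (size(X, 1) - 1);             % D x D covariance matrix
[U, L] = eig(C);                               % eigenvectors (columns of U), eigenvalues (diag of L)
[lambda, order] = sort(diag(L), 'descend');    % sort by decreasing variance
U = U(:, order);
K = 2;                                         % number of components to keep (example value)
W = Xc * U(:, 1:K);                            % N x K projection: directions of highest variance
Xhat = repmat(mu, size(X, 1), 1) + W * U(:, 1:K)';   % reconstruction back in D dimensions
recon_error = mean(sum((X - Xhat).^2, 2));     % small when the projection is good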
Principal Components Analysis
Demo
• http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m
• http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA.m
• Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/
Implementation issue
• Covariance matrix is huge (D^2 entries for D pixels)
• But typically # examples N << D
• Simple trick:
– X is the N x D matrix of normalized training data
– Solve for eigenvectors u of XX^T instead of X^TX
– Then X^Tu is an eigenvector of the covariance X^TX
– Need to normalize each vector X^Tu to unit length
Adapted from Derek Hoiem
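A hedged sketch of this trick (variable names assumed; X is the N x D matrix of normalized training data as above):
G = X * X';                                    % small N x N matrix instead of the D x D covariance
[V, L] = eig(G);                               % eigenvectors v of X*X' (eigenvalues in L)
U = X' * V;                                    % each column X'*v is an eigenvector of X'*X
U = U ./ repmat(sqrt(sum(U.^2, 1)), size(U, 1), 1);   % normalize each column to unit length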
How to pick K?
• One goal can be to pick K such that a fraction P of the variance of the data is preserved, e.g. P = 0.9 (90%)
• Let Λ = a vector containing the eigenvalues of the covariance matrix, sorted in decreasing order
• Total variance can be obtained from the entries of Λ
– total_variance = sum(Λ);
• Take as many of the largest entries as needed
– K = find( cumsum(Λ) / total_variance >= P, 1);
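A tiny worked example of this rule, with made-up eigenvalues purely for illustration:
lambda = [4.0; 2.5; 1.0; 0.3; 0.2];            % hypothetical eigenvalues, sorted in decreasing order
P = 0.9;                                        % keep 90% of the variance
total_variance = sum(lambda);                   % = 8.0
K = find(cumsum(lambda) / total_variance >= P, 1);   % fractions 0.50, 0.81, 0.94, ... so K = 3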
Variance preserved at i-th eigenvalue
Figure 12.4 (a) from Bishop
Application: Face Recognition
Image from cnet.com
Face recognition: once you’ve detected and cropped a face, try to recognize it
Detection Recognition “Sally”
Slide credit: Lana Lazebnik
Typical face recognition scenarios
• Verification: a person is claiming a particular identity; verify whether that is true– E.g., security
• Closed-world identification: assign a face to one person from among a known set
• General identification: assign a face to a known person or to “unknown”
Slide credit: Derek Hoiem
The space of all face images
• When viewed as vectors of pixel values, face images are extremely high-dimensional
– A 24x24 image = 576 dimensions
– Slow and lots of storage
• But very few 576-dimensional vectors are valid face images
• We want to effectively model the subspace of face images
Adapted from Derek Hoiem
Representation and reconstruction
• Face x in "face space" coordinates: (w1, …, wk) = (u1^T(x − µ), …, uk^T(x − µ))
• Reconstruction: x̂ = µ + w1u1 + w2u2 + w3u3 + w4u4 + …
Slide credit: Derek Hoiem
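A minimal sketch of these two operations, assuming µ is the D x 1 mean face, the columns of U are the eigenfaces, x is one face image unrolled into a D x 1 vector, and K is the number of components kept (all names are placeholders):
w = U(:, 1:K)' * (x - mu);                      % "face space" coordinates (w1, ..., wK)
xhat = mu + U(:, 1:K) * w;                      % reconstruction: mu + w1*u1 + ... + wK*uK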
Recognition with eigenfaces
Process labeled training images:
• Find the mean µ and covariance matrix Σ
• Find the k principal components (eigenvectors of Σ) u1, …, uk
• Project each training image xi onto the subspace spanned by the principal components: (wi1, …, wik) = (u1^T xi, …, uk^T xi)
Given a novel image x:
• Project onto the subspace: (w1, …, wk) = (u1^T x, …, uk^T x)
• Classify as the closest training face in the k-dimensional subspace
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
Adapted from Derek Hoiem
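A hedged sketch of this pipeline (not Turk and Pentland's code; Xtrain is N x D with one vectorized face per row, labels is N x 1, the columns of U are the eigenvectors of Σ, x_new is a 1 x D novel face, and faces are mean-subtracted before projection):
mu = mean(Xtrain, 1);                           % mean face (1 x D)
W_train = (Xtrain - repmat(mu, size(Xtrain, 1), 1)) * U(:, 1:k);   % project training faces
w_new = (x_new - mu) * U(:, 1:k);               % project the novel face
dists = sum((W_train - repmat(w_new, size(W_train, 1), 1)).^2, 2); % distances in the k-dim subspace
[~, nn] = min(dists);                           % index of the closest training face
predicted_label = labels(nn);                   % classify by nearest neighbor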
Slide credit: Alexander Ihler
Plan for today
• Dimensionality reduction – motivation
• Principal Component Analysis (PCA)
• Applications of PCA
• Other methods for dimensionality reduction
PCA
• General dimensionality reduction technique
• Preserves most of the variance with a much more compact representation
– Lower storage requirements (eigenvectors + a few numbers per face)
– Faster matching
• What are some problems?
Slide credit: Derek Hoiem
PCA limitations
• The direction of maximum variance is not always good for classification
Slide credit: Derek Hoiem
PCA limitations
• PCA preserves maximum variance
• A more discriminative subspace:
Fisher Linear Discriminants
• FLD preserves discrimination
– Find projection that maximizes scatter between classes and minimizes scatter within classes
Adapted from Derek Hoiem
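A hedged two-class sketch of this criterion in its simple closed form (X1 and X2 are N1 x D and N2 x D matrices of examples from the two classes; all names are assumptions):
m1 = mean(X1, 1)';  m2 = mean(X2, 1)';          % class means (D x 1)
X1c = X1 - repmat(m1', size(X1, 1), 1);         % center each class
X2c = X2 - repmat(m2', size(X2, 1), 1);
Sw = X1c' * X1c + X2c' * X2c;                   % within-class scatter matrix
w = Sw \ (m1 - m2);                             % direction maximizing between- vs. within-class scatter
w = w / norm(w);                                % unit-length projection direction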
Figure: using two classes as an example, a poor projection mixes the classes while a good projection separates them (axes x1, x2)
Slide credit: Derek Hoiem
Fisher’s Linear Discriminant
Slide credit: Derek Hoiem
Comparison with PCA
Other dimensionality reduction methods
• Non-linear:
– Kernel PCA – Schölkopf et al., Neural Computation 1998
– Independent component analysis – Comon, Signal Processing 1994
– LLE (locally linear embedding) – Roweis and Saul, Science 2000
– ISOMAP (isometric feature mapping) – Tenenbaum et al., Science 2000
– t-SNE (t-distributed stochastic neighbor embedding) – van der Maaten and Hinton, JMLR 2008
ISOMAP example
Figure from Carlotta Domeniconi
ISOMAP example
Figure from Carlotta Domeniconi
t-SNE example
Figure from Genevieve Patterson, IJCV 2014
t-SNE example
Thomas and Kovashka, CVPR 2016
t-SNE example
Thomas and Kovashka, CVPR 2016