
Page 1:

CS 2750: Machine Learning

Dimensionality Reduction

Prof. Adriana Kovashka
University of Pittsburgh

January 19, 2017

Page 2:

Plan for today

• Dimensionality reduction – motivation

• Principal Component Analysis (PCA)

• Applications of PCA

• Other methods for dimensionality reduction

Page 3:

Why reduce dimensionality?

• Data may intrinsically live in a lower-dim space

• Too many features and too few data points

• Lower computational expense (memory, train/test time)

• Want to visualize the data in a lower-dim space

• Want to use data of different dimensionality

Page 4:

Goal

• Input: Data in a high-dim feature space

• Output: Projection of same data into a lower-dim space

• F: high-dim X → low-dim X

Page 5:

Goal

Slide credit: Erik Sudderth

Page 6:

Some criteria for success

• Find a projection where the data has:

– Low reconstruction error

– High variance of the data

See hand-written notes for how we find the optimal projection
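As a pointer to what those notes derive (a standard formulation, not reproduced from the slides themselves): the first principal direction u maximizes the variance of the projected data,

\max_{u}\; u^\top S u \quad \text{s.t.}\; u^\top u = 1, \qquad S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^\top

Setting the gradient of the Lagrangian to zero gives S u = \lambda u, so u is the eigenvector of S with the largest eigenvalue; the same direction also minimizes the squared reconstruction error, which is why the two criteria above agree.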

Page 7:

Principal Components Analysis

Slide credit: Subhransu Maji

Page 8:

Demo

• http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m

• http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA.m

• Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/

Page 9:

Implementation issue

• Covariance matrix is huge (D^2 entries for D pixels)

• But typically # examples N << D

• Simple trick

– X is the N x D matrix of normalized training data

– Solve for eigenvectors u of XX^T instead of X^TX

– Then X^Tu is an eigenvector of the covariance X^TX

– Need to normalize each vector X^Tu to unit length

Adapted from Derek Hoiem
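A minimal MATLAB sketch of this trick (variable names are illustrative, not taken from the course demo files):

% X: N x D matrix of normalized (zero-mean) training data, N << D
[U, S] = eig(X * X');                % small N x N eigenproblem
[~, order] = sort(diag(S), 'descend');
U = U(:, order);                     % sort by decreasing eigenvalue
V = X' * U;                          % each column X'*u is an eigenvector of X'*X
V = V ./ sqrt(sum(V.^2, 1));         % normalize each column to unit length

(The elementwise division relies on MATLAB's implicit expansion, available in R2016b and later.)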

Page 10:

How to pick K?

• One goal can be to pick K such that a fraction P of the variance of the data is preserved, e.g. P = 0.9 (90%)

• Let lambda be a vector containing the eigenvalues of the covariance matrix, sorted in decreasing order

• Total variance can be obtained from the entries of lambda

– total_variance = sum(lambda);

• Take as many of the largest entries as needed

– K = find(cumsum(lambda) / total_variance >= P, 1);
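Putting the pieces together, a minimal sketch (assuming X is the N x D matrix of centered data from before):

lambda = sort(eig(cov(X)), 'descend');  % eigenvalues of the covariance, largest first
P = 0.9;                                % keep 90% of the variance
total_variance = sum(lambda);
K = find(cumsum(lambda) / total_variance >= P, 1);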

Page 11:

Variance preserved at i-th eigenvalue

Figure 12.4 (a) from Bishop

Page 12:

Application: Face Recognition

Image from cnet.com

Page 13:

Face recognition: once you’ve detected and cropped a face, try to recognize it

Detection → Recognition → “Sally”

Slide credit: Lana Lazebnik

Page 14:

Typical face recognition scenarios

• Verification: a person is claiming a particular identity; verify whether that is true

– E.g., security

• Closed-world identification: assign a face to one person from among a known set

• General identification: assign a face to a known person or to “unknown”

Slide credit: Derek Hoiem

Page 15:

The space of all face images

• When viewed as vectors of pixel values, face images are extremely high-dimensional

– 24x24 image = 576 dimensions

– Slow and lots of storage

• But very few 576-dimensional vectors are valid face images

• We want to effectively model the subspace of face images

Adapted from Derek Hoiem

Page 16:

Representation and reconstruction

• Face x in “face space” coordinates: wi = ui^T (x − µ)

• Reconstruction: x̂ = µ + w1u1 + w2u2 + w3u3 + w4u4 + …

Slide credit: Derek Hoiem
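In code, projection and reconstruction are one line each. A minimal sketch, assuming V is a D x k matrix whose columns are the eigenfaces u1, …, uk, mu is the 1 x D mean face, and x is a 1 x D face vector:

w = (x - mu) * V;      % face-space coordinates (w1, ..., wk)
xhat = mu + w * V';    % reconstruction from k coefficients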

Page 17:

Recognition w/ eigenfaces

Process labeled training images

• Find mean µ and covariance matrix Σ

• Find k principal components (eigenvectors of Σ) u1, …, uk

• Project each training image xi onto the subspace spanned by the principal components: (wi1, …, wik) = (u1^T xi, …, uk^T xi)

Given a novel image x

• Project onto the subspace: (w1, …, wk) = (u1^T x, …, uk^T x)

• Classify as the closest training face in the k-dimensional subspace

M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991

Adapted from Derek Hoiem
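A minimal end-to-end sketch of this recipe in MATLAB (hypothetical variables: Xtrain is N x D with one vectorized face per row, labels is N x 1, x is a 1 x D novel face, and k has been chosen, e.g. as on page 10):

mu = mean(Xtrain, 1);
Xc = Xtrain - mu;                          % center the training faces
[U, S] = eig(Xc * Xc');                    % transpose trick from page 9
[~, order] = sort(diag(S), 'descend');
V = Xc' * U(:, order(1:k));                % top-k eigenfaces, D x k
V = V ./ sqrt(sum(V.^2, 1));               % unit-normalize each eigenface
W = Xc * V;                                % training coordinates, N x k
w = (x - mu) * V;                          % project the novel face
[~, nearest] = min(sum((W - w).^2, 2));    % closest training face in subspace
predicted_label = labels(nearest);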

Pages 18–27: (image-only slides)

Slide credit: Alexander Ihler

Page 28:

Plan for today

• Dimensionality reduction – motivation

• Principal Component Analysis (PCA)

• Applications of PCA

• Other methods for dimensionality reduction

Page 29:

PCA

• General dimensionality reduction technique

• Preserves most of the variance with a much more compact representation

– Lower storage requirements (eigenvectors + a few numbers per face)

– Faster matching

• What are some problems?

Slide credit: Derek Hoiem

Page 30:

PCA limitations

• The direction of maximum variance is not always good for classification

Slide credit: Derek Hoiem

Page 31:

PCA limitations

• PCA preserves maximum variance

• A more discriminative subspace:

Fisher Linear Discriminants

• FLD preserves discrimination

– Find projection that maximizes scatter between classes and minimizes scatter within classes

Adapted from Derek Hoiem
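For two classes, the Fisher direction has a closed form; a minimal sketch (standard FLD, with hypothetical X1 and X2 holding one example per row for each class):

mu1 = mean(X1, 1);
mu2 = mean(X2, 1);
Sw = cov(X1) + cov(X2);        % within-class scatter (up to scaling)
w = Sw \ (mu1 - mu2)';         % maximizes between- over within-class scatter
w = w / norm(w);

Projecting onto w pulls the class means apart while keeping each class compact, which is what the two-class example on the next slide illustrates.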

Page 32:

Fisher’s Linear Discriminant

Using two classes as example: [figure contrasting a poor projection with a good one, in axes x1, x2]

Slide credit: Derek Hoiem

Page 33:

Comparison with PCA

Slide credit: Derek Hoiem

Page 34:

Other dimensionality reduction methods

• Non-linear:

– Kernel PCA (Schölkopf et al., Neural Computation 1998)

– Independent component analysis (Comon, Signal Processing 1994)

– LLE, locally linear embedding (Roweis and Saul, Science 2000)

– ISOMAP, isometric feature mapping (Tenenbaum et al., Science 2000)

– t-SNE, t-distributed stochastic neighbor embedding (van der Maaten and Hinton, JMLR 2008)

Page 35:

ISOMAP example

Figure from Carlotta Domeniconi

Page 36:

ISOMAP example

Figure from Carlotta Domeniconi

Page 37:

t-SNE example

Figure from Genevieve Patterson, IJCV 2014

Page 38:

t-SNE example

Thomas and Kovashka, CVPR 2016

Page 39:

t-SNE example

Thomas and Kovashka, CVPR 2016