dimension reduction - visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...feb 12, 2013...

85
Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul Choo

Upload: others

Post on 06-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension Reduction

CSE 6242 A / CS 4803 DVAFeb 12, 2013

Guest Lecturer: Jaegul Choo

Page 2: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension Reduction

CSE 6242 A / CS 4803 DVAFeb 12, 2013

Guest Lecturer: Jaegul Choo

Page 3: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Data is Too Big To Do Something..

Limited memory sizeData may not be fitted to the memory of your machine

Slow computation106-dim vs. 10-dim vectors for Euclidean distance computation

Page 4: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Axes of Data Set

No. of data itemsHow many data items?

No. of dimensionsHow many dimensions representing each item?

Data item index

Dimension index

Page 5: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Axes of Data Set

No. of data itemsHow many data items?

No. of dimensionsHow many dimensions representing each item?

Data item index

Dimension index

Page 6: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Axes of Data Set

No. of data itemsHow many data items?

No. of dimensionsHow many dimensions representing each item?

Data item index

Dimension index

Columns as data items

Page 7: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Axes of Data Set

No. of data itemsHow many data items?

No. of dimensionsHow many dimensions representing each item?

Data item index

Dimension index

Columns as data items vs. Rows as data items

Page 8: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Axes of Data Set

No. of data itemsHow many data items?

No. of dimensionsHow many dimensions representing each item?

Data item index

Dimension index

Columns as data items vs. Rows as data items

We will use this during lecture

Page 9: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Page 10: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Dimension Reduction

High-dim data

low-dim data

Page 11: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Dimension Reduction

High-dim data

low-dim data

No. of

: user-specified

Page 12: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Dimension Reduction

High-dim data

low-dim data

No. of

Other

: user-specified

Page 13: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Dimension Reduction

High-dim data

low-dim data

No. of

Additional info about data

Other

: user-specified

Page 14: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Dimension ReductionLet’s Reduce Data (along Dimension Axis)

Dimension Reduction

High-dim data

low-dim data

No. of

Additional info about data

Other Dim-reducing Transformer for a

new data

: user-specified

Page 15: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

What You Get from DR

Obviously, Less storageFaster computation

Page 16: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

What You Get from DR

Obviously, Less storageFaster computation

More importantly,

Page 17: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

What You Get from DR

Obviously, Less storageFaster computation

More importantly, Noise removal (improving quality of data)

Leads better performance for tasks

Page 18: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

What You Get from DR

Obviously, Less storageFaster computation

More importantly, Noise removal (improving quality of data)

Leads better performance for tasks2D/3D representation

Enables visual data exploration

Page 19: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Applications

Traditionally, Microarray data analysisInformation retrievalFace recognitionProtein disorder predictionNetwork intrusion detectionDocument categorizationSpeech recognition

More interestingly, Interactive visualization of high-dimensional data

Page 20: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face RecognitionVector Representation of Images

Images → serialized/rasterized pixel values

Dimensions can be huge.640x480 size: 307,200 dimensions

3 80

2458

63

45

3

80

24

58

63

45

Page 21: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Page 22: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Vector Representation of

Images

Color = PersonColumn = Image

Page 23: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Vector Representation of

Images

Color = PersonColumn = Image

Page 24: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Vector Representation of

Images

Color = PersonColumn = Image

Page 25: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Vector Representation of

Images

Dimension Reduction

PCA, LDA, etc.

Color = PersonColumn = Image

Page 26: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Face Recognition

Vector Representation of

Images

Classification on dim-reduced data

Dimension Reduction

PCA, LDA, etc.

→  Better accuracy than on original-dim data Color = Person

Column = Image

Page 27: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Page 28: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

Latent semantic indexing

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Page 29: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

Latent semantic indexingTerm-document matrix via bag-of-words model

D1 = “I like data”D2 = “I hate hate data”

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Page 30: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

Latent semantic indexingTerm-document matrix via bag-of-words model

D1 = “I like data”D2 = “I hate hate data”

Dimensions can be hundreds of thousandsi.e., #distinct words

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Page 31: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

Latent semantic indexingTerm-document matrix via bag-of-words model

D1 = “I like data”D2 = “I hate hate data”

Dimensions can be hundreds of thousandsi.e., #distinct words

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Dimension Reduction

D1 D2 …

Dim1 1.75 -0.27 …

Dim2 -0.21 0.58 …

Dim3 1.32 0.25

Page 32: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Document Retrieval

Latent semantic indexingTerm-document matrix via bag-of-words model

D1 = “I like data”D2 = “I hate hate data”

Dimensions can be hundreds of thousandsi.e., #distinct words

D1 D2 …

I 1 1 …

like 1 0 …

hate 0 2 …

data 1 1 …

… … …

Dimension Reduction

D1 D2 …

Dim1 1.75 -0.27 …

Dim2 -0.21 0.58 …

Dim3 1.32 0.25

→    Search-Retrieval on dim-reduced data leads to better semantics

Page 33: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

VisualizationVisualizing “Map of Science”

http://www.mapofscience.com

Page 34: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Two Main Techniques

1. Feature selectionSelects a subset of the original variables as reduced dimensionsFor example, the number of genes responsible for a particular disease may be small

2. Feature extractionEach reduced dimension involves multiple original dimensionsActive area of research recently

Note that Feature = Variable = Dimension

Page 35: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Feature Selection

What are the optimal subset of m features to maximize a given criterion?

Widely-used criteriaInformation gain, correlation, …

Typically combinatorial optimization problems Therefore, greedy methods are popular

Forward selection: Empty set → add one variable at a timeBackward elimination: Entire set → remove one variable at a time

Page 36: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

From now on, we will only discuss about feature extraction

Page 37: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

From now on, we will only discuss about feature extraction

Page 38: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DR

Linear vs. NonlinearUnsupervised vs. SupervisedGlobal vs. LocalFeature vectors vs. Similarity (as an input)

Page 39: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRLinear vs. Nonlinear

LinearRepresents each reduced dimension as a linear combination of original dimensions

e.g., Y1 = 3*X1 – 4*X2 + 0.3*X3 – 1.5*X4, Y2 = 2*X1 + 3.2*X2 – X3 + 2*X4

Naturally capable of mapping new data to the same space

Dimension Reduction

D1 D2

X1 1 1

X2 1 0

X3 0 2

X4 1 1

D1 D2

Y1 1.75 -0.27

Y2 -0.21 0.58

Page 40: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRLinear vs. Nonlinear

LinearRepresents each reduced dimension as a linear combination of original dimensions

e.g., Y1 = 3*X1 – 4*X2 + 0.3*X3 – 1.5*X4, Y2 = 2*X1 + 3.2*X2 – X3 + 2*X4

Naturally capable of mapping new data to the same space

NonlinearMore complicated, but generally more powerfulRecently popular topics

Page 41: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRUnsupervised vs. Supervised

UnsupervisedUses only the input data

Dimension Reduction

High-dim data

low-dim data

No. of

Other Dim-reducing Transformer for a

new data

Additional info about data

Page 42: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRUnsupervised vs. Supervised

SupervisedUses the input data + additional info

Dimension Reduction

High-dim data

low-dim data

No. of

Other Dim-reducing Transformer for a

new data

Additional info about data

Page 43: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRUnsupervised vs. Supervised

SupervisedUses the input data + additional info

e.g., grouping label

Dimension Reduction

High-dim data

low-dim data

No. of

Additional info about data Other Dim-reducing

Transformer for a new data

Page 44: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!e.g., PCA

Page 45: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!e.g., PCA

Page 46: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!

Then, what would you care about?

Global

Page 47: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!

Then, what would you care about?

GlobalTreats all pairwise distances equally important

Tends to care larger distances more

Page 48: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!

Then, what would you care about?

GlobalTreats all pairwise distances equally important

Tends to care larger distances moreLocal

Page 49: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!

Then, what would you care about?

GlobalTreats all pairwise distances equally important

Tends to care larger distances moreLocal

Focuses on small distances, neighborhood relationships

Page 50: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRGlobal vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in data

Information loss is unavoidable!

Then, what would you care about?

GlobalTreats all pairwise distances equally important

Tends to care larger distances moreLocal

Focuses on small distances, neighborhood relationshipsActive research area a.k.a. manifold learning

Page 51: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)

Dimension Reduction

High-dim data

low-dim data

No. of

Other Dim-reducing Transformer for a

new data

Additional info about data

Typical setup (feature vectors as an input)

Page 52: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)

Dimension Reduction

Similarity matrix

low-dim data

No. of

Other Dim-reducing Transformer for a

new data

Additional info about data

Typical setup (feature vectors as an input)Some methods take similarity matrix instead

(i,j)-th component indicates how similar i-th and j-th data items are

Page 53: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)

Dimension Reduction

low-dim data

No. of

Other Dim-reducing Transformer for a

new data

Additional info about data

Similarity matrix

Page 54: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)Typical setup (feature vectors as an input)Some methods take similarity matrix insteadSome methods internally converts feature vectors to similarity matrix before performing dimension reduction

Similarity matrix

High-dim data

Dimension Reduction

low-dim data

Page 55: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)Typical setup (feature vectors as an input)Some methods take similarity matrix insteadSome methods internally converts feature vectors to similarity matrix before performing dimension reduction

Similarity matrix

High-dim data

Dimension Reduction

low-dim data

a.k.a. Graph Embedding

Page 56: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Aspects of DRFeature vectors vs. Similarity (as an input)

Why called graph embedding?Similarity matrix can be viewed as a graph where similarity represents edge weight

Similarity matrix

High-dim data

Dimension Reduction

low-dim data

a.k.a. Graph Embedding

Page 57: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Methods

TraditionalPrincipal component analysis (PCA)Multidimensional scaling (MDS)Linear discriminant analysis (LDA)Nonnegative matrix factorization (NMF)

Advanced (nonlinear, kernel, manifold learning) Isometric feature mapping (Isomap)Locally linear embedding (LLE)Laplacian Eigenmaps (LE)Kernel PCAt-distributed stochastic neighborhood embedding (t-SNE)

* Matlab codes are available athttp://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

Page 58: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component Analysis

Finds the axis showing the greatest variation, and project all points into this axisReduced dimensions are orthogonalAlgorithm: eigen-decompositionPros: FastCons: basic limited performances

http://en.wikipedia.org/wiki/Principal_component_analysis

PC1PC2

Page 59: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component Analysis

Finds the axis showing the greatest variation, and project all points into this axisReduced dimensions are orthogonalAlgorithm: eigen-decompositionPros: FastCons: basic limited performances

http://en.wikipedia.org/wiki/Principal_component_analysis

PC1PC2Linear

Page 60: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component Analysis

Finds the axis showing the greatest variation, and project all points into this axisReduced dimensions are orthogonalAlgorithm: eigen-decompositionPros: FastCons: basic limited performances

http://en.wikipedia.org/wiki/Principal_component_analysis

PC1PC2LinearUnsupervised

Page 61: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component Analysis

Finds the axis showing the greatest variation, and project all points into this axisReduced dimensions are orthogonalAlgorithm: eigen-decompositionPros: FastCons: basic limited performances

http://en.wikipedia.org/wiki/Principal_component_analysis

PC1PC2LinearUnsupervisedGlobal

Page 62: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component Analysis

Finds the axis showing the greatest variation, and project all points into this axisReduced dimensions are orthogonalAlgorithm: eigen-decompositionPros: FastCons: basic limited performances

http://en.wikipedia.org/wiki/Principal_component_analysis

PC1PC2LinearUnsupervisedGlobalFeature vectors

Page 63: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component AnalysisDocument Visualization

Page 64: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Principal Component AnalysisTestbed Demo – Text Data

Page 65: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

Page 66: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

ideal distanceactual distance

Page 67: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

Nonlinear

ideal distanceactual distance

Page 68: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

NonlinearUnsupervised

ideal distanceactual distance

Page 69: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

NonlinearUnsupervisedGlobal

ideal distanceactual distance

Page 70: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling (MDS)

IntuitionTries to preserve given ideal pairwise distances in low-dimensional space

Metric MDSPreserves given ideal distance values

Nonmetric MDSWhen you only know/care about ordering of distances Preserves only the orderings of distance values

Algorithm: gradient-decent typec.f. classical MDS is the same as PCA

NonlinearUnsupervisedGlobalSimilarity input

ideal distanceactual distance

Page 71: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional ScalingSammon’s mapping

Sammon’s mappingLocal version of MDS Down-weights errors in large distances

Algorithm: gradient-decent type

NonlinearUnsupervisedLocalSimilarity input

Page 72: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional ScalingForce-directed graph layout

Force-directed graph layoutRooted from graph visualization, but essentially variant of metric MDSSpring-like attractive + repulsive forces between nodesAlgorithm: gradient-decent type

Widely-used in visualizationAesthetically pleasing resultsSimple and intuitiveInteractivity

NonlinearUnsupervisedGlobalSimilarity input

Page 74: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Multidimensional Scaling

In all variants, Pros: widely-used (works well in general)Cons: slow

Nonmetric MDS is even much slower than metric MDSNonlinearUnsupervisedGlobalSimilarity input

Page 75: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

(a) (b)

Page 76: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysisvs. Principal Component Analysis

2D visualization of 7 Gaussian mixture of 1000 dimensions

Linear discriminant analysis(Supervised)

Principal component analysis(Unsupervised)36

Page 77: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

Algorithm: generalized eigendecompositionPros: better show cluster structureCons: may distort original relationship of data

Page 78: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

Algorithm: generalized eigendecompositionPros: better show cluster structureCons: may distort original relationship of data

Linear

Page 79: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

Algorithm: generalized eigendecompositionPros: better show cluster structureCons: may distort original relationship of data

LinearSupervised

Page 80: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

Algorithm: generalized eigendecompositionPros: better show cluster structureCons: may distort original relationship of data

LinearSupervisedGlobal

Page 81: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant Analysis

Maximally separates clusters byPutting different cluster as far as possiblePutting each cluster as compact as possible

Algorithm: generalized eigendecompositionPros: better show cluster structureCons: may distort original relationship of data

LinearSupervisedGlobalFeature vectors

Page 82: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Linear Discriminant AnalysisTestbed Demo – Text Data

Page 83: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Nonnegative Matrix Factorization

Dimension reduction via matrix factorization

Why nonnegativity constraints?Better approximation vs. better interpretationOften physically/semantically meaningful

Algorithm: alternating nonnegativity-constrained least squares

~= èmin || A – WH ||F

W>=0, H>=0

A

H

W

Page 84: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

Nonnegative Matrix Factorizationas clustering

Dimension reduction via matrix factorization

Often NMF performs better and faster than k-meansW: centroids, H: soft-clustering membership

~= èmin || A – WH ||F

W>=0, H>=0

A

H

W

Page 85: Dimension Reduction - Visualizationpoloclub.gatech.edu/cse6242/2013spring/lectures/...Feb 12, 2013  · Dimension Reduction CSE 6242 A / CS 4803 DVA Feb 12, 2013 Guest Lecturer: Jaegul

In the next lecture..

More interesting topics coming up includingAdvanced methods

Isomap, LLE, kernel PCA, t-SNE, …Real-world applications in interactive visualizationPractitioners’ guide

What to try first in which situations?