Dimensionality Reduction: PCA, SVD, MDS, ICA, and Friends

Jure Leskovec, Machine Learning recitation, April 27, 2006
Why dimensionality reduction?
- Some features may be irrelevant
- We want to visualize high-dimensional data
- The "intrinsic" dimensionality may be smaller than the number of features
Supervised feature selection

Scoring features:
- Mutual information between attribute and class
- χ²: independence between attribute and class
- Classification accuracy

Domain-specific criteria, e.g. for text:
- Remove stop-words (and, a, the, …)
- Stemming (going → go, Tom's → Tom, …)
- Document frequency
Choosing sets of features
- Score each feature
- Forward/backward elimination:
  - Choose the feature with the highest/lowest score
  - Re-score the other features
  - Repeat
- If you have lots of features (as in text), just select the top K scored features (see the sketch below)
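A minimal sketch of top-K selection in Matlab; the `scores` vector here is a hypothetical stand-in for per-feature mutual-information or χ² scores, not something from the slides:

```matlab
% Keep the K features with the highest scores.
scores = rand(1, 1000);                    % hypothetical scores for 1000 features
K = 50;
[sortedScores, order] = sort(scores, 'descend');
selected = order(1:K);                     % indices of the top-K features
```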
Feature selection on text

[Plot: classification performance vs. selected features for SVM, kNN, NB, and Rocchio]
Unsupervised feature selection

Differs from feature selection in two ways:
- Instead of choosing a subset of the features, create new features (dimensions) defined as functions over all features
- Don't consider class labels, just the data points
Unsupervised feature selection

Idea: given data points in d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible:
- E.g., find the best planar approximation to 3-D data
- E.g., find the best planar approximation to 10⁴-D data

In particular, choose the projection that minimizes the squared error in reconstructing the original data.
PCA Algorithm

1. X ← N × d data matrix, with one row vector x_n per data point
2. X ← subtract the mean x̄ from each row vector x_n in X
3. Σ ← covariance matrix of X
4. Find the eigenvectors and eigenvalues of Σ
5. PCs ← the M eigenvectors with the largest eigenvalues
PCA Algorithm in Matlab

```matlab
% generate data
Data = mvnrnd([5, 5], [1 1.5; 1.5 3], 100);
figure(1); plot(Data(:,1), Data(:,2), '+');

% center the data (compute the mean once, before modifying Data,
% otherwise each iteration subtracts the mean of partially centered rows)
mu = mean(Data);
for i = 1:size(Data, 1)
    Data(i, :) = Data(i, :) - mu;
end

DataCov = cov(Data);                           % covariance matrix
[PC, variances, explained] = pcacov(DataCov);  % eigenvectors/eigenvalues

% plot principal components
figure(2); clf; hold on;
plot(Data(:,1), Data(:,2), '+b');
plot(PC(1,1)*[-5 5], PC(2,1)*[-5 5], '-r');
plot(PC(1,2)*[-5 5], PC(2,2)*[-5 5], '-b');
hold off;

% project down to 1 dimension
PcaPos = Data * PC(:, 1);
```
2-D Data

[Scatter plot of the generated 2-dimensional data]
Principal Components

[Scatter plot of the centered data with the 1st and 2nd principal vectors overlaid]

- Gives the best axis to project on
- Minimum RMS error
- Principal vectors are orthogonal
How many components?
- Check the distribution of eigenvalues
- Take enough eigenvectors to cover 80-90% of the variance (see the sketch below)
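A minimal sketch of this rule in Matlab, assuming `Data` is the centered data matrix from the earlier example:

```matlab
% Pick the smallest number of components covering ~90% of the variance.
[PC, variances, explained] = pcacov(cov(Data));  % 'explained' is in percent
cumExplained = cumsum(explained);
M = find(cumExplained >= 90, 1);   % number of eigenvectors to keep
```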
Sensor networks

[Map: sensors in the Intel Berkeley Lab]
Pairwise link quality vs. distance

[Scatter plot; x-axis: distance between a pair of sensors, y-axis: link quality]
PCA in action
- Given a 54 × 54 matrix of pairwise link qualities
- Do PCA
- Project down to 2 principal dimensions
- PCA discovered the map of the lab
Problems and limitations

What if the data is very high-dimensional? E.g., images (d ≥ 10⁴).
- Problem: the covariance matrix Σ has d² entries; d = 10⁴ → |Σ| = 10⁸
- Solution: Singular Value Decomposition (SVD)!
  - Efficient algorithms available (e.g., in Matlab)
  - Some implementations find just the top N eigenvectors (see the sketch below)
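A minimal sketch, assuming `X` is an N × d data matrix: Matlab's `svds` finds only the top few singular vectors of the centered data, so the d × d covariance matrix is never formed.

```matlab
% Top-M principal directions without building the d x d covariance matrix.
N = size(X, 1);
Xc = X - repmat(mean(X), N, 1);   % center the data
M = 10;                           % number of components wanted
[U, S, V] = svds(Xc, M);          % only the top-M singular triples
proj = Xc * V;                    % N x M low-dimensional projection
```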
Singular Value Decomposition

Problem:
1. Find concepts in text
2. Reduce dimensionality
SVD - Definition

A[n × m] = U[n × r] Λ[r × r] (V[m × r])ᵀ

- A: n × m matrix (e.g., n documents, m terms)
- U: n × r matrix (n documents, r concepts)
- Λ: r × r diagonal matrix (strength of each 'concept'; r = rank of the matrix)
- V: m × r matrix (m terms, r concepts)
SVD - Properties

THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ Vᵀ, where:
- U, V: unique (*)
- U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): UᵀU = I; VᵀV = I (I: identity matrix)
- Λ: singular values are positive, and sorted in decreasing order
SVD - Properties

'Spectral decomposition' of the matrix:

    [ 1 1 1 0 0 ]
    [ 2 2 2 0 0 ]
    [ 1 1 1 0 0 ]
    [ 5 5 5 0 0 ]  =  [ u1 u2 ] × [ λ1 0  ] × [ v1 ]
    [ 0 0 0 2 2 ]                 [ 0  λ2 ]   [ v2 ]
    [ 0 0 0 3 3 ]
    [ 0 0 0 1 1 ]
SVD - Interpretation

'Documents', 'terms' and 'concepts':
- U: document-to-concept similarity matrix
- V: term-to-concept similarity matrix
- Λ: its diagonal elements give the 'strength' of each concept

Projection: the best axis to project on ('best' = minimum sum of squares of projection errors)
SVD - Example

A = U Λ Vᵀ:

      data inf. retr. brain lung
    [ 1    1    1     0     0 ]     [ 0.18  0    ]
CS  [ 2    2    2     0     0 ]     [ 0.36  0    ]
    [ 1    1    1     0     0 ]     [ 0.18  0    ]   [ 9.64  0    ]   [ 0.58 0.58 0.58 0    0    ]
    [ 5    5    5     0     0 ]  =  [ 0.90  0    ] × [ 0     5.29 ] × [ 0    0    0    0.71 0.71 ]
MD  [ 0    0    0     2     2 ]     [ 0     0.53 ]
    [ 0    0    0     3     3 ]     [ 0     0.80 ]
    [ 0    0    0     1     1 ]     [ 0     0.27 ]

(CS rows: computer-science documents; MD rows: medical documents.)
Reading the decomposition:
- The two columns of U are the CS-concept and the MD-concept; U is the doc-to-concept similarity matrix.
- The diagonal of Λ gives the 'strength' of each concept: 9.64 for the CS-concept, 5.29 for the MD-concept.
- V is the term-to-concept similarity matrix: the CS-concept loads on data, inf., retrieval; the MD-concept on brain, lung.
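The example is easy to verify numerically; a minimal sketch in Matlab (columns of U and V may come back with flipped signs):

```matlab
% Reproduce the example decomposition numerically.
A = [1 1 1 0 0; 2 2 2 0 0; 1 1 1 0 0; 5 5 5 0 0; ...
     0 0 0 2 2; 0 0 0 3 3; 0 0 0 1 1];
[U, S, V] = svd(A);
diag(S)'       % -> 9.64, 5.29, 0, 0, 0: two nonzero 'concepts'
U(:, 1:2)      % doc-to-concept similarities (up to sign)
V(:, 1:2)      % term-to-concept similarities (up to sign)
```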
SVD - Dimensionality reduction

Q: How exactly is dimensionality reduction done?
A: Set the smallest singular values to zero:

    [ 1 1 1 0 0 ]     [ 0.18  0    ]
    [ 2 2 2 0 0 ]     [ 0.36  0    ]
    [ 1 1 1 0 0 ]     [ 0.18  0    ]   [ 9.64  0 ]   [ 0.58 0.58 0.58 0    0    ]
    [ 5 5 5 0 0 ]  ≈  [ 0.90  0    ] × [ 0     0 ] × [ 0    0    0    0.71 0.71 ]
    [ 0 0 0 2 2 ]     [ 0     0.53 ]
    [ 0 0 0 3 3 ]     [ 0     0.80 ]
    [ 0 0 0 1 1 ]     [ 0     0.27 ]

(5.29, the smaller singular value, has been zeroed out.)
SVD - Dimensionality reduction

Dropping the zeroed concept entirely:

    [ 1 1 1 0 0 ]     [ 0.18 ]
    [ 2 2 2 0 0 ]     [ 0.36 ]
    [ 1 1 1 0 0 ]     [ 0.18 ]
    [ 5 5 5 0 0 ]  ≈  [ 0.90 ] × [ 9.64 ] × [ 0.58 0.58 0.58 0 0 ]
    [ 0 0 0 2 2 ]     [ 0    ]
    [ 0 0 0 3 3 ]     [ 0    ]
    [ 0 0 0 1 1 ]     [ 0    ]
SVD - Dimensionality reduction

The resulting rank-1 approximation:

    [ 1 1 1 0 0 ]     [ 1 1 1 0 0 ]
    [ 2 2 2 0 0 ]     [ 2 2 2 0 0 ]
    [ 1 1 1 0 0 ]     [ 1 1 1 0 0 ]
    [ 5 5 5 0 0 ]  ≈  [ 5 5 5 0 0 ]
    [ 0 0 0 2 2 ]     [ 0 0 0 0 0 ]
    [ 0 0 0 3 3 ]     [ 0 0 0 0 0 ]
    [ 0 0 0 1 1 ]     [ 0 0 0 0 0 ]
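A minimal sketch of the truncation itself: zero out all but the largest singular value and rebuild the matrix.

```matlab
% Best rank-1 approximation of the example matrix.
A = [1 1 1 0 0; 2 2 2 0 0; 1 1 1 0 0; 5 5 5 0 0; ...
     0 0 0 2 2; 0 0 0 3 3; 0 0 0 1 1];
[U, S, V] = svd(A);
k = 1;                                     % keep only the strongest concept
A1 = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'     % rows 5-7 collapse to zeros
```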
LSI (latent semantic indexing)

Q1: How do we do queries with LSI?
A: Map query vectors into 'concept space'. How? (Using the same A = U Λ Vᵀ example as above.)
LSI (latent semantic indexing)

Q: How to do queries with LSI?
A: Take the inner product (cosine similarity) of the query with each 'concept' vector v_i.

E.g., a query for 'data' (terms: data, inf., retrieval, brain, lung):

q = [ 1 0 0 0 0 ]
LSI (latent semantic indexing)

Compactly, we have: q_concept = q V

E.g.:

                  [ 0.58  0    ]
                  [ 0.58  0    ]
[ 1 0 0 0 0 ]  ×  [ 0.58  0    ]  =  [ 0.58  0 ]
                  [ 0     0.71 ]
                  [ 0     0.71 ]

(V: term-to-concept similarities; the query scores 0.58 on the CS-concept and 0 on the MD-concept.)
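The same computation in Matlab; a minimal sketch using the V from the running example:

```matlab
% Map the query 'data' into concept space: q_concept = q * V.
V = [0.58 0; 0.58 0; 0.58 0; 0 0.71; 0 0.71];  % term-to-concept matrix
q = [1 0 0 0 0];                % terms: data, inf., retrieval, brain, lung
q_concept = q * V               % -> [0.58 0]: pure CS-concept
```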
Multi-lingual IR (English query, on Spanish text?)

Problem: given many documents, each translated into both languages (e.g., English and Spanish), answer queries across languages.
Little example

How would the document ('information', 'retrieval') be handled by LSI?
A: The same way: d_concept = d V

E.g.:

                  [ 0.58  0    ]
                  [ 0.58  0    ]
[ 0 1 1 0 0 ]  ×  [ 0.58  0    ]  =  [ 1.16  0 ]
                  [ 0     0.71 ]
                  [ 0     0.71 ]

(V: term-to-concept similarities; the document scores 1.16 on the CS-concept.)
Little example

Observation: the document ('information', 'retrieval'), with d_concept = [1.16 0], will be retrieved by the query ('data'), with q_concept = [0.58 0], although it does not contain the term 'data'!
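A minimal sketch checking this: in concept space the document and the query are perfectly aligned, even though they share no terms.

```matlab
% 'data' query vs. ('information', 'retrieval') document in concept space.
V = [0.58 0; 0.58 0; 0.58 0; 0 0.71; 0 0.71];
q = [1 0 0 0 0];                 % query: 'data'
d = [0 1 1 0 0];                 % document: 'information', 'retrieval'
qc = q * V;  dc = d * V;         % [0.58 0], [1.16 0]
cosSim = (qc * dc') / (norm(qc) * norm(dc))   % = 1: retrieved despite no overlap
```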
Multi-lingual IR

Solution: ~LSI
- Concatenate the translated documents (extra columns for the Spanish terms)
- Do SVD on them
- Now when a new document comes, project it into concept space
- Measure similarity in concept space

       English terms               Spanish terms
    (data, inf., retrieval,     (datos, informacion, …)
     brain, lung)
CS  [ 1 1 1 0 0 ]               [ 1 1 1 0 0 ]
    [ 2 2 2 0 0 ]               [ 1 2 2 0 0 ]
    [ 1 1 1 0 0 ]               [ 1 1 1 0 0 ]
    [ 5 5 5 0 0 ]               [ 5 5 4 0 0 ]
MD  [ 0 0 0 2 2 ]               [ 0 0 0 2 2 ]
    [ 0 0 0 3 3 ]               [ 0 0 0 2 3 ]
    [ 0 0 0 1 1 ]               [ 0 0 0 1 1 ]

(Note the slightly different term counts in the Spanish columns.)
Visualization of text

Given a set of documents, how could we visualize them over time?

Idea:
- Perform PCA
- Project the documents down to 2 dimensions
- See how the cluster centers change; observe the words in the cluster over time

Example: our paper with Andreas and Carlos at ICML 2006
Eigenvectors and eigenvalues on graphs

- Spectral graph partitioning
- Spectral clustering
- Google's PageRank
Spectral graph partitioning

How do you find communities in graphs?
Spectral graph partitioning
- Find the 2nd eigenvector of the graph Laplacian (think of it as an adjacency) matrix
- Cluster based on the 2nd eigenvector (see the sketch below)
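A minimal sketch on a toy graph; the adjacency matrix `W` below is hypothetical (two triangles joined by one edge), not from the slides. The sign of the Laplacian's 2nd eigenvector splits the graph into its two communities.

```matlab
% Partition a toy graph using the 2nd eigenvector of its Laplacian.
W = [0 1 1 0 0 0; 1 0 1 0 0 0; 1 1 0 1 0 0; ...
     0 0 1 0 1 1; 0 0 0 1 0 1; 0 0 0 1 1 0];   % hypothetical adjacency
D = diag(sum(W, 2));        % degree matrix
L = D - W;                  % graph Laplacian
[V, E] = eig(L);            % symmetric: eigenvalues in ascending order
fiedler = V(:, 2);          % 2nd eigenvector (the Fiedler vector)
community = fiedler > 0     % sign split -> the two triangles
```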
Spectral clustering
- Given learning examples
- Connect them into a graph (based on similarity)
- Do spectral graph partitioning
Google/PageRank algorithm

Problem: given the graph of the web, find the most 'authoritative' web pages for a query.

Closely related: imagine a particle randomly moving along the edges (*); compute its steady-state probabilities.

(*) with occasional random jumps
Google/PageRank algorithm

~Identical problem: given a Markov chain, compute the steady-state probabilities p1 … p5

[Diagram: a 5-node graph with nodes 1-5]
(Simplified) PageRank algorithm

Let A be the transition matrix (= adjacency matrix), and let Aᵀ be column-normalized. Then the steady-state vector p = (p1, …, p5) satisfies

Aᵀ p = p

[Matrix equation over the 5-node example: the column-normalized Aᵀ (entries 1 and 1/2; columns index 'from', rows index 'to') times p equals p]
(Simplified) PageRank algorithm

Aᵀ p = 1 × p, thus p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized).

Formal definition of eigenvector/eigenvalue: soon.
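A minimal sketch; the 5-node adjacency below is hypothetical, since the slide's exact edges are not recoverable here. The steady state is the eigenvector of the column-normalized Aᵀ with eigenvalue 1.

```matlab
% Steady-state probabilities of a random walk on a hypothetical 5-node graph.
A = [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 1; 1 0 0 0 0; 0 0 0 1 0];  % from -> to
M = A' * diag(1 ./ sum(A, 2));     % column-normalized A^T
[V, E] = eig(M);
[maxval, i] = max(real(diag(E)));  % the eigenvalue 1
p = real(V(:, i));
p = p / sum(p)                     % steady-state probabilities p1..p5
```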
PageRank: How do I calculate it fast?

If A is an (n × n) square matrix, (λ, x) is an eigenvalue/eigenvector pair of A if

A x = λ x

CLOSELY related to singular values.
Power Iteration - Intuition

A as a vector transformation:

A = [ 2 1 ]    x = [ 1 ]    x' = A x = [ 2 ]
    [ 1 3 ]        [ 0 ]               [ 1 ]
Power Iteration - Intuition

By definition, eigenvectors remain parallel to themselves ('fixed points', A x = λ x):

[ 2 1 ] [ 0.52 ]            [ 0.52 ]
[ 1 3 ] [ 0.85 ]  =  3.62 × [ 0.85 ]

(3.62 = λ1, the largest eigenvalue; [0.52, 0.85]ᵀ = v1, its eigenvector.)
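This suggests the fast method, power iteration: repeatedly apply A and renormalize, and the vector converges to the dominant eigenvector. A minimal sketch on the same 2 × 2 example:

```matlab
% Power iteration on A = [2 1; 1 3] from the slides.
A = [2 1; 1 3];
x = [1; 0];                  % arbitrary non-zero starting vector
for iter = 1:50
    x = A * x;
    x = x / norm(x);         % renormalize to avoid overflow
end
x                            % -> [0.52; 0.85], the dominant eigenvector
lambda = x' * A * x          % Rayleigh quotient -> 3.62
```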
Many PCA-like approaches

Multi-dimensional scaling (MDS):
- Given a matrix of distances between features
- We want a lower-dimensional representation that best preserves the distances (see the sketch below)

Independent component analysis (ICA):
- Find directions that are most statistically independent
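A minimal sketch of classical MDS in Matlab (`cmdscale`, `pdist`, and `squareform` are in the Statistics Toolbox), on toy data:

```matlab
% Classical MDS: recover a 2-D layout from pairwise distances alone.
X = rand(10, 5);              % toy data: 10 points in 5-D
D = squareform(pdist(X));     % 10 x 10 matrix of pairwise distances
Y = cmdscale(D);              % coordinates that best preserve D
Y2 = Y(:, 1:2);               % 2-D representation
```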
Acknowledgements

Some of the material is borrowed from lectures of Christos Faloutsos and Tom Mitchell.