Principal Component Analysis
Machine Learning
Last Time
• Expectation Maximization in Graphical Models
  – Baum-Welch
Now
• Unsupervised Dimensionality Reduction
Curse of Dimensionality
• In (nearly) all modeling approaches, more features (dimensions) require (a lot) more data
  – Typically exponential in the number of features
• This is clearly seen when filling in a joint probability table: the number of cells grows exponentially with the number of features.
• Geometric arguments are also made.
  – Compare the volume of a hypersphere inscribed in a hypercube.
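The inscribed-hypersphere argument can be checked numerically. A minimal sketch (the function name is illustrative): the fraction of a unit hypercube's volume occupied by its inscribed hypersphere collapses toward zero as the dimension grows, so almost all the volume sits in the "corners".

```python
import math

def sphere_fraction(d):
    """Fraction of the unit hypercube's volume occupied by its
    inscribed hypersphere (radius 1/2) in d dimensions."""
    r = 0.5
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

for d in (2, 3, 10, 20):
    print(d, sphere_fraction(d))
# d = 2  -> ~0.785 (circle in a square)
# d = 20 -> ~2.5e-8 (essentially all volume is outside the sphere)
```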
Dimensionality Reduction
• We’ve already seen some of this.
• Regularization attempts to reduce the number of effective features used in linear and logistic regression classifiers
Linear Models
• When we regularize, we optimize an objective that drives the weights of as many features as possible toward zero.
• The “effective” number of dimensions is much smaller than D.
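As a sketch of how regularization shrinks weights, here is closed-form ridge (L2) regression on made-up data; L1 regularization drives weights exactly to zero but needs an iterative solver. The data, λ value, and function name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
# Only the first two features actually matter.
w_true = np.zeros(d)
w_true[:2] = [3.0, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)    # unregularized least squares
w_reg = ridge(X, y, 100.0)  # heavily regularized: weights shrink
print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```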
Support Vector Machines
• In exemplar approaches (SVM, k-nn) each data point can be considered to describe a dimension.
• By selecting only the instances that define the margin (all other instances get α = 0), SVMs use only a subset of the available dimensions in their decision making.
Decision Trees
• Decision Trees explicitly select split points based on the features that most improve Information Gain or Accuracy.
• Features that do not contribute sufficiently to the classification are never used.
(figure: a decision tree – the root splits on weight < 165; one branch is a leaf containing 5M, the other splits on height < 68 into leaves containing 5F and 1F / 1M)
Feature Spaces
• Even though a data point is described in terms of D features, this may not be the most compact representation of the feature space.
• Even classifiers that try to use a smaller effective feature space can suffer from the curse-of-dimensionality
• If a feature has some discriminative power, the dimension may remain in the effective set.
1-d data in a 2-d world
(figure: 2-d scatter plot in which the points fall almost exactly along a single line – the data is effectively 1-dimensional)
Dimensions of high variance
Identifying dimensions of variance
• Assumption: directions that show high variance are the appropriate/useful axes along which to represent the feature set.
Aside: Normalization
• Assume 2 features:
  – Percentile GPA
  – Height in cm.
• Which dimension shows greater variability?
(figure: scatter of percentile GPA [0–1] on the x-axis vs. height in cm [roughly 235–285] on the y-axis)
Aside: Normalization
• Assume 2 features:
  – Percentile GPA
  – Height in cm.
• Which dimension shows greater variability?
(figure: the same data with the x-axis rescaled to span 0–30; the apparent direction of greatest variance depends on the units)
Aside: Normalization
• Assume 2 features:
  – Percentile GPA
  – Height in m.
• Which dimension shows greater variability?
(figure: scatter of percentile GPA vs. height in m, both axes spanning [0–1])
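The unit-dependence problem is usually handled by z-score normalization, which puts every feature on a comparable scale before measuring variance. A small sketch on synthetic data (the means and spreads below are made-up values, not the ones from the plots):

```python
import numpy as np

rng = np.random.default_rng(1)
gpa = rng.uniform(0.0, 1.0, size=100)           # percentile GPA in [0, 1]
height_cm = rng.normal(170.0, 10.0, size=100)   # height in cm (assumed scale)

X = np.column_stack([gpa, height_cm])
print(X.var(axis=0))   # height's variance dwarfs GPA's, purely due to units

# Z-score normalization: subtract the mean, divide by the standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.var(axis=0))   # both features now have unit variance
```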
Principal Component Analysis
• Principal Component Analysis (PCA) identifies the dimensions of greatest variance of a set of data.
Eigenvectors
• Eigenvectors of a symmetric matrix are orthogonal vectors that define a space, the eigenspace.
• Any data point can be described as a linear combination of eigenvectors.
• An eigenvector v of a square matrix A satisfies Av = λv.
• The associated scalar λ is the eigenvalue.
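The defining property Av = λv can be verified directly with numpy; the 2×2 symmetric matrix below is an arbitrary example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # an arbitrary symmetric matrix

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
for lam, v in zip(eigvals, eigvecs.T):
    # Defining property of an eigenvector: A v = lambda v
    assert np.allclose(A @ v, lam * v)

print(eigvals)  # [1. 3.]
```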
PCA
• Write each data point in this new space: x = μ + Σ c_i v_i
• To do the dimensionality reduction, keep only C < D dimensions.
• Each data point is now represented by its vector of coefficients (c_1, …, c_C).
Identifying Eigenvectors
• PCA is easy once we have the eigenvectors and the mean.
• Identifying the mean is easy.
• Eigenvectors of the covariance matrix represent a set of directions of variance.
• Eigenvalues represent the degree of the variance.
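Putting this recipe together on synthetic, nearly 1-d data (the data-generating choices below are arbitrary): compute the mean, the covariance matrix, and its eigendecomposition, then sort by decreasing eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
# Nearly 1-d data embedded in 2-d: second coordinate ~ 3x + small noise.
x = rng.normal(size=200)
X = np.column_stack([x, 3.0 * x + 0.05 * rng.normal(size=200)])

mu = X.mean(axis=0)                     # the mean: easy
cov = np.cov(X, rowvar=False)           # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort by decreasing eigenvalue, i.e. decreasing variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)  # the first eigenvalue dominates: the data is nearly 1-d
```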
Eigenvectors of the Covariance Matrix
• Eigenvectors are orthonormal.
• In the eigenspace, the Gaussian is diagonal – zero covariance.
• All eigenvalues are non-negative.
• Eigenvalues are sorted.
• Larger eigenvalues correspond to higher variance.
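The zero-covariance claim can be checked by rotating data into the eigenspace; the covariance values used to generate the data below are an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated 2-d Gaussian data (example covariance, chosen arbitrarily).
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=500)

cov = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(cov)

# Express the data in the eigenspace: rotate by the eigenvector matrix V.
Y = (X - X.mean(axis=0)) @ V
cov_Y = np.cov(Y, rowvar=False)

# In the eigenspace the covariance is diagonal; the diagonal entries
# are exactly the eigenvalues.
print(np.round(cov_Y, 6))
```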
Dimensionality reduction with PCA
• To convert an original data point to its PCA representation: c_i = v_i^T (x − μ)
• To reconstruct a point: x̂ = μ + Σ c_i v_i
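A sketch of the encode/decode step, assuming the projection c = V_C^T (x − μ) and reconstruction x̂ = μ + V_C c (the synthetic data and function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=300)
X = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=300)])

mu = X.mean(axis=0)
eigvals, V = np.linalg.eigh(np.cov(X, rowvar=False))
V = V[:, np.argsort(eigvals)[::-1]]  # columns sorted by decreasing variance

C = 1               # keep only the top principal component
Vc = V[:, :C]

def encode(x):
    # c = V_C^T (x - mu): project onto the kept eigenvectors
    return Vc.T @ (x - mu)

def decode(c):
    # x_hat = mu + V_C c: reconstruct from the coefficients
    return mu + Vc @ c

x0 = X[0]
x0_hat = decode(encode(x0))
print(np.abs(x0 - x0_hat).max())  # small: the discarded direction held little variance
```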
Eigenfaces
Faces are encoded with a small number of components, then decoded.
Reconstruction quality can be evaluated with absolute or squared error.
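The two error measures on a toy "image" vector (the pixel values are made up):

```python
import numpy as np

# Hypothetical original and PCA-reconstructed vectors, e.g. face
# images flattened to pixel vectors.
x     = np.array([0.2, 0.5, 0.9, 0.4])
x_hat = np.array([0.25, 0.45, 0.85, 0.5])

abs_err = np.abs(x - x_hat).sum()     # absolute (L1) error
sq_err  = ((x - x_hat) ** 2).sum()    # squared (L2) error
print(abs_err, sq_err)  # ≈ 0.25 and ≈ 0.0175
```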
Some other (unsupervised) dimensionality reduction techniques
• Kernel PCA
• Distance Preserving Dimension Reduction
• Maximum Variance Unfolding
• Multi-Dimensional Scaling (MDS)
• Isomap
• Next Time
  – Model Adaptation and Semi-supervised Techniques
• Work on your projects.