PCA - Easy to Understand



    PCA - Principal Component Analysis

    Dr. Saed Sayad, University of Toronto

    2010

    [email protected]

    http://chem-eng.utoronto.ca/~datamining/


    Basic Statistics


    Statistics - Example

     X    X - Xbar    (X - Xbar)^2
     0      -10            100
     8       -2              4
    12        2              4
    20       10            100

    SumX = 40, Count = 4, Average (Xbar) = 10
    Sxx = 208, Variance = Sxx / (Count - 1) = 69.33, SD = 8.33

     Y    Y - Ybar    (Y - Ybar)^2
     8       -2              4
     9       -1              1
    11        1              1
    12        2              4

    SumY = 40, Count = 4, Average (Ybar) = 10
    Syy = 10, Variance = Syy / (Count - 1) = 3.33, SD = 1.83

     X     Y    X - Xbar    Y - Ybar    (X - Xbar)(Y - Ybar)
     0     8      -10          -2                20
     8     9       -2          -1                 2
    12    11        2           1                 2
    20    12       10           2                20

    Count = 4
    Sxy = 44, Covariance = Sxy / (Count - 1) = 14.67
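
    As a cross-check, a minimal Python/NumPy sketch (not part of the original slides) that reproduces the numbers above, using the same sample (n - 1) denominator:

        import numpy as np

        x = np.array([0.0, 8.0, 12.0, 20.0])
        y = np.array([8.0, 9.0, 11.0, 12.0])

        # Sample variance and standard deviation (n - 1 denominator, as in the table)
        print(x.mean(), x.var(ddof=1), x.std(ddof=1))   # 10.0  69.33  8.33
        print(y.mean(), y.var(ddof=1), y.std(ddof=1))   # 10.0   3.33  1.83

        # Sample covariance between X and Y
        sxy = np.sum((x - x.mean()) * (y - y.mean()))   # 44.0
        print(sxy / (len(x) - 1))                       # 14.67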


    Covariance Matrix
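
    The slide's figure is not in the transcript. For the X and Y example above, the covariance matrix puts the variances on the diagonal and the covariance off the diagonal; a one-line NumPy check (a sketch, not from the slides):

        import numpy as np

        x = np.array([0.0, 8.0, 12.0, 20.0])
        y = np.array([8.0, 9.0, 11.0, 12.0])

        # [[Var(X), Cov(X,Y)], [Cov(X,Y), Var(Y)]]
        print(np.cov(x, y))   # [[69.33, 14.67], [14.67, 3.33]]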


    Matrix Algebra

    Example of one non-eigenvector and one eigenvector


    Matrix Algebra

    Example of how a scaled eigenvector is still an eigenvector
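
    The worked matrices for these two examples are not in the transcript. A minimal sketch, assuming the classic 2 x 2 example from the referenced tutorial (it matches the eigenvalue 4 quoted on the Eigenvalues slide); any square matrix with a known eigenvector would do:

        import numpy as np

        A = np.array([[2.0, 3.0],
                      [2.0, 1.0]])

        # [3, 2] is an eigenvector: A @ v equals exactly 4 times v
        v = np.array([3.0, 2.0])
        print(A @ v)          # [12.  8.] = 4 * [3, 2]

        # [1, 3] is not an eigenvector: A @ w is not a multiple of w
        w = np.array([1.0, 3.0])
        print(A @ w)          # [11.  5.]

        # A scaled eigenvector is still an eigenvector, with the same eigenvalue
        print(A @ (2 * v))    # [24. 16.] = 4 * [6, 4]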


    Eigenvectors Properties

    Eigenvectors can only be found for square matrices, and not every square matrix has eigenvectors.

    Given an n x n matrix that does have eigenvectors, there are n of them.

    Another property of eigenvectors is that even if we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result. This is because scaling a vector only makes it longer; it does not change its direction.

    Lastly, all the eigenvectors of a symmetric matrix, such as a covariance matrix, are perpendicular to each other. This means that you can express the data in terms of these perpendicular eigenvectors, instead of expressing it in terms of the x and y axes.


    Standardized Eigenvectors

    We like to find the eigenvectors whose length is exactly one.

    This is because the length of a vector does not affect whether it is an eigenvector or not, whereas the direction does.

    So, in order to keep eigenvectors standard, whenever we find an eigenvector we usually scale it to have a length of 1, so that all eigenvectors have the same length.


    Standardized Eigenvectors

    (Figure: an eigenvector, its vector length, and the eigenvector rescaled to length one)
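
    The numeric example on this slide is not in the transcript; a minimal sketch of the normalization step, reusing the eigenvector [3, 2] from the earlier matrix-algebra example:

        import numpy as np

        v = np.array([3.0, 2.0])          # an eigenvector
        length = np.linalg.norm(v)        # sqrt(3^2 + 2^2) = sqrt(13)
        unit_v = v / length               # same direction, length exactly 1
        print(length, unit_v, np.linalg.norm(unit_v))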


    Eigenvalues

    The eigenvalue is the amount by which the original vector is scaled after multiplication by the square matrix.

    4 is the eigenvalue associated with that eigenvector in the example.

    No matter what multiple of the eigenvector we took before we multiplied it by the square matrix, we would always get 4 times the scaled vector.

    Eigenvectors and eigenvalues always come in pairs.


    PCA

    PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.

    Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data.

    The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. by reducing the number of dimensions, without much loss of information. This technique is used in image compression.


    PCA Original and Adjusted Data

    (Table: Original Data alongside Original Data - Average)
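
    The data table itself is not in the transcript. A minimal sketch of the adjustment step, using a small hypothetical two-dimensional data set (the later sketches continue from these names):

        import numpy as np

        # Hypothetical data: rows are observations, columns are the x and y dimensions
        data = np.array([[2.5, 2.4],
                         [0.5, 0.7],
                         [2.2, 2.9],
                         [1.9, 2.2],
                         [3.1, 3.0]])

        # Subtract the mean of each dimension so the adjusted data has zero mean
        data_adjust = data - data.mean(axis=0)
        print(data_adjust.mean(axis=0))   # approximately [0, 0]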


    PCA Original Data plot


    Calculate Covariance Matrix


    Calculate Eigenvectors and Eigenvalues from the Covariance Matrix
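
    The numbers for these two steps are not in the transcript. A minimal sketch, continuing the hypothetical data_adjust from the centering step above (NumPy's eigh is meant for symmetric matrices such as a covariance matrix and returns unit-length eigenvectors):

        # Covariance matrix of the adjusted data (columns are the variables)
        cov = np.cov(data_adjust, rowvar=False)

        # Eigenvalues and eigenvectors of the covariance matrix
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        print(cov)
        print(eigenvalues)    # in ascending order
        print(eigenvectors)   # unit-length eigenvectors in the columns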


    Eigenvectors Plot


    Choosing Components and Forming a Feature Vector

    The eigenvector with the highest eigenvalue is the principal component of the data set. It represents the most significant relationship between the data dimensions.

    In general, once eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance.

    Now, if we like, we can decide to ignore the components of lesser significance. We do lose some information, but if the eigenvalues are small, we do not lose much.

    If we leave out some components, the final data set will have fewer dimensions than the original.
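
    A minimal sketch of this selection step, continuing the hypothetical example above: order the eigenvectors by eigenvalue and keep only the strongest one(s) as the feature vector:

        # Order eigenvectors by eigenvalue, highest to lowest
        order = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[order]
        eigenvectors = eigenvectors[:, order]

        # Keep the top k components (k = 1 drops the less significant direction)
        k = 1
        feature_vector = eigenvectors[:, :k]   # chosen eigenvectors as columns
        print(feature_vector)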


    Feature Vector


    Deriving the new data set

    Row Feature Vector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

    Row Data Adjust is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension.

    Final Data has the final data items in columns and the dimensions along the rows. It is the original data expressed solely in terms of the chosen eigenvectors.

    Final Data = Row Feature Vector x Row Data Adjust
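
    A minimal sketch of the projection, continuing the hypothetical example; each variable name mirrors the slide's terms:

        # Eigenvectors in rows, most significant eigenvector first
        row_feature_vector = feature_vector.T

        # Mean-adjusted data transposed: one column per data item, one row per dimension
        row_data_adjust = data_adjust.T

        # Express the data in terms of the chosen eigenvectors
        final_data = row_feature_vector @ row_data_adjust
        print(final_data)   # k rows, one column per original data item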


    Deriving the new data set

    Two Eigenvectors


    Get the original data back

    Row Data Adjust = Row Feature Vector^T x Final Data

    Row Original Data = (Row Feature Vector^T x Final Data) + Original Mean

    (Because the standardized eigenvectors are perpendicular and of length one, the transpose of the Row Feature Vector acts as its inverse.)
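
    A minimal sketch of the reconstruction, continuing the hypothetical example; with k smaller than the number of dimensions the recovered data is an approximation, with all eigenvectors kept it is exact:

        # Undo the projection (the transpose acts as the inverse for unit-length,
        # perpendicular eigenvectors), then add the original mean back
        row_data_adjust_back = row_feature_vector.T @ final_data
        row_original_data = row_data_adjust_back + data.mean(axis=0).reshape(-1, 1)
        print(row_original_data.T)   # compare with the original data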


    Principal Component Regression

    PCA + MLR (Multiple Linear Regression)


    PCR

    T is a matrix of SCORES

    X is a DATA matrix

    P is a matrix of LOADINGS

    Y is a dependent variable vector

    B is the regression coefficients vector


    X = T P^T

    Y = T B + E

    B = (T^T T)^-1 T^T Y
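
    A minimal PCR sketch along these lines (NumPy only; the data and the choice of two components are hypothetical, not from the slides): compute the scores T from PCA on X, then regress Y on T by least squares:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 5))                     # hypothetical data matrix
        y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=50)

        # PCA on mean-centered X: loadings P are eigenvectors of the covariance matrix
        Xc = X - X.mean(axis=0)
        eigenvalues, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
        P = P[:, np.argsort(eigenvalues)[::-1]][:, :2]   # keep the top 2 loadings
        T = Xc @ P                                       # scores: X is approximated by T P^T

        # Regression on the scores: B = (T^T T)^-1 T^T Y
        yc = y - y.mean()
        B = np.linalg.solve(T.T @ T, T.T @ yc)
        y_hat = T @ B + y.mean()
        print(B, np.mean((y - y_hat) ** 2))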


    PCR

    In PCR, the X matrix is replaced by the T matrix, which has fewer, orthogonal variables.

    There is no issue with inverting the T^T T matrix (unlike in MLR), because the scores are orthogonal.

    PCR also resolves the issue of collinearity, which in turn can reduce the prediction error.

    There is no guarantee that a PCR model will work better than an MLR model on the same data set.


    Reference

    www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf


    Questions?
