A Review and Implementation of Principal Component Analysis
Taweh Beysolow II
Professor Moretti
Fordham University
Abstract
In this experiment, we examine the famous iris data set and perform principal
component analysis on the data. We want to identify the principal components that
explain the most variance within the data set. Furthermore, we discuss the
application of principal component analysis in conjunction with other data analysis
techniques. All computations were performed in Python, and all data is loaded from the
UCI machine learning repository. Rather than simply using a built-in PCA function, we
implement principal component analysis by manually performing each step, with
assistance from packages for Eigen-decomposition. In conclusion, we find that the first two
principal components, out of four in total, explain roughly 96% of the variance in the data.
I. What is Principal Component Analysis?
Principal component analysis (PCA) is an orthogonal linear transformation of data, in
which the transformed data is projected onto a new coordinate system. The transformed
data is arranged so that the first coordinate lies in the direction of greatest
variance, and each subsequent coordinate captures the greatest remaining variance, in
decreasing order. These coordinates themselves are the principal components of the
data. The primary purpose of principal component analysis is “to reduce the
dimensionality of a data set consisting of a large number of interrelated variables, while
retaining as much as possible of the variation present in the data set.” (Wood, pg. 2)
II. Notation
$x$ = vector of $p$ random variables; $\alpha_k$ = vector of $p$ constants

$\alpha_k' x = \sum_{j=1}^{p} \alpha_{kj} x_j$

$\Sigma$ = covariance matrix for $x$, replaced in practice by $S$, the sample covariance matrix

$\lambda_k$ = eigenvalue corresponding to the $k$-th principal component, $k = 1, 2, \ldots, p$
III. Derivation of Principal Component Analysis
Our goal is to find the linear function $\alpha_k' x$ of the random variables in the vector $x$,
with the vector of constants $\alpha_k$, that has maximum variance. This linear
function produces our principal components. However, the principal components
must be in order of decreasing variance, and each principal component must be
uncorrelated with the others.
Objective:
Maximize $\mathrm{Var}(\alpha_k' x) = \alpha_k' \Sigma \alpha_k$
We use constrained optimization, as without a constraint the value
of $\alpha_k$ could be made arbitrarily large. As such, we shall choose the following normalization
constraint:
$\alpha_k' \alpha_k = 1$
This brings us to the concept of Lagrange multipliers, which shall be the method
by which we achieve this constrained optimization.
Lagrange Multipliers in PCA
The Lagrange Multiplier method is a tool “for constrained optimization of
differentiable functions, especially for nonlinear constrained optimization.”(Huijuan,
pg.1) In particular, this is helpful for finding local maxima and minima of a respective
function subject to a given constraint. Within the context of the experiment, the Lagrange
multipliers are applied as follows:
$\alpha_k' \Sigma \alpha_k - \lambda(\alpha_k' \alpha_k - 1)$

$\frac{d}{d\alpha_k}\left[\alpha_k' \Sigma \alpha_k - \lambda(\alpha_k' \alpha_k - 1)\right] = 0$

$\Sigma \alpha_k - \lambda \alpha_k = 0$

$\Sigma \alpha_k = \lambda_k \alpha_k$
The final equation yields the eigenvector $\alpha_k$ with its corresponding
eigenvalue $\lambda_k$.
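This relationship can be checked numerically. Below is a minimal sketch (assuming NumPy; the data matrix `X` is a hypothetical illustration, not the iris data) confirming that each eigenvector of a sample covariance matrix $S$ satisfies $S\alpha = \lambda\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # hypothetical data: 100 observations, 3 variables
S = np.cov(X, rowvar=False)            # 3x3 sample covariance matrix

# eigh is appropriate here because a covariance matrix is symmetric
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Each column alpha of `eigenvectors` satisfies S @ alpha = lambda * alpha
for lam, alpha in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(S @ alpha, lam * alpha)
```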
What are Eigenvalues and Eigenvectors?
An eigenvalue is a number derived from a square matrix, which corresponds to a
specific eigenvector of that same matrix. Together, they “provide the
Eigen-decomposition of a matrix.” (Abdi, pg. 1) Put plainly, the Eigen-decomposition
of a matrix expresses the matrix in terms of its eigenvectors and their
corresponding eigenvalues. Eigen-decomposition is important because it is a “method by
which we can find the maximum (or minimum) of functions involving matrices.” (Abdi,
pg. 1) In this context, it is the method by which we find the principal components in
order of decreasing variance.
Eigen-decomposition
$Au = \lambda u$

$(A - \lambda I)u = 0$

where

$A$ = a square matrix,

$u$ = an eigenvector of $A$ (a vector whose direction is unchanged when multiplied by $A$; only its length is scaled by $\lambda$)

Assume that

$A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$; therefore

$u_1 = \begin{pmatrix} 3 \\ 2 \end{pmatrix}, \quad \lambda_1 = 4$

$u_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \quad \lambda_2 = -1$

For most applications, the eigenvectors are normalized to unit length:

$u'u = 1$
Furthermore, the eigenvectors of $A$ are collected as the columns of a matrix $U$,
and the eigenvalues are stored on the diagonal of a diagonal matrix $\Lambda$.
Thus, we rewrite the first equation
accordingly:
$AU = U\Lambda$

$A = U \Lambda U^{-1} = \begin{pmatrix} 3 & -1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 0 & -1 \end{pmatrix} \frac{1}{10}\begin{pmatrix} 2 & 2 \\ -4 & 6 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$
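The worked example above can be verified in a few lines of NumPy (a sketch; note that `numpy.linalg.eig` may return the eigenpairs in a different order and with unit-normalized eigenvectors):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# The unnormalized eigenvectors [3, 2] and [-1, 1] satisfy Au = lambda * u
assert np.allclose(A @ np.array([3.0, 2.0]), 4 * np.array([3.0, 2.0]))
assert np.allclose(A @ np.array([-1.0, 1.0]), -1 * np.array([-1.0, 1.0]))

# Eigen-decomposition: A = U Lam U^{-1}, with eigenvalues 4 and -1
eigenvalues, U = np.linalg.eig(A)
Lam = np.diag(eigenvalues)
assert np.allclose(U @ Lam @ np.linalg.inv(U), A)
assert np.allclose(sorted(eigenvalues), [-1.0, 4.0])
```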
Moving forward, as we have mentioned, our objective is to maximize $\lambda_k$, with
the eigenvectors ordered by decreasing eigenvalue. If $\lambda_1$ is the largest eigenvalue, then the first
principal component is defined by

$\Sigma \alpha_1 = \lambda_1 \alpha_1$

In general, we define $\alpha_k' x$ as the k-th principal component of $x$, and
the variance of a given principal component is given by its corresponding eigenvalue. We shall
now demonstrate this process when k = 2 and when k > 2.
2nd and K-th Principal Component
The second principal component maximizes the variance subject to being
uncorrelated with the first principal component. The non-correlation constraint is
expressed as the following:

$\mathrm{cov}(\alpha_1' x, \alpha_2' x) = \alpha_1' \Sigma \alpha_2 = \alpha_2' \Sigma \alpha_1 = \alpha_2' \lambda_1 \alpha_1 = \lambda_1 \alpha_2' \alpha_1 = \lambda_1 \alpha_1' \alpha_2 = 0$

$\alpha_2' \Sigma \alpha_2 - \lambda(\alpha_2' \alpha_2 - 1) - \phi \alpha_2' \alpha_1$

$\frac{d}{d\alpha_2}\left[\alpha_2' \Sigma \alpha_2 - \lambda(\alpha_2' \alpha_2 - 1) - \phi \alpha_2' \alpha_1\right] = 0$

$\Sigma \alpha_2 - \lambda \alpha_2 - \phi \alpha_1 = 0$

Left-multiplying by $\alpha_1'$ gives

$\alpha_1' \Sigma \alpha_2 - \lambda \alpha_1' \alpha_2 - \phi \alpha_1' \alpha_1 = 0$

$0 - 0 - \phi \cdot 1 = 0$

$\phi = 0$

$\Sigma \alpha_2 - \lambda \alpha_2 = 0$
This process can be repeated up to k = p, yielding principal components for each
of the p random variables.
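In practice, all p components are obtained at once by eigendecomposing S and sorting the eigenpairs by decreasing eigenvalue. A minimal sketch (assuming NumPy; `X` here is an arbitrary stand-in data matrix):

```python
import numpy as np

def principal_components(X):
    """Return eigenvalues and eigenvectors of the covariance of X,
    sorted in decreasing-variance order (column k = k-th component)."""
    S = np.cov(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]       # largest eigenvalue first
    return eigenvalues[order], eigenvectors[:, order]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                    # hypothetical 4-variable data
lams, alphas = principal_components(X)

# Variances come out in decreasing order
assert all(lams[k] >= lams[k + 1] for k in range(len(lams) - 1))
# The components are mutually orthogonal: alpha_j' alpha_k = 0 for j != k
assert np.allclose(alphas.T @ alphas, np.eye(4))
```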
IV. Data
For this experiment, we shall be using Ronald Fisher’s Iris flower data set, originally
collected by Edgar Anderson to study the variation of three iris species. Our objective is
to determine which principal components explain the most variance in this data set.
There are a total of 150 observations, 50 of each of the three species of flower. The
species and observed variables are:
Species
• Iris-Setosa
• Iris-Virginica
• Iris-Versicolor
Variables
• Sepal Length
• Sepal Width
• Petal Length
• Petal Width
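The UCI file `iris.data` is a plain CSV with the four measurements followed by the species label. A sketch of reading it with pandas (the three rows below are illustrative measurements in the file's format, standing in for the full 150-row download; the column names are our own labels):

```python
import io
import pandas as pd

# Illustrative rows in the UCI iris.data format (sepal length, sepal width,
# petal length, petal width, species); the real file has 150 such rows.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
iris = pd.read_csv(sample, header=None, names=columns)

assert iris.shape == (3, 5)
```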
V. Experiment

When performing initial exploratory analysis on our data, we notice the following:
the data exhibits very high variance within and between species
with respect to sepal length and sepal width, but is considerably less variable between
species, and moderately variable within species, when observing petal length and petal
width. This will be a point of interest to keep in mind for later, but for now let us move
on to describing the implementation as performed here. After we load our data into a
variable within Python, we standardize our values (mean = 0, variance = 1), then we calculate
the covariance matrix for X:
$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$
Generally speaking, we want to standardize values when they are not measured on
the same scale. Although in this experiment all of the variables are measured in
centimeters, it is still advisable to do so. Moving forward, we perform the
Eigen-decomposition and obtain the eigenvalues and eigenvectors. After we sort the
eigenvectors by decreasing eigenvalue, we observe the following:
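The proportion of variance explained by each component is its eigenvalue divided by the sum of all eigenvalues. A sketch (assuming NumPy; the eigenvalues listed are approximate values for the standardized iris covariance matrix):

```python
import numpy as np

def explained_variance_ratio(eigenvalues):
    """Proportion of total variance carried by each principal component."""
    lams = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return lams / lams.sum()

# Approximate eigenvalues of the standardized iris covariance matrix
lams = [2.918, 0.914, 0.147, 0.021]
ratios = explained_variance_ratio(lams)

# The first two components carry roughly 96% of the total variance
assert ratios[0] + ratios[1] > 0.95
```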
As we can see, the first two principal components explain the vast majority of the
variance within the data set. As pointed out earlier, the high variability in sepal
length and sepal width between and within species foreshadowed this result. Finally,
we project the transformed data onto the new feature space:
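The projection itself is a matrix product of the standardized data with the matrix whose columns are the top two eigenvectors. A sketch (assuming NumPy; `X` is a random stand-in for the standardized iris matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))          # stand-in for the standardized iris data
S = np.cov(X, rowvar=False)
lams, vecs = np.linalg.eigh(S)
order = np.argsort(lams)[::-1]
W = vecs[:, order[:2]]                 # 4x2 matrix: top two eigenvectors as columns

Y = X @ W                              # projected data: 150 observations in 2-D
assert Y.shape == (150, 2)
```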
VI. Conclusion and Comments
We observe that instead of a 4-dimensional plot, as we would have originally had,
we are now looking at a very familiar xy-plot. For exploratory analysis purposes, this
brings considerable ease both visually and analytically. It is easy to see that Iris-virginica
and Iris-versicolor show considerable similarities with respect to their sepal length and sepal
width properties. In contrast, Iris-setosa appears to be considerably distinct. As
for further applications of PCA, it is often used in regression analysis to determine which
variables should be included in a model, in neuroscience to identify properties of
stimuli, and in other settings. As shown above, both in theory and application, principal
component analysis provides a robust and excellent method of simplifying very complex
data into simpler forms.
VII. References
1. Wood, F. (2009, September). Principal Component Analysis. Retrieved from http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/pca.pdf
2. Abdi, H. (2007). The Eigen-Decomposition. Retrieved from https://www.utdallas.edu/~herve/Abdi-EVD2007-pretty.pdf
3. Huijuan, L. (2008, September 28). Lagrange Multipliers and their Applications. Retrieved from http://sces.phys.utk.edu/~moreo/mm08/method_HLi.pdf