Principal Component Analysis
Adapted by Paul Anderson from Tutorial by Doug Raiford
The Problem with Apples and Oranges
High dimensionality: can’t “see” the data
If we had only one, two, or three features, we could represent them graphically
But 4 or more…
Example Weight Diameter Redness Orangeness
Ex1 0.26 3.10 2.92 7.78
Ex2 0.35 2.51 1.91 5.34
Ex3 0.30 2.33 2.05 11.49
Ex4 0.21 3.67 10.82 1.79
Ex5 0.28 2.13 3.11 9.02
Ex6 0.28 3.83 8.80 2.04
Ex7 0.10 3.96 7.30 2.81
Ex8 0.32 3.40 1.16 12.01
Ex9 0.19 3.89 2.75 9.45
Ex10 0.22 2.46 1.71 10.98
Ex11 0.33 3.95 7.88 2.67
Ex12 0.43 2.99 1.03 10.16
Ex13 0.21 5.29 11.44 1.44
Ex14 0.30 3.35 9.99 1.51
Ex15 0.26 3.04 4.48 1.46
Ex16 0.27 4.38 6.48 1.55
Ex17 0.46 2.90 1.86 7.79
Ex18 0.29 2.92 11.88 1.66
Ex19 0.24 3.50 9.09 1.75
Ex20 0.40 3.24 2.00 11.16
If Could Compress Into 2 Dimensions
Apples and oranges: feature vectors
Axis of greatest variance
How?
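In MATLAB:
evects = princomp(allFruit);   % columns of evects are the principal components
b1 = evects(:,1);              % first principal component
b2 = evects(:,2);              % second principal component
Z1 = allFruit*b1;              % project the data onto b1
Z2 = allFruit*b2;              % project the data onto b2
scatter(Z1,Z2);                % plot the data in the two new dimensions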
Real World Example
59 dimensions, 3500 genes
Very useful in exploratory data analysis
Sometimes useful as a direct tool (MCU)
But We’re Not Scared of the Details
Given
- Data matrix M (feature vectors for all examples)
Generate
- Covariance matrix for M (Σ)
- Eigenvectors (principal components) from the covariance matrix
M → Σ → Eigenvectors
Eigenvectors and Eigenvalues
Each Eigenvector is accompanied by an Eigenvalue
The Eigenvector with the greatest Eigenvalue points along the axis of greatest variance
Eigenvectors and Eigenvalues
If we use only the first principal component, there is very little degradation of the data
We have reduced the dimensions from 2 to 1
Project data onto new axes
Once we have the Eigenvectors, we can project the data onto the new axes
Eigenvectors are unit vectors, so a simple dot product produces the desired effect
M → Σ → Eigenvectors → Project Data
Covariance Matrix
M → Σ → Eigenvectors → Project Data
For the four features in the fruit table (Weight, Diameter, Redness, Orangeness), Σ is a 4×4 matrix whose entry \sigma_{i,j} is the covariance of feature i with feature j:
\[ \Sigma = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} & \sigma_{1,4} \\ \sigma_{2,1} & \sigma_{2,2} & \sigma_{2,3} & \sigma_{2,4} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_{3,3} & \sigma_{3,4} \\ \sigma_{4,1} & \sigma_{4,2} & \sigma_{4,3} & \sigma_{4,4} \end{pmatrix} \]
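A minimal Python sketch of this step, using the twenty rows of the fruit table. It assumes the sample covariance (dividing by n - 1, which is also MATLAB's default for cov); the names FRUIT and covariance_matrix are illustrative and not from the slides.

```python
# Fruit feature vectors from the table: weight, diameter, redness, orangeness.
FRUIT = [
    [0.26, 3.10, 2.92, 7.78], [0.35, 2.51, 1.91, 5.34],
    [0.30, 2.33, 2.05, 11.49], [0.21, 3.67, 10.82, 1.79],
    [0.28, 2.13, 3.11, 9.02], [0.28, 3.83, 8.80, 2.04],
    [0.10, 3.96, 7.30, 2.81], [0.32, 3.40, 1.16, 12.01],
    [0.19, 3.89, 2.75, 9.45], [0.22, 2.46, 1.71, 10.98],
    [0.33, 3.95, 7.88, 2.67], [0.43, 2.99, 1.03, 10.16],
    [0.21, 5.29, 11.44, 1.44], [0.30, 3.35, 9.99, 1.51],
    [0.26, 3.04, 4.48, 1.46], [0.27, 4.38, 6.48, 1.55],
    [0.46, 2.90, 1.86, 7.79], [0.29, 2.92, 11.88, 1.66],
    [0.24, 3.50, 9.09, 1.75], [0.40, 3.24, 2.00, 11.16],
]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of a list of feature vectors."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

sigma = covariance_matrix(FRUIT)
for row in sigma:
    print(" ".join(f"{value:9.4f}" for value in row))
```

The diagonal entries are the variances of the individual features, and the matrix is symmetric because the covariance of feature i with feature j equals that of j with i.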
Covariance Matrix
\[ \Sigma = \begin{pmatrix} 8.3949 & 7.5958 \\ 7.5958 & 7.7130 \end{pmatrix} \]
Covariance Matrix
\[ \Sigma = \begin{pmatrix} 8.7951 & 0.3299 \\ 0.3299 & 0.9200 \end{pmatrix} \]
Eigenvector
Eigenvector: a vector v whose linear transform, using Σ as the transformation matrix, is a vector parallel to v
M → Σ → Eigenvectors → Project Data
\[ \Sigma v = \lambda v \]
Eigenvector
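How to find
- Σ is an n×n matrix
- There will be n Eigenvectors
- Eigenvectors ≠ 0
- Eigenvalues ≠ 0
\[ \begin{aligned} \Sigma v &= \lambda v \\ \Sigma v - \lambda v &= 0 \\ \Sigma v - \lambda I v &= 0 \\ (\Sigma - \lambda I) v &= 0 \end{aligned} \]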
Eigenvector
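A is invertible if and only if det(A) ≠ 0
If (Σ - λI) is invertible then:
\[ \begin{aligned} (\Sigma - \lambda I) v &= 0 \\ (\Sigma - \lambda I)^{-1} (\Sigma - \lambda I) v &= (\Sigma - \lambda I)^{-1} 0 \\ v &= 0 \end{aligned} \]
But it is given that v ≠ 0, so (Σ - λI) must not be invertible
Not invertible, so det(Σ - λI) = 0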
Eigenvector
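First, solve for the λ’s by performing the following operations:
\[ \Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \Sigma - \lambda I = \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} \]
\[ \begin{aligned} \det(\Sigma - \lambda I) &= 0 \\ (a - \lambda)(d - \lambda) - bc &= 0 \\ ad - a\lambda - d\lambda + \lambda^2 - bc &= 0 \\ P(\lambda) = \lambda^2 - (a + d)\lambda + (ad - bc) &= 0 \end{aligned} \]
If we solve for λ we will get 2 roots, λ1 and λ2.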
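As an illustration of this slide, here is a small Python sketch that solves the characteristic polynomial λ² - (a + d)λ + (ad - bc) = 0 of a 2×2 matrix with the quadratic formula. The function name eigenvalues_2x2 is made up for this example, and the closed form only covers the 2×2 case.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - (a + d)*lambda + (a*d - b*c) = 0, largest first."""
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        # Cannot happen for a real symmetric matrix such as a covariance matrix.
        raise ValueError("complex eigenvalues")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0

lam1, lam2 = eigenvalues_2x2(4.0, 2.0, 2.0, 4.0)
print(lam1, lam2)  # 6.0 2.0
```

For a symmetric matrix (b = c) the discriminant equals (a - d)² + 4b², so both roots are always real.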
Eigenvector
Now that the Eigenvalues have been acquired, we can solve for the Eigenvector (v below).
\[ (\Sigma - \lambda I) v = 0 \]
We know Σ, λ, and I, so this becomes a homogeneous system of equations (equal to 0) with the entries of v as the variables
We already know that there is no unique solution
- The only way there is a unique solution is if the trivial solution is the only solution
- If this were the case, (Σ - λI) would be invertible
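A hedged Python sketch of this step for the 2×2 case: because (Σ - λI) is singular, its two rows carry the same information, so one row is enough to fix the direction of v. The helper name unit_eigenvector_2x2 is hypothetical, and it assumes the off-diagonal entry b is nonzero.

```python
import math

def unit_eigenvector_2x2(a, b, lam):
    """Unit vector solving (a - lam)*x + b*y = 0; assumes b != 0."""
    x, y = b, lam - a          # (a - lam)*b + b*(lam - a) == 0
    norm = math.hypot(x, y)
    return x / norm, y / norm

# 2x2 covariance matrix shown on the covariance slides.
a, b, c, d = 8.3949, 7.5958, 7.5958, 7.7130
trace, det = a + d, a * d - b * c
lam1 = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0  # larger root of P(lambda)

v1 = unit_eigenvector_2x2(a, b, lam1)
print(round(v1[0], 2), round(v1[1], 2))  # 0.72 0.69
```

Any nonzero multiple of v1 solves the same system; normalizing to unit length just picks a convenient representative.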
Back to the example
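\[ \Sigma = \operatorname{cov}(\text{data}) = \begin{pmatrix} 8.39 & 7.60 \\ 7.60 & 7.71 \end{pmatrix} \]
\[ \begin{aligned} \det(\Sigma - \lambda I) &= 0 \\ \lambda^2 - (a + d)\lambda + (ad - bc) &= 0 \\ \lambda^2 - (8.39 + 7.71)\lambda + (8.39 \cdot 7.71 - 7.60 \cdot 7.60) &= 0 \end{aligned} \]
\[ \text{roots: } \lambda_1 = 15.66, \quad \lambda_2 = 0.45 \]
\[ \Sigma - \lambda_1 I = \begin{pmatrix} -7.26 & 7.60 \\ 7.60 & -7.94 \end{pmatrix} \]
\[ \begin{aligned} -7.2625x + 7.5958y &= 0 \\ 7.5958x - 7.9443y &= 0 \end{aligned} \]
\[ y = \frac{7.2625}{7.5958}\,x \]
\[ \text{Eigenvector}_1 = \begin{pmatrix} 7.5958 \\ 7.2625 \end{pmatrix}, \qquad \text{Eigenvector}_{1,\text{norm}} = \begin{pmatrix} 0.72 \\ 0.69 \end{pmatrix} \]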
Back to the example
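\[ \Sigma - \lambda_2 I = \begin{pmatrix} 7.94 & 7.60 \\ 7.60 & 7.26 \end{pmatrix} \]
\[ \begin{aligned} 7.9443x + 7.5958y &= 0 \\ 7.5958x + 7.2625y &= 0 \end{aligned} \]
\[ y = -\frac{7.9443}{7.5958}\,x \]
\[ \text{Eigenvector}_2 = \begin{pmatrix} -7.5958 \\ 7.9443 \end{pmatrix}, \qquad \text{Eigenvector}_{2,\text{norm}} = \begin{pmatrix} -0.69 \\ 0.72 \end{pmatrix} \]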
Eigenvectors (Summary)
Find the characteristic polynomial using the determinant
Solve for the Eigenvalues (λ’s)
Solve for the Eigenvectors
M → Σ → Eigenvectors → Project Data
P(λ) → λ’s → Eigenvectors
Axis of Greatest Variance?
Equation for an ellipse:
\[ Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 \]
D, E, and F have to do with translation
A and C are related to the ellipse’s spread along the X and Y axes, respectively
B has to do with rotation
Ignoring translation leaves the quadratic part:
\[ Ax^2 + Bxy + Cy^2 \]
Axis of Greatest Variance
Mathematicians discovered that any ellipse can be exactly captured by a symmetric matrix
Covariance matrix is symmetric
The Eigenvectors of the said matrix point along the principal axes of the ellipse
Origin of the name (principal components analysis)
\[ \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \]
A: related to spread along the x axis (variance of data along the x axis)
C: related to spread along the y axis
B/2: related to rotation (covariance)
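To make the link between the quadratic terms and the symmetric matrix concrete, here is a small Python sketch. It checks numerically that [x y] times the matrix with rows (A, B/2) and (B/2, C) times the column [x y] equals Ax² + Bxy + Cy². The function names and sample coefficients are illustrative only.

```python
def quadratic_form(A, B, C, x, y):
    """Evaluate [x y] * [[A, B/2], [B/2, C]] * [x y]^T."""
    m = [[A, B / 2.0], [B / 2.0, C]]
    mx = m[0][0] * x + m[0][1] * y   # first entry of m * [x y]^T
    my = m[1][0] * x + m[1][1] * y   # second entry of m * [x y]^T
    return x * mx + y * my

def conic_quadratic_terms(A, B, C, x, y):
    """Evaluate A*x^2 + B*x*y + C*y^2 directly."""
    return A * x * x + B * x * y + C * y * y

# Hypothetical coefficients and sample point.
print(quadratic_form(2.0, 3.0, 5.0, 1.5, -0.5),
      conic_quadratic_terms(2.0, 3.0, 5.0, 1.5, -0.5))
```

Splitting B evenly across the two off-diagonal entries is what makes the matrix symmetric.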
Principal Axis Theorem
Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces
Project Data Onto Principal Components
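Eigenvectors are unit vectors
\[ \operatorname{proj}_v u = \frac{u \cdot v}{\lVert v \rVert^2}\, v \]
With \lVert v \rVert = 1, the coordinate of u along v is simply the dot product u · v
M → Σ → Eigenvectors → Project Data
\[ M v_1, \qquad M v_2 \]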
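A short Python sketch of the projection step: each row of the data matrix is dotted with a unit eigenvector, which is what the product Mv computes. The three data points are hypothetical, and the two unit vectors are the normalized (1, 1) and (-1, 1) directions chosen for illustration.

```python
import math

def project(rows, v):
    """Dot each row of the data matrix with the unit vector v (that is, M*v)."""
    return [sum(r_i * v_i for r_i, v_i in zip(row, v)) for row in rows]

s = 1.0 / math.sqrt(2.0)             # 0.7071...
v1 = (s, s)                          # unit vector along y = x
v2 = (-s, s)                         # unit vector along y = -x
M = [(1.0, 1.0), (2.0, 0.0), (-1.0, 1.0)]   # hypothetical data points

z1 = project(M, v1)                  # coordinates along the first axis
z2 = project(M, v2)                  # coordinates along the second axis
print(z1, z2)
```

The point (1, 1) lies entirely along v1, so its coordinate along v2 is zero; (-1, 1) is the reverse case.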
Review
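In MATLAB:
evects = princomp(allFruit);   % columns of evects are the principal components
b1 = evects(:,1);              % first principal component
b2 = evects(:,2);              % second principal component
Z1 = allFruit*b1;              % project the data onto b1
Z2 = allFruit*b2;              % project the data onto b2
scatter(Z1,Z2);                % plot the data in the two new dimensions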
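For readers without MATLAB, the same pipeline (M → Σ → Eigenvectors → Project Data) can be sketched in plain Python. This is an illustration under stated assumptions, not what princomp does internally: it finds the two leading eigenvectors by power iteration with deflation, uses only the first six rows of the fruit table to stay short, and all function names are made up for this example. The sign of each eigenvector is arbitrary, so the scores may come out negated relative to MATLAB.

```python
import math

# First six rows of the fruit table: weight, diameter, redness, orangeness.
ROWS = [
    [0.26, 3.10, 2.92, 7.78], [0.35, 2.51, 1.91, 5.34],
    [0.30, 2.33, 2.05, 11.49], [0.21, 3.67, 10.82, 1.79],
    [0.28, 2.13, 3.11, 9.02], [0.28, 3.83, 8.80, 2.04],
]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    """Matrix-vector product: one dot product per row of m."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_eigenpair(m, iterations=1000):
    """Power iteration: repeatedly apply m and renormalize."""
    v = normalize([1.0 / (i + 1) for i in range(len(m))])
    for _ in range(iterations):
        v = normalize(mat_vec(m, v))
    lam = sum(x * y for x, y in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return lam, v

def deflate(m, lam, v):
    """Subtract lam * v * v^T so the next power iteration finds the next component."""
    n = len(m)
    return [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]

sigma = covariance_matrix(ROWS)
lam1, b1 = top_eigenpair(sigma)
lam2, b2 = top_eigenpair(deflate(sigma, lam1, b1))

# Same projection as Z1 = allFruit*b1 and Z2 = allFruit*b2 (data not centered).
Z1 = mat_vec(ROWS, b1)
Z2 = mat_vec(ROWS, b2)
print(list(zip(Z1, Z2)))
```

Plotting the (Z1, Z2) pairs gives the two-dimensional view that scatter(Z1,Z2) produces in the MATLAB version.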
Practice
Covariance matrix
\[ \Sigma = \begin{pmatrix} 4.3703 & 2.0668 \\ 2.0668 & 4.0295 \end{pmatrix} \approx \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix} \]
Practice
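M → Σ → Eigenvectors → Project Data
P(λ) → λ’s → Eigenvectors
\[ \Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix} \]
\[ \begin{aligned} \det(\Sigma - \lambda I) &= 0 \\ \lambda^2 - (a + d)\lambda + (ad - bc) &= 0 \\ \lambda^2 - (4 + 4)\lambda + (4 \cdot 4 - 2 \cdot 2) &= 0 \end{aligned} \]
\[ \lambda_1 = 6, \qquad \lambda_2 = 2 \]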
Practice
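M → Σ → Eigenvectors → Project Data
P(λ) → λ’s → Eigenvectors
\[ \Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \lambda_1 = 6, \quad \lambda_2 = 2 \]
\[ \Sigma - \lambda_1 I = \begin{pmatrix} -2 & 2 \\ 2 & -2 \end{pmatrix} \]
\[ \begin{aligned} -2x + 2y &= 0 \\ 2x - 2y &= 0 \\ y &= x \end{aligned} \]
\[ \text{Eigenvector}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \text{Eigenvector}_{1,\text{norm}} = \begin{pmatrix} 0.7071 \\ 0.7071 \end{pmatrix} \]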
Practice
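M → Σ → Eigenvectors → Project Data
P(λ) → λ’s → Eigenvectors
\[ \Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \lambda_2 = 2 \]
\[ \Sigma - \lambda_2 I = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix} \]
\[ \begin{aligned} 2x + 2y &= 0 \\ 2x + 2y &= 0 \\ y &= -x \end{aligned} \]
\[ \text{Eigenvector}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \text{Eigenvector}_{2,\text{norm}} = \begin{pmatrix} -0.7071 \\ 0.7071 \end{pmatrix} \]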
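One way to check the practice answers is to plug them back into Σv = λv. The Python sketch below assumes the answers are λ1 = 6 with unit eigenvector (0.7071, 0.7071) and λ2 = 2 with unit eigenvector (-0.7071, 0.7071); the overall sign of each eigenvector is arbitrary, and the helper names are illustrative.

```python
import math

SIGMA = [[4.0, 2.0], [2.0, 4.0]]

def mat_vec(m, v):
    """Matrix-vector product: one dot product per row of m."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def is_eigenpair(m, lam, v, tol=1e-9):
    """True when m*v equals lam*v entry by entry, within tol."""
    mv = mat_vec(m, v)
    return all(abs(mv_i - lam * v_i) <= tol for mv_i, v_i in zip(mv, v))

s = 1.0 / math.sqrt(2.0)   # 0.7071...
v1 = [s, s]
v2 = [-s, s]

print(is_eigenpair(SIGMA, 6.0, v1), is_eigenpair(SIGMA, 2.0, v2))  # True True
```

The two eigenvectors are also perpendicular to each other, as expected for a symmetric matrix with distinct eigenvalues.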
Questions?
Why Invertible if row reducible to I?
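\[ A A^{-1} = I \]
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
Each column of the inverse solves its own linear system:
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \]
If A row reduces to I, both systems have a unique solution, and those solutions are the columns of A^{-1}:
\[ \left[\, A \mid I \,\right] \rightarrow \left[\, I \mid A^{-1} \,\right] \]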
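A small Python sketch of the 2×2 case, assuming the standard closed-form inverse (swap a and d, negate b and c, divide by the determinant). It shows both sides of the argument: an invertible matrix multiplies with its inverse to give I, while a matrix with zero determinant has no inverse. The function names are illustrative.

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]]; raises if the determinant is zero."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero, so the matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[4.0, 2.0], [2.0, 4.0]]
A_inv = inverse_2x2(4.0, 2.0, 2.0, 4.0)
print(mat_mul(A, A_inv))   # approximately the identity matrix

# [[2, 2], [2, 2]] has determinant 0, so inverse_2x2(2.0, 2.0, 2.0, 2.0) raises ValueError.
```

The singular example has two identical rows, which is the situation the eigenvector derivation relies on when it requires det(Σ - λI) = 0.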
Implication of Zero Determinants
Why Eigenvector Associated with Greatest λ Points Along Axis of Greatest Variance
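Regression quantities:
\[ SCP = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad SSX = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad SSY = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
Pearson correlation coefficient:
\[ R = \frac{SCP}{\sqrt{SSX \cdot SSY}} \]
\[ \operatorname{Cov}(X, Y) = \frac{SCP}{n - 1}, \qquad \operatorname{Var}(X) = \frac{SSX}{n - 1} \]
Regression line:
\[ \hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{SCP}{SSX} = \frac{\operatorname{Cov}(X, Y)\,(n - 1)}{\operatorname{Var}(X)\,(n - 1)} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \]
Eigenvector of a 2×2 matrix:
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix} \]
\[ \begin{aligned} ax + by &= \lambda x \\ cx + dy &= \lambda y \end{aligned} \qquad \Rightarrow \qquad \begin{aligned} by &= (\lambda - a)\,x \\ cx &= (\lambda - d)\,y \end{aligned} \]
\[ y = \frac{\lambda - a}{b}\,x, \qquad y = \frac{c}{\lambda - d}\,x \]
With the covariance matrix:
\[ \Sigma = \begin{pmatrix} \operatorname{var}(x) & \operatorname{cov} \\ \operatorname{cov} & \operatorname{var}(y) \end{pmatrix}, \qquad \begin{pmatrix} \operatorname{var}(x) - \lambda & \operatorname{cov} \\ \operatorname{cov} & \operatorname{var}(y) - \lambda \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
\[ y = \frac{\lambda - \operatorname{var}(x)}{\operatorname{cov}}\,x, \qquad y = \frac{\operatorname{cov}}{\lambda - \operatorname{var}(y)}\,x \]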
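The regression part of this slide can be checked numerically. The Python sketch below computes the slope both as SCP/SSX and as Cov(X, Y)/Var(X) and shows that the (n - 1) factors cancel. It uses the Redness and Orangeness values of Ex1 to Ex5 from the fruit table; the function name is illustrative.

```python
def regression_slope_two_ways(xs, ys):
    """Return (SCP/SSX, Cov(X, Y)/Var(X)); the two should agree."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    scp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of cross products
    ssx = sum((x - x_bar) ** 2 for x in xs)                       # sum of squares of x
    cov_xy = scp / (n - 1)
    var_x = ssx / (n - 1)
    return scp / ssx, cov_xy / var_x

redness = [2.92, 1.91, 2.05, 10.82, 3.11]      # Ex1..Ex5
orangeness = [7.78, 5.34, 11.49, 1.79, 9.02]   # Ex1..Ex5

slope_a, slope_b = regression_slope_two_ways(redness, orangeness)
print(slope_a, slope_b)
```

The slope is negative here because the redder examples in these rows are the less orange ones.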
Rotation
Good search terms: rotation of axes conic sections
Note that in the sections above dealing with the ellipse, hyperbola, and the parabola, the algebraic equations that appeared did not contain a term of the form xy. However, in our “Algebraic View of the Conic Sections,” we stated that every conic section is of the form
\[ Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 \]
where A, B, C, D, E, and F are constants. In essence, all of the equations that we have studied have had B=0. So the question arises: “What role, if any, does the xy term play in conic sections? If it were present, how would that change the geometric figure?”
First of all, the answer is NOT that the conic changes from one type to another. That is to say, if we introduce an xy term, the conic does NOT change from an ellipse to a hyperbola. If we start with the standard equation of an ellipse and insert an extra term, an xy term, we still have an ellipse.
So what does the xy term do? The xy term actually rotates the graph in the plane. For example, in the case of an ellipse, the major axis is no longer parallel to the x-axis or y-axis. Rather, depending on the constant in front of the xy term, we now have the major axis rotated. Let’s look at an example.
* Example
rotation
A is related to elongation in the x direction
C is related to elongation in the y direction
B is related to rotation (B is not equal to zero if and only if there is rotation)
D, E, and F are related to h and k (x and y shift, (x-h), (y-k))
D, E, and F are not affected by rotation
A and C are affected
Standard equation
Standard equation of the ellipse is:
\[ \frac{(x - h)^2}{a^2} + \frac{(y - k)^2}{b^2} = 1 \]
a = 5 and b = 2
Hence: the length of the major axis is 2a = 10, and the length of the minor axis is 2b = 4.
New rotated coordinate system
Coordinate Rotation Formulas
- If a rectangular xy-coordinate system is rotated through an angle θ to form an x’y’ coordinate system, then a point P(x, y) will have coordinates P(x’, y’) in the new system, where (x, y) and (x’, y’) are related by
\[ x = x' \cos\theta - y' \sin\theta, \qquad y = x' \sin\theta + y' \cos\theta \]
and
\[ x' = x \cos\theta + y \sin\theta, \qquad y' = -x \sin\theta + y \cos\theta \]
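The two pairs of rotation formulas are inverses of each other, which a short Python sketch can confirm. The function names to_rotated and from_rotated are made up for this illustration, and the angle is taken in radians.

```python
import math

def to_rotated(x, y, theta):
    """Coordinates (x', y') of the point (x, y) in axes rotated by theta radians."""
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return xp, yp

def from_rotated(xp, yp, theta):
    """Original coordinates (x, y) of a point given as (x', y') in the rotated axes."""
    x = xp * math.cos(theta) - yp * math.sin(theta)
    y = xp * math.sin(theta) + yp * math.cos(theta)
    return x, y

theta = math.radians(30.0)
xp, yp = to_rotated(3.0, 4.0, theta)
print(from_rotated(xp, yp, theta))   # approximately (3.0, 4.0)
```

Rotating the axes by 90 degrees sends the point (1, 0) to (0, -1) in the new system, a quick sanity check on the signs.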
rotation
The values of h and k give horizontal and vertical (resp.) translation distances, and t gives rotation angle (measured in degrees). Notice how changes in these transformation values affect the coefficients, and how changes in the coefficients affect the transformations.
The lines shown in green in the graph are the following key lines for the conic sections: the major and minor axes for ellipses (crossing at the center of the ellipse), the axis of symmetry and perpendicular line through the vertex for a parabola (crossing at the vertex), and the two perpendicular axes of symmetry (crossing through the center point) for a hyperbola. In all cases, the two lines cross at the point (h,k), and are rotated from the position parallel to the coordinate axes by t degrees. In graphs of hyperbolas, the asymptotes of the hyperbola are shown as orange lines.
If B^2 - 4AC < 0, then the graph is an ellipse (if B = 0 and A = C in this case, then the graph is a circle)
One other important formula determines the relationship between the coefficients and the angle of rotation: tan(2t)=B/(A-C). Note that rotation has no effect on the values of the coefficients D, E, and F, and that t=0 (no rotation) if and only if B=0. The values of the coordinates of the point (h,k) are best determined from the coefficients by first reversing the effect of the rotation (so that B=0), then completing the squares.
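As a worked check of tan(2t) = B/(A - C), the Python sketch below builds the coefficients of the ellipse x²/25 + y²/4 = 1 (a = 5, b = 2) rotated by 30 degrees and then recovers the angle from A, B, and C. The coefficient formulas come from substituting the coordinate rotation formulas into the standard equation; the function names are illustrative. Note that the tangent formula only fixes the angle up to multiples of 90 degrees, so this sketch returns the value between -45 and 45 degrees.

```python
import math

def rotation_angle_degrees(A, B, C):
    """Angle t with tan(2t) = B / (A - C), returned in the range (-45, 45]."""
    if B == 0:
        return 0.0            # no xy term, no rotation
    if A == C:
        return 45.0           # tan(2t) is unbounded, so 2t is 90 degrees
    return math.degrees(0.5 * math.atan(B / (A - C)))

def is_ellipse(A, B, C):
    """Discriminant test: B^2 - 4AC < 0."""
    return B * B - 4.0 * A * C < 0

theta = math.radians(30.0)
cos_t, sin_t = math.cos(theta), math.sin(theta)
A0, C0 = 1.0 / 25.0, 1.0 / 4.0           # unrotated ellipse: A0*x^2 + C0*y^2 = 1

A = A0 * cos_t ** 2 + C0 * sin_t ** 2    # coefficients after rotating by theta
B = 2.0 * (A0 - C0) * sin_t * cos_t
C = A0 * sin_t ** 2 + C0 * cos_t ** 2

print(rotation_angle_degrees(A, B, C), is_ellipse(A, B, C))
```

The discriminant B² - 4AC is unchanged by the rotation, which is why the rotated curve is still classified as an ellipse.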
Principal Axis Theorem
Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces
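\[ \tan(2\Theta) = \frac{B}{A - C} = \frac{\sin(2\Theta)}{\cos(2\Theta)} = \frac{2\operatorname{cov}(x, y)}{\operatorname{var}(x) - \operatorname{var}(y)} \]
Here A = var(x), C = var(y), and B/2 = cov(x, y). Θ is the angle of the principal axis, so the slope of that axis is tan Θ.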
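A hedged numerical check in Python: for the 2×2 covariance matrix used in the worked example (8.3949, 7.5958, 7.5958, 7.7130), the angle from tan(2Θ) = 2 cov(x, y) / (var(x) - var(y)) should match the direction of the eigenvector with the greatest eigenvalue. The sketch uses atan2 so that the case var(x) = var(y) needs no special handling; the function name is illustrative.

```python
import math

def principal_axis_angle(var_x, var_y, cov_xy):
    """Angle (radians) satisfying tan(2*theta) = 2*cov_xy / (var_x - var_y)."""
    return 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)

var_x, cov_xy, var_y = 8.3949, 7.5958, 7.7130
theta = principal_axis_angle(var_x, var_y, cov_xy)

# Direction of the eigenvector for the larger eigenvalue, from (Sigma - lam1*I)v = 0.
trace = var_x + var_y
det = var_x * var_y - cov_xy ** 2
lam1 = (trace + math.sqrt(trace ** 2 - 4.0 * det)) / 2.0
eigen_angle = math.atan2(lam1 - var_x, cov_xy)

print(math.degrees(theta), math.degrees(eigen_angle))
```

Both angles come out just under 45 degrees, consistent with the two variances being nearly equal and the covariance being large and positive.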