Principal Component Analysis

Adapted by Paul Anderson from Tutorial by Doug Raiford

The Problem with Apples and Oranges

High dimensionality: can't "see" the data

If we had only one, two, or three features, we could represent the data graphically

But 4 or more…

        Weight  Diameter  Redness  Orangeness
Ex1       0.26      3.10     2.92        7.78
Ex2       0.35      2.51     1.91        5.34
Ex3       0.30      2.33     2.05       11.49
Ex4       0.21      3.67    10.82        1.79
Ex5       0.28      2.13     3.11        9.02
Ex6       0.28      3.83     8.80        2.04
Ex7       0.10      3.96     7.30        2.81
Ex8       0.32      3.40     1.16       12.01
Ex9       0.19      3.89     2.75        9.45
Ex10      0.22      2.46     1.71       10.98
Ex11      0.33      3.95     7.88        2.67
Ex12      0.43      2.99     1.03       10.16
Ex13      0.21      5.29    11.44        1.44
Ex14      0.30      3.35     9.99        1.51
Ex15      0.26      3.04     4.48        1.46
Ex16      0.27      4.38     6.48        1.55
Ex17      0.46      2.90     1.86        7.79
Ex18      0.29      2.92    11.88        1.66
Ex19      0.24      3.50     9.09        1.75
Ex20      0.40      3.24     2.00       11.16

If We Could Compress Into 2 Dimensions

Apples and oranges: feature vectors

Axis of greatest variance

How?

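In MATLAB:

evects = princomp(allFruit);   % columns of evects are the principal components
b1 = evects(:,1);              % first principal component
b2 = evects(:,2);              % second principal component
Z1 = allFruit*b1;              % project every example onto each component
Z2 = allFruit*b2;
scatter(Z1,Z2);                % plot the fruit in two dimensions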

Real World Example

59 dimensions

3500 genes

Very useful in exploratory data analysis

Sometimes useful as a direct tool (MCU)

But We’re Not Scared of the Details

Given
– Data matrix M (feature vectors for all examples)

Generate
– Covariance matrix for M (Σ)
– Eigenvectors (principal components) from the covariance matrix

M → Σ → Eigenvectors
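The M → Σ → Eigenvectors pipeline can be sketched in a few lines of MATLAB. This is a minimal sketch rather than the tutorial's own code: it assumes the built-in cov and eig functions, uses only the first five rows of the fruit table as M, and the names Sigma, V, D, lambda, and order are chosen here for illustration.

```matlab
% Sketch: M -> covariance matrix -> eigenvectors, using the first five
% rows of the fruit table (any n-by-p numeric matrix would do).
M = [0.26 3.10  2.92  7.78;
     0.35 2.51  1.91  5.34;
     0.30 2.33  2.05 11.49;
     0.21 3.67 10.82  1.79;
     0.28 2.13  3.11  9.02];

Sigma = cov(M);                 % p-by-p covariance matrix of the columns
[V, D] = eig(Sigma);            % columns of V: eigenvectors; diag(D): eigenvalues

% Sort explicitly so the largest eigenvalue (greatest variance) comes first.
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

disp(lambda');                  % variance captured along each component
disp(V(:, 1)');                 % first principal component
```

Sorting by eigenvalue makes the ordering explicit instead of relying on whatever order eig happens to return.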

Eigenvectors and Eigenvalues

Each Eigenvector is accompanied by an Eigenvalue

The Eigenvector with the greatest Eigenvalue points along the axis of greatest variance

Eigenvectors and Eigenvalues

If we use only the first principal component, there is very little degradation of the data

We have reduced the dimensions from 2 to 1
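To make "very little degradation" concrete, the sketch below keeps only the first principal component of some 2-D data and then measures what was lost. The six data points are hypothetical (they are not from the slides), and the variable names are assumptions chosen for illustration.

```matlab
% Sketch: keep only the first principal component of 2-D data and measure
% how much is lost. The six points are hypothetical.
X = [1.0 1.1; 2.0 1.9; 3.0 3.2; 4.0 3.9; 5.0 5.1; 6.0 5.8];

mu = mean(X, 1);
Xc = X - repmat(mu, size(X, 1), 1);          % center the data
[V, D] = eig(cov(X));
[lambda, order] = sort(diag(D), 'descend');
v1 = V(:, order(1));                         % axis of greatest variance

z = Xc * v1;                                 % 1-D representation of each point
Xhat = z * v1' + repmat(mu, size(X, 1), 1);  % map the 1-D values back to 2-D

retained = lambda(1) / sum(lambda);          % fraction of total variance kept
err = max(max(abs(X - Xhat)));               % worst-case reconstruction error
fprintf('variance retained: %.4f, max error: %.4f\n', retained, err);
```

The ratio of the first eigenvalue to the sum of all eigenvalues is one common way to quantify how much of the data's spread survives the reduction.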

Project data onto new axes

Once we have the Eigenvectors, we can project the data onto the new axes

Eigenvectors are unit vectors, so a simple dot product produces the desired effect

M → Σ → Eigenvectors → Project Data

Covariance Matrix

M → Σ → Eigenvectors → Project Data

The data matrix M is the 20 × 4 fruit table (columns: Weight, Diameter, Redness, Orangeness). Its covariance matrix Σ is 4 × 4; entry (i, j) is the covariance between feature i and feature j:

$$\Sigma = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} & \sigma_{1,4} \\ \sigma_{2,1} & \sigma_{2,2} & \sigma_{2,3} & \sigma_{2,4} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_{3,3} & \sigma_{3,4} \\ \sigma_{4,1} & \sigma_{4,2} & \sigma_{4,3} & \sigma_{4,4} \end{pmatrix}$$
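The entries of the covariance matrix can also be built by hand, which shows what each position holds. The sketch below assumes the usual sample covariance with an n − 1 denominator (the default of MATLAB's cov) and uses the first four fruit examples; the names Mc and Sigma are chosen here for illustration.

```matlab
% Sketch: build the covariance matrix from centered data and compare with cov().
M = [0.26 3.10  2.92  7.78;
     0.35 2.51  1.91  5.34;
     0.30 2.33  2.05 11.49;
     0.21 3.67 10.82  1.79];          % first four fruit examples

[n, p] = size(M);
Mc = M - repmat(mean(M, 1), n, 1);    % subtract each column's mean
Sigma = (Mc' * Mc) / (n - 1);         % entry (i,j) = cov(feature i, feature j)

disp(Sigma);
disp(max(max(abs(Sigma - cov(M)))));  % difference should be at round-off level
```

The diagonal entries are the variances of the individual features, and the matrix is symmetric because cov(i, j) equals cov(j, i).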

Covariance Matrix

8.3949 7.5958

7.5958 7.7130

Covariance Matrix

8.7951 0.3299

0.3299 0.9200

Eigenvector

Eigenvector
– A linear transform of the Eigenvector, using Σ as the transformation matrix, results in a parallel vector

$$\Sigma v = \lambda v$$

Eigenvector

How to find
– Σ is an n×n matrix
– There will be n Eigenvectors
– Eigenvectors ≠ 0
– Eigenvalues ≠ 0

$$\Sigma v = \lambda v$$

$$\Sigma v - \lambda v = 0$$

$$\Sigma v - \lambda I v = 0$$

$$(\Sigma - \lambda I) v = 0$$

Eigenvector

A is invertible if and only if det(A) ≠ 0

If (Σ − λI) is invertible then:

$$(\Sigma - \lambda I) v = 0$$

$$v = (\Sigma - \lambda I)^{-1} 0$$

$$v = 0$$

But it is given that v ≠ 0, so (Σ − λI) must not be invertible

Not invertible, so det(Σ − λI) = 0

Eigenvector

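First, solve for the λ's by performing the following operations:

$$\Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \Sigma - \lambda I = \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix}$$

$$\det(\Sigma - \lambda I) = 0$$

$$(a - \lambda)(d - \lambda) - bc = 0$$

$$ad - a\lambda - d\lambda + \lambda^2 - bc = 0$$

$$P(\lambda) = \lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

Solving for λ gives 2 roots, λ1 and λ2.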
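For a 2 × 2 matrix, the characteristic polynomial is a quadratic whose roots can be found numerically. The sketch below uses MATLAB's roots function on a hypothetical matrix (the values 2 and 1 are made up for illustration and are not from the slides).

```matlab
% Sketch: eigenvalues of a 2-by-2 matrix from its characteristic polynomial
%   P(lambda) = lambda^2 - (a + d)*lambda + (a*d - b*c)
Sigma = [2 1;
         1 2];                      % hypothetical example values
a = Sigma(1,1); b = Sigma(1,2);
c = Sigma(2,1); d = Sigma(2,2);

coeffs = [1, -(a + d), a*d - b*c];  % polynomial coefficients, highest power first
lambda = sort(roots(coeffs), 'descend');

disp(lambda');                      % 3 and 1 for this matrix
```

Here the polynomial is λ² − 4λ + 3, which factors as (λ − 3)(λ − 1).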

Eigenvector

Now that the Eigenvalues have been acquired we can solve for the Eigenvector (v below).

We know Σ, λ, and I, so this becomes a homogeneous system of equations (equal to 0) with the entries of v as the variables

We already know that there is no unique solution
– The only way there is a unique solution is if the trivial solution (v = 0) is the only solution
– If that were the case, (Σ − λI) would be invertible

$$(\Sigma - \lambda I) v = 0$$

Back to the example

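$$\Sigma = \operatorname{cov}(data) = \begin{pmatrix} 8.39 & 7.60 \\ 7.60 & 7.71 \end{pmatrix}$$

$$\det(\Sigma - \lambda I) = 0, \qquad \Sigma - \lambda I = \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix}$$

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

$$\lambda^2 - (8.39 + 7.71)\lambda + (8.39 \cdot 7.71 - 7.60 \cdot 7.60) = 0$$

$$\text{roots: } \lambda_1 = 15.66, \quad \lambda_2 = 0.45$$

$$\Sigma - \lambda_1 I = \begin{pmatrix} -7.26 & 7.60 \\ 7.60 & -7.94 \end{pmatrix}$$

$$-7.2625x + 7.5958y = 0$$

$$7.5958x - 7.9443y = 0$$

$$y = \frac{7.2625}{7.5958}\, x$$

$$\text{Eigenvector}_1 = \begin{pmatrix} 7.5958 \\ 7.2625 \end{pmatrix}, \qquad \text{Eigenvector}_{1\_norm} = \begin{pmatrix} 0.72 \\ 0.69 \end{pmatrix}$$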

Back to the example

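$$\Sigma - \lambda_2 I = \begin{pmatrix} 7.94 & 7.60 \\ 7.60 & 7.26 \end{pmatrix}$$

$$7.9443x + 7.5958y = 0$$

$$7.5958x + 7.2625y = 0$$

$$y = -\frac{7.9443}{7.5958}\, x$$

$$\text{Eigenvector}_2 = \begin{pmatrix} -7.5958 \\ 7.9443 \end{pmatrix}, \qquad \text{Eigenvector}_{2\_norm} = \begin{pmatrix} -0.69 \\ 0.72 \end{pmatrix}$$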
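As a cross-check on the hand calculation, the same covariance matrix can be passed to MATLAB's eig. This is a minimal sketch using the four-decimal covariance values from the example; the sorting step and variable names are assumptions added for illustration.

```matlab
% Sketch: check the hand-computed eigenvalues and eigenvectors with eig().
Sigma = [8.3949 7.5958;
         7.5958 7.7130];

[V, D] = eig(Sigma);
[lambda, order] = sort(diag(D), 'descend');   % largest eigenvalue first
V = V(:, order);

disp(lambda');            % approximately 15.66 and 0.45
disp(abs(V));             % magnitudes approximately [0.72 0.69; 0.69 0.72]
                          % (eig may flip the sign of either column)
disp(V(:,1)' * V(:,2));   % essentially 0: the eigenvectors are perpendicular
```

An eigenvector is only defined up to a scale factor, so a sign flip relative to the hand calculation is expected and harmless.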

Eigenvectors (Summary)

Find the characteristic polynomial using the determinant

Solve for the Eigenvalues (λ's)

Solve for the Eigenvectors

P(λ) → λ's → Eigenvectors

Axis of Greatest Variance?

Equation for an ellipse:

$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$$

– D, E, and F have to do with translation
– A and C are related to the ellipse's spread along the X and Y axes, respectively
– B has to do with rotation

Without the translation terms:

$$Ax^2 + Bxy + Cy^2 = 0$$

Axis of Greatest Variance

Mathematicians discovered that any ellipse can be exactly captured by a symmetric matrix

Covariance matrix is symmetric

The Eigenvectors of the said matrix point along the principal axes of the ellipse

Origin of the name (principal components analysis)

$$\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}$$

– A: related to spread along the x axis (variance of data along the x axis)
– C: related to spread along the y axis
– B/2: related to rotation (covariance)
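As a worked check of this correspondence, multiplying out the quadratic form built from the symmetric matrix recovers the second-degree terms of the ellipse equation. The row-vector and column-vector notation is an assumption added here for illustration.

```latex
\[
\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= Ax^2 + \tfrac{B}{2}xy + \tfrac{B}{2}yx + Cy^2
= Ax^2 + Bxy + Cy^2
\]
```

Splitting B evenly across the two off-diagonal entries is what makes the matrix symmetric.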

Principal Axis Theorem

Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces

Project Data Onto Principal Components

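Eigenvectors are unit vectors

$$\operatorname{proj}_v u = \frac{u \cdot v}{\lVert v \rVert^2}\, v$$

Since ‖v‖ = 1, the dot product u · v alone gives the coordinate of u along v:

$$Z_1 = M v_1$$

$$Z_2 = M v_2$$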
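The projection formula can be tried on a handful of points. In the sketch below the direction v and the three points in U are hypothetical values chosen so the arithmetic is easy to follow by hand.

```matlab
% Sketch: project 2-D points onto a unit vector with a dot product.
v = [1; 1] / sqrt(2);          % hypothetical unit-length direction
U = [2 0;
     1 1;
     0 3];                     % each row is one data point

z = U * v;                     % one coordinate per point along v
P = z * v';                    % the projected points, back in x-y coordinates

disp(z');                      % sqrt(2), sqrt(2), 3/sqrt(2)
disp(P);                       % rows: [1 1], [1 1], [1.5 1.5]
```

Because v has length 1, no division by ‖v‖² is needed; the matrix product U * v computes every dot product at once.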

Review

In MATLAB:

evects = princomp(allFruit);   % principal components, one per column
b1 = evects(:,1);
b2 = evects(:,2);
Z1 = allFruit*b1;              % projections onto the first two components
Z2 = allFruit*b2;
scatter(Z1,Z2);

Practice

Covariance matrix

$$\Sigma = \begin{pmatrix} 4.3703 & 2.0668 \\ 2.0668 & 4.0295 \end{pmatrix} \approx \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$$

Practice

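$$\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$$

$$\det(\Sigma - \lambda I) = 0$$

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

$$\lambda^2 - (4 + 4)\lambda + (4 \cdot 4 - 2 \cdot 2) = 0$$

$$\lambda_1 = 6, \qquad \lambda_2 = 2$$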

Practice

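$$\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \lambda_1 = 6, \quad \lambda_2 = 2$$

$$\Sigma - \lambda_1 I = \begin{pmatrix} -2 & 2 \\ 2 & -2 \end{pmatrix}$$

$$-2x + 2y = 0$$

$$2x - 2y = 0$$

$$y = x$$

$$\text{Eigenvector}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \text{Eigenvector}_{1\_norm} = \begin{pmatrix} 0.7071 \\ 0.7071 \end{pmatrix}$$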

Practice



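$$\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \lambda_2 = 2$$

$$\Sigma - \lambda_2 I = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$$

$$2x + 2y = 0$$

$$2x + 2y = 0$$

$$y = -x$$

$$\text{Eigenvector}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \text{Eigenvector}_{2\_norm} = \begin{pmatrix} -0.7071 \\ 0.7071 \end{pmatrix}$$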

Questions?

Why Invertible if row reducible to I?

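$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} w & x \\ y & z \end{pmatrix}$$

$$A A^{-1} = I$$

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$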

Implication of Zero Determinants

Why Eigenvector Associated with Greatest λ Points Along Axis of Greatest Variance

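Pearson correlation coefficient

$$SSX = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad SSY = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SCP = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$R = \frac{SCP}{\sqrt{SSX \cdot SSY}}$$

$$Var(X) = \frac{SSX}{n - 1}, \qquad Cov(X, Y) = \frac{SCP}{n - 1}$$

$$\hat{y} = b_0 + b_1 x$$

$$b_1 = \frac{SCP}{SSX} = \frac{Cov(X, Y)\,(n - 1)}{Var(X)\,(n - 1)} = \frac{Cov(X, Y)}{Var(X)}$$

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$$

$$(a + c)x + (b + d)y = \lambda x + \lambda y$$

$$(b + d)y - \lambda y = \lambda x - (a + c)x$$

$$(b + d - \lambda)\, y = (\lambda - a - c)\, x$$

$$y = \frac{\lambda - a - c}{b + d - \lambda}\, x$$

$$\begin{pmatrix} var(x) & cov \\ cov & var(y) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$$

$$\begin{pmatrix} var(x) - \lambda & cov \\ cov & var(y) - \lambda \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$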

Rotation

Good search terms: rotation of axes conic sections

Note that in the sections above dealing with the ellipse, hyperbola, and the parabola, the algebraic equations that appeared did not contain a term of the form xy. However, in our “Algebraic View of the Conic Sections,” we stated that every conic section is of the form

$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$$

where A, B, C, D, E, and F are constants. In essence, all of the equations that we have studied have had B=0. So the question arises: “What role, if any, does the xy term play in conic sections? If it were present, how would that change the geometric figure?”

First of all, the answer is NOT that the conic changes from one type to another. That is to say, if we introduce an xy term by rotating the graph, the conic does NOT change from an ellipse to a hyperbola. If we start with the standard equation of an ellipse and rotation adds an xy term, we still have an ellipse (B² − 4AC stays negative).

So what does the xy term do? The xy term actually rotates the graph in the plane. For example, in the case of an ellipse, the major axis is no longer parallel to the x-axis or y-axis. Rather, depending on the constant in front of the xy term, we now have the major axis rotated. Let’s look at an example.

* Example

rotation

– A is related to elongation in the x direction
– C is related to elongation in the y direction
– B is related to rotation (B is not equal to zero if and only if there is rotation)
– D, E, and F are related to h and k (x and y shift: (x−h), (y−k))
– D, E, and F are not affected by rotation
– A and C are affected

Standard equation

Standard equation of the ellipse is:

$$\frac{(x - h)^2}{a^2} + \frac{(y - k)^2}{b^2} = 1$$

a = 5 and b = 2. Hence:
– The length of the major axis is 2a = 10
– The length of the minor axis is 2b = 4

New rotated coordinate system

Coordinate Rotation Formulas
– If a rectangular xy-coordinate system is rotated through an angle θ to form an x'y' coordinate system, then a point P(x, y) will have coordinates P(x', y') in the new system, where (x, y) and (x', y') are related by

$$x = x' \cos\theta - y' \sin\theta, \qquad y = x' \sin\theta + y' \cos\theta$$

$$x' = x \cos\theta + y \sin\theta, \qquad y' = -x \sin\theta + y \cos\theta$$

rotation

The values of h and k give horizontal and vertical (resp.) translation distances, and t gives rotation angle (measured in degrees). Notice how changes in these transformation values affect the coefficients, and how changes in the coefficients affect the transformations.

The lines shown in green in the graph are the following key lines for the conic sections: the major and minor axes for ellipses (crossing at the center of the ellipse), the axis of symmetry and perpendicular line through the vertex for a parabola (crossing at the vertex), and the two perpendicular axes of symmetry (crossing through the center point) for a hyperbola. In all cases, the two lines cross at the point (h,k), and are rotated from the position parallel to the coordinate axes by t degrees. In graphs of hyperbolas, the asymptotes of the hyperbola are shown as orange lines.

If B² − 4AC < 0, then the graph is an ellipse (if B = 0 and A = C in this case, then the graph is a circle)

One other important formula determines the relationship between the coefficients and the angle of rotation: tan(2t)=B/(A-C). Note that rotation has no effect on the values of the coefficients D, E, and F, and that t=0 (no rotation) if and only if B=0. The values of the coordinates of the point (h,k) are best determined from the coefficients by first reversing the effect of the rotation (so that B=0), then completing the squares.

Principal Axis Theorem

Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces

$$\tan 2\Theta = \frac{\sin(2\Theta)}{\cos(2\Theta)} = \frac{B}{(A - C)} = \frac{2\operatorname{cov}(x, y)}{\operatorname{var}(x) - \operatorname{var}(y)}$$
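The rotation-angle relation tan(2Θ) = B/(A − C) can be checked numerically against the direction of the first Eigenvector. The sketch below assumes A = var(x), C = var(y), and B = 2·cov(x, y), following the symmetric matrix with B/2 off the diagonal, and reuses the 2 × 2 covariance matrix from the worked example; the variable names are chosen here for illustration.

```matlab
% Sketch: rotation angle from tan(2*theta) = 2*cov / (var(x) - var(y)),
% compared with the angle of the eigenvector for the largest eigenvalue.
Sigma = [8.3949 7.5958;
         7.5958 7.7130];
varx = Sigma(1,1); vary = Sigma(2,2); covxy = Sigma(1,2);

theta = 0.5 * atan2(2*covxy, varx - vary);   % atan2 keeps the correct quadrant

[V, D] = eig(Sigma);
[~, k] = max(diag(D));                       % index of the largest eigenvalue
v1 = V(:, k);
phi = atan(v1(2) / v1(1));                   % a sign flip of v1 cancels in the ratio

fprintf('theta = %.2f deg, eigenvector angle = %.2f deg\n', ...
        theta*180/pi, phi*180/pi);           % both about 43.7 degrees
```

Using atan2 rather than atan avoids a division by zero when var(x) equals var(y).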
