Harvard-MIT Division of Health Sciences and Technology
HST.951J: Medical Decision Support, Fall 2005
Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo

6.873/HST.951 Medical Decision Support, Spring 2005
Variable Compression: Principal Components Analysis / Linear Discriminant Analysis
Lucila Ohno-Machado

TRANSCRIPT

Page 1:

Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005
Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo

6.873/HST.951 Medical Decision Support, Spring 2005

Variable Compression: Principal Components Analysis

Linear Discriminant Analysis

Lucila Ohno-Machado

Page 2:

Variable Selection

• Use fewer variables

• Interpretation is easier

Page 3:

Ranking Variables Univariately

• Remove one variable from the model at a time

• Compare the performance of the [n−1]-variable model with the full [n]-variable model

• Rank variables according to the performance difference

Screenshots removed due to copyright reasons.

Page 4:

Figures removed due to copyright reasons. Please see: Khan, J., et al. "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks." Nat Med 7, no. 6 (Jun 2001): 673-9.

Page 5:

Variable Selection

• Ideal: consider all variable combinations
  – Not feasible in most data sets: with a large number n of variables there are 2^n combinations

• Greedy Forward:
  – Select the most important variable as the "first component", then select other variables conditioned on the previous ones
  – Stepwise: consider backtracking

• Greedy Backward:
  – Start with all variables and remove one at a time
  – Stepwise: consider backtracking

• Other search methods: genetic algorithms that optimize classification performance and the number of variables
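The greedy forward procedure above can be sketched in a few lines. Here `score` is a hypothetical stand-in for a model's cross-validated performance on a candidate variable subset, and the toy weights are invented purely for illustration.

```python
# Greedy forward selection sketch. `score` is a hypothetical stand-in
# for a model's cross-validated performance on a candidate variable set.
def forward_select(variables, score, k):
    selected = []
    remaining = list(variables)
    while remaining and len(selected) < k:
        # pick the variable whose addition improves the score the most
        best = max(remaining, key=lambda v: score(selected + [v]))
        if score(selected + [best]) <= score(selected):
            break  # no variable improves the model: stop early
        selected.append(best)
        remaining.remove(best)
    return selected

# toy score: prefers variables "a" then "b", ignores the rest
weights = {"a": 2.0, "b": 1.0, "c": 0.0}
subset = forward_select(["a", "b", "c"],
                        lambda s: sum(weights[v] for v in s), k=2)
```

A stepwise variant would additionally re-test previously selected variables for removal after each addition.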

Page 6:
Page 7:

Variable compression

• Direction of maximum variability – PCA

• PCA regression

– LDA

– (Partial Least Squares)

Page 8:

VARIANCE, COVARIANCE, AND THE CORRELATION COEFFICIENT

Correlation coefficient:

\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

Variance:

\sigma_{XX} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}

Covariance:

\sigma_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}

Standard deviation:

\sigma_X = \sqrt{\sigma_{XX}}
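The sample formulas above (with the n − 1 denominator) can be computed from first principles; the data here are invented for illustration.

```python
# Sample variance, covariance, and correlation from first principles,
# using the n-1 (sample) denominator.
def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    # correlation = covariance / (product of standard deviations)
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.1, 5.9, 8.2]   # roughly y = 2x, so correlation is near 1
```

Note that `cov(x, x)` is exactly the variance, matching the σ_XX formula above.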

Page 9:

Covariance and Correlation Matrices

\mathrm{cov} = \begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{YX} & \sigma_{YY} \end{bmatrix}
\qquad
\mathrm{corr} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}

\sigma_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
\qquad
\sigma_{XX} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}

[Scatter plot of Y against X removed.]

Page 10:

The slope from linear regression is asymmetric; covariance and ρ are symmetric

y = \beta_0 + \beta_1 x
\qquad
\beta_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
\qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}

Example: y = 2 + 4x, equivalently x = (y - 2)/4

\mathrm{cov} = \begin{bmatrix} 0.86 & 3.50 \\ 3.50 & 15.69 \end{bmatrix}
\qquad
\mathrm{corr} = \begin{bmatrix} 1 & 0.96 \\ 0.96 & 1 \end{bmatrix}

[Scatter plot of y against x with the fitted line removed.]
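The asymmetry can be checked numerically. Taking the slide's covariance entries as σ_xx = 0.86, σ_xy = 3.50, σ_yy = 15.69 (for data generated from y ≈ 2 + 4x):

```python
# The regression slope is cov/var and depends on which variable is the
# predictor; covariance and correlation do not.
s_xx, s_xy, s_yy = 0.86, 3.50, 15.69

slope_y_on_x = s_xy / s_xx             # ~4.07, close to the true slope 4
slope_x_on_y = s_xy / s_yy             # ~0.22, NOT 1/slope_y_on_x
rho = s_xy / (s_xx * s_yy) ** 0.5      # ~0.95 (the slide rounds to 0.96)

# the two slopes multiply to rho**2, not to 1 (unless correlation is perfect)
product = slope_y_on_x * slope_x_on_y
```

This is why regressing y on x and x on y give two different lines through the same cloud of points, while ρ is a single symmetric number.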

Page 11:

Principal Components Analysis

• Our motivation: Reduce the number of variables so that we can run interesting algorithms

• The goal is to build linear combinations of the variables (transformation vectors)

• First component should represent the direction with largest variance

• Second component is orthogonal to (independent of) the first, and is the next direction with largest variance

• and so on…

Page 12:

X and Y are not independent (covariance is not 0)

Y = 4X + e

\mathrm{cov} = \begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{bmatrix}, \quad \sigma_{XY} \neq 0

\mathrm{cov} = \begin{bmatrix} 0.86 & 3.50 \\ 3.50 & 15.69 \end{bmatrix}
\qquad
\rho = 0.96

[Scatter plot of Y against X1 removed.]

Page 13:

Eigenvalues

I is the identity matrix. A is a square matrix (such as the covariance matrix).

|A - λ I| = 0

λ is called the eigenvalue (or characteristic root) of A.

\left| \begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0
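For a 2×2 matrix the determinant condition expands to a quadratic in λ, which can be solved directly with the quadratic formula; a minimal sketch (the example matrix is invented):

```python
# |A - lambda*I| = 0 for a 2x2 matrix A = [[a, b], [c, d]] expands to
# lambda^2 - trace(A)*lambda + det(A) = 0; solve with the quadratic formula.
def eigenvalues_2x2(a, b, c, d):
    tr = a + d
    det = a * d - b * c
    disc = (tr * tr - 4 * det) ** 0.5
    return (tr + disc) / 2, (tr - disc) / 2

lam = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)  # symmetric example: -> (3.0, 1.0)
```

For a symmetric matrix such as a covariance matrix, both roots are real, and their sum equals the trace (the total variance).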

Page 14:

Eigenvectors

\begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} = q \begin{bmatrix} a \\ b \end{bmatrix}
\qquad
\begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{bmatrix}
\begin{bmatrix} c \\ d \end{bmatrix} = m \begin{bmatrix} c \\ d \end{bmatrix}

q and m are eigenvalues

[a b]T and [c d]T are eigenvectors, they are orthogonal (independent of each other, do not contain redundant information)

The eigenvector associated with the largest eigenvalue will point in the direction of largest variance in the data.

If q > m, then [a b]T is PC1
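Given an eigenvalue of a symmetric 2×2 matrix, the matching unit eigenvector follows from the first row of (A − λI)q = 0; a sketch (assuming the off-diagonal entry is nonzero):

```python
# For A = [[a, b], [b, d]] with eigenvalue lam, the first row of
# (A - lam*I)q = 0 reads (a - lam)*x + b*y = 0, so (x, y) is
# proportional to (b, lam - a).  Assumes b != 0.
def eigenvector_2x2(a, b, d, lam):
    x, y = b, lam - a
    norm = (x * x + y * y) ** 0.5
    return x / norm, y / norm

v = eigenvector_2x2(2.0, 1.0, 2.0, 3.0)   # eigenvalue 3 of [[2, 1], [1, 2]]
```

Checking A·v = 3·v component by component confirms the direction; the normalization makes it a unit vector, as PCA transformation vectors conventionally are.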

Page 15:

Principal Components

\begin{bmatrix} 1.27 & 5.12 \\ 5.12 & 21.65 \end{bmatrix}
\begin{bmatrix} 0.23 \\ 0.97 \end{bmatrix} = 22.87 \begin{bmatrix} 0.23 \\ 0.97 \end{bmatrix}

\begin{bmatrix} 1.27 & 5.12 \\ 5.12 & 21.65 \end{bmatrix}
\begin{bmatrix} 0.97 \\ -0.23 \end{bmatrix} = 0.05 \begin{bmatrix} 0.97 \\ -0.23 \end{bmatrix}

[Scatter plot of X2 against X1 removed.]

Total variance is 21.65 + 1.27 = 22.92
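Reading the slide's covariance matrix as [[1.27, 5.12], [5.12, 21.65]], the characteristic polynomial reproduces its numbers up to rounding (the exact eigenvalues are about 22.86 and 0.06):

```python
# Eigenvalues of [[1.27, 5.12], [5.12, 21.65]] via the characteristic
# polynomial, and the share of total variance captured by PC1.
a, b, d = 1.27, 5.12, 21.65
tr, det = a + d, a * d - b * b
disc = (tr * tr - 4 * det) ** 0.5
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

# eigenvalues sum to the trace, i.e. the total variance 22.92
share = lam1 / (lam1 + lam2)   # PC1 captures ~99% of the total variance
```

This is the arithmetic behind "PC2 can be discarded with little loss of information": almost all of the variance lives along the first eigenvector.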

Page 16:

Transformed data

PC = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{21} & x_{31} \\ x_{12} & x_{22} & x_{32} \end{bmatrix}
\ \text{(observations O1, O2, O3)}

PC \, X = \begin{bmatrix}
a x_{11} + b x_{12} & a x_{21} + b x_{22} & a x_{31} + b x_{32} \\
c x_{11} + d x_{12} & c x_{21} + d x_{22} & c x_{31} + d x_{32}
\end{bmatrix}

= \begin{bmatrix}
0.23\, x_{11} + 0.97\, x_{12} & 0.23\, x_{21} + 0.97\, x_{22} & 0.23\, x_{31} + 0.97\, x_{32} \\
0.97\, x_{11} - 0.23\, x_{12} & 0.97\, x_{21} - 0.23\, x_{22} & 0.97\, x_{31} - 0.23\, x_{32}
\end{bmatrix}

Total variance is 22.92

Variance of PC1 is 22.87, so it captures 99% of the variance.

PC2 can be discarded with little loss of information.

[Plot of the data with the PC1 and PC2 axes drawn over X1 and X2 removed.]
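The transformation above is just a matrix product; applied to a few invented points lying near the line x2 = 4·x1, nearly all of each point's magnitude lands on the first component:

```python
# Transforming data into PC coordinates: each new coordinate is a linear
# combination of the originals, using the slide's eigenvectors
# PC1 = [0.23, 0.97] and PC2 = [0.97, -0.23].
pc = [[0.23, 0.97], [0.97, -0.23]]

def transform(point):
    x1, x2 = point
    return [pc[0][0] * x1 + pc[0][1] * x2,
            pc[1][0] * x1 + pc[1][1] * x2]

# invented points on the line x2 = 4*x1
scores = [transform(p) for p in [(1.0, 4.0), (2.0, 8.0), (3.0, 12.0)]]
# e.g. (1.0, 4.0) -> PC1 score ~4.11, PC2 score ~0.05
```

Keeping only the first row of the transformed matrix is exactly the "discard PC2" compression described above.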

Page 17:

PC1 is not the regression line

• Regression: y = 4x

• PC1: [a b]^T = [0.23 0.97]; the transformation is 0.23x + 0.97y

• PC1 goes through (0, 0) and (0.23, 0.97)

• Its slope is 0.97/0.23 = 4.217, not 4

[Plot comparing the PC1 and regression lines removed.]

Page 18:

PCA regression

• Reduce the original dimensionality d (number of variables) by finding n PCs, such that n < d

• Perform regression on the PCs

• Problem: the direction of greater overall variance is not necessarily best for classification

• Solution: consider also the direction of greater separation between two classes
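The procedure can be sketched end to end: project the inputs onto a principal component, then fit ordinary least squares on the resulting score. The data and the PC1 direction here are invented for illustration, and the component is assumed precomputed.

```python
# PCA regression sketch: replace correlated inputs (x1, x2) by their
# first principal-component score, then fit least squares on that score.
def mean(v):
    return sum(v) / len(v)

def ols_1d(t, y):
    # one-variable least squares: beta1 = cov(t, y) / var(t)
    mt, my = mean(t), mean(y)
    beta1 = (sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
             / sum((ti - mt) ** 2 for ti in t))
    return my - beta1 * mt, beta1

x = [(1.0, 4.1), (2.0, 7.9), (3.0, 12.2), (4.0, 15.8)]   # correlated inputs
y = [3.0, 5.0, 7.0, 9.0]

pc1 = (0.24, 0.97)                       # assumed first PC direction
scores = [pc1[0] * a + pc1[1] * b for a, b in x]
beta0, beta1 = ols_1d(scores, y)         # regression on the single PC score
```

With n PCs the same idea applies: regress on n scores instead of d original variables.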

Page 19:

A (not so good) idea: Class mean separation

• Find the means of each category

• Draw the line that passes through the 2 means

• Project points onto the line (a.k.a. orthogonalize points with respect to the line)

• Find the point that best separates the data

Page 20:

Fisher’s Linear Discriminant

• Use classes to define the discrimination line, but the criterion to maximize is:
  – the ratio of (between-classes variation) to (within-classes variation)

• Project all objects onto the line

• Find the point on the line that best separates the classes

Page 21:

Sw is the sum of scatters within the classes

\mathrm{cov}_j = \frac{1}{n_j - 1} \sum_{i} (x_i - \mu_j)(x_i - \mu_j)^T
\qquad
\mathrm{scatter}_j = (n_j - 1)\,\mathrm{cov}_j

\mathrm{cov}_1 = \begin{bmatrix} 1 & 4 \\ 4 & 16 \end{bmatrix}
\qquad
\mathrm{cov}_2 = \begin{bmatrix} 1.1 & 3.9 \\ 3.9 & 14 \end{bmatrix}

S_w = \sum_{j=1}^{k} p_j (n_j - 1)\,\mathrm{cov}_j

S_1 = 0.5\,(99) \begin{bmatrix} 1 & 4 \\ 4 & 16 \end{bmatrix}
\qquad
S_2 = 0.5\,(99) \begin{bmatrix} 1.1 & 3.9 \\ 3.9 & 14 \end{bmatrix}

S_w = S_1 + S_2
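The within-class scatter can be assembled directly from the two class covariance matrices. Assuming, as the slide's 0.5·(99) factors suggest, equal priors p_j = 0.5 and n_j = 100 cases per class:

```python
# Within-class scatter Sw from the slide's two class covariance matrices,
# assuming equal priors p_j = 0.5 and n_j = 100 cases per class,
# so each weighted scatter is 0.5 * 99 * cov_j.
cov1 = [[1.0, 4.0], [4.0, 16.0]]
cov2 = [[1.1, 3.9], [3.9, 14.0]]

def scale(m, k):
    return [[k * e for e in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

s_w = add(scale(cov1, 0.5 * 99), scale(cov2, 0.5 * 99))
```

The result is still a 2×2 symmetric matrix, ready to be combined with the between-class scatter on the next slide.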

Page 22:

Sb is the scatter between the classes

S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T

• Maximize Sb/Sw (a square d x d matrix, where d is the number of dimensions)

• Find the maximum eigenvalue and the respective eigenvector

• This is the direction that maximizes Fisher’s criterion

• Points are projected onto this line

• Calculate the distance from every projected point to the projected class means, and decide the class corresponding to the smaller distance

• Assumption: class distributions are normal

[Plot of the two classes with the eigenvector with max eigenvalue drawn removed.]
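For two classes there is a standard closed form equivalent to the eigenvalue problem: the maximizing direction is proportional to Sw⁻¹(μ1 − μ2). A sketch in 2-D, with an invented diagonal scatter so the effect is easy to see:

```python
# Fisher direction sketch for two classes in 2-D: w is proportional to
# inverse(Sw) applied to (mu1 - mu2).
def solve_2x2(m, v):
    # apply the inverse of [[a, b], [c, d]] to v via the adjugate formula
    a, b = m[0]
    c, d = m[1]
    det = a * d - b * c
    return ((d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det)

s_w = [[2.0, 0.0], [0.0, 0.5]]            # toy within-class scatter
mu1, mu2 = (1.0, 1.0), (-1.0, -1.0)
diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
w = solve_2x2(s_w, diff)
# the low-variance axis (second coordinate) gets the larger weight
```

Note how Sw⁻¹ tilts the direction away from the plain mean-difference line of the "not so good" idea: axes with small within-class variation are weighted more heavily.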

Page 23:

Classification Models

• Quadratic Discriminant Analysis

• Partial Least Squares
  – PCA uses X to calculate directions of greater variation
  – PLS uses X and Y to calculate these directions
  – It is a variation of multiple linear regression
  – PCA maximizes Var(Xα); PLS maximizes Corr²(y, Xα) Var(Xα)

• Logistic Regression

Page 24:

PCA, PLS, Selection

• 3 data sets
  – Singh et al. (Cancer Cell, 2002): 52 cases of benign and malignant prostate tumors
  – Bhattacharjee et al. (PNAS, 2001): 186 cases of different types of lung cancer
  – Golub et al. (Science, 1999): 72 cases of acute myeloblastic and lymphoblastic leukemia

• PCA logistic regression

• PLS

• Forward selection logistic regression

• 5-fold cross-validation
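The 5-fold cross-validation used in this comparison partitions the cases into five disjoint test folds, training on the other four each time; a minimal index-splitting sketch (the round-robin assignment is one common choice, not necessarily the authors'):

```python
# 5-fold cross-validation index split: yields (train, test) index lists,
# with every case appearing in exactly one test fold.
def kfold_indices(n, k=5):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(10, 5))
```

For gene-expression studies with few cases (52 to 186 here), cross-validation is what makes the performance estimates usable at all.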

Page 25:

Screenshots removed due to copyright reasons.

Missing values: 0.001% to 0.02%

Page 26:
Page 27:
Page 28:
Page 29:
Page 30:
Page 31:
Page 32:

Classification of Leukemia with Gene Expression

PCA

Variable Selection

Variable selection from ~2,300 genes