TRANSCRIPT
Chapter 28 – Part II
Matrix Operations
Matrix Operations
- Gaussian elimination
- LU factorization
- Gaussian elimination with partial pivoting
- LUP factorization
- Error analysis
- Complexity of matrix multiplication & inversion
- SVD and PCA
- SVD and PCA applications
- Project 2 – image compression using SVD
Eigenvalues & Eigenvectors
Eigenvectors (for a square m×m matrix S)

Sv = λv, where λ is an eigenvalue and v is the corresponding (right) eigenvector.

How many eigenvalues are there at most?

(S − λI)v = 0 only has a non-zero solution v if det(S − λI) = 0.

This is an m-th order equation in λ which can have at most m distinct solutions (roots of the characteristic polynomial) – these can be complex even though S is real.
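As a side note (not on the original slides), this connection can be checked numerically; the 3×3 matrix below is an arbitrary illustrative choice:

```python
import numpy as np

# An arbitrary real 3x3 matrix, chosen only for illustration.
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

coeffs = np.poly(S)                  # coefficients of det(S - lambda*I)
roots = np.roots(coeffs)             # roots of the characteristic polynomial
eigvals = np.linalg.eigvals(S)       # eigenvalues computed directly

print(np.sort(roots))                # same values (up to round-off) ...
print(np.sort(eigvals))              # ... as the eigenvalues of S
```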
Example
Some of these slides are adapted from notes of Stanford CS276 & CMU CS385
Matrix-vector multiplication

S = [3 0 0
     0 2 0
     0 0 0]

S has eigenvalues 3, 2, 0 with corresponding eigenvectors

v1 = (1, 0, 0)^T,  v2 = (0, 1, 0)^T,  v3 = (0, 0, 1)^T

On each eigenvector, S acts as a multiple of the identity matrix, but as a different multiple on each.

Any vector (say x = (2, 4, 6)^T) can be viewed as a combination of the eigenvectors:

x = 2v1 + 4v2 + 6v3
Matrix-vector multiplication

Thus a matrix-vector multiplication such as Sx can be rewritten in terms of the eigenvalues/vectors:

Sx = S(2v1 + 4v2 + 6v3)
Sx = 2Sv1 + 4Sv2 + 6Sv3 = 2λ1v1 + 4λ2v2 + 6λ3v3

Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues/vectors.

Suggestion: the effect of "small" eigenvalues is small.
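A minimal numpy sketch of this slide (assuming the reconstruction above, S = diag(3, 2, 0) and x = (2, 4, 6)^T): computing Sx directly and via the eigen-coordinates gives the same vector.

```python
import numpy as np

S = np.diag([3.0, 2.0, 0.0])            # the example matrix
x = np.array([2.0, 4.0, 6.0])           # x = 2*v1 + 4*v2 + 6*v3

lam, V = np.linalg.eig(S)               # columns of V are the eigenvectors
c = np.linalg.solve(V, x)               # coordinates of x in the eigenvector basis

Sx_via_eig = V @ (lam * c)              # Sx = sum_i c_i * lambda_i * v_i
print(np.allclose(S @ x, Sx_via_eig))   # True
```

The zero eigenvalue's component of x contributes nothing to Sx, which is the sense in which "small" eigenvalues have a small effect.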
Eigenvalues & Eigenvectors

- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
  Sv{1,2} = λ{1,2} v{1,2} and λ1 ≠ λ2  ⇒  v1 · v2 = 0
- All eigenvalues of a real symmetric matrix are real:
  if |S − λI| = 0 for complex λ and S = S^T, then λ ∈ ℝ
- All eigenvalues of a positive semidefinite matrix are non-negative:
  if w^T S w ≥ 0 for all w ∈ ℝ^n and Sv = λv, v ≠ 0, then λ ≥ 0

(Can you prove each of these?)
Example

Let S = [2 1
         1 2]   (real, symmetric)

Then |S − λI| = |2−λ   1  |
                |  1   2−λ|  = (2 − λ)² − 1 = 0.

The eigenvalues are 1 and 3 (nonnegative, real).

The eigenvectors are orthogonal (and real): (1, −1)^T for λ = 1 and (1, 1)^T for λ = 3.
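A quick numpy check of this example (illustrative, not part of the slides): numpy.linalg.eigh, which is meant for symmetric matrices, returns real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(S)   # eigh assumes a symmetric (Hermitian) matrix
print(lam)                   # [1. 3.]  -> real and non-negative
print(Q.T @ Q)               # identity -> the eigenvectors are orthonormal
```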
Eigen/diagonal Decomposition

Let S be a square matrix with m linearly independent eigenvectors.

Theorem: there exists an eigen decomposition S = UΛU^-1, where
- the columns of U are eigenvectors of S, and
- the diagonal elements of Λ are the eigenvalues of S.
Diagonal decomposition (2)

Let U have the eigenvectors as columns: U = [v1 ... vn].

Then SU can be written

SU = S[v1 ... vn] = [λ1v1 ... λnvn] = [v1 ... vn] diag(λ1, ..., λn) = UΛ

Thus SU = UΛ, or U^-1 S U = Λ.

And S = UΛU^-1.
Diagonal decomposition – example

Recall S = [2 1
            1 2];   λ1 = 1, λ2 = 3.

The eigenvectors (1, −1)^T and (1, 1)^T form

U = [ 1  1
     −1  1]

Inverting U:

U^-1 = [1/2  −1/2
        1/2   1/2]

Then S = UΛU^-1 = [1 1; −1 1] · [1 0; 0 3] · [1/2 −1/2; 1/2 1/2].
Example continued

Let's divide U (and multiply U^-1) by √2. Then

S = [1/√2  1/√2; −1/√2  1/√2] · [1 0; 0 3] · [1/√2  −1/√2; 1/√2  1/√2]

The first factor is Q (with Q^-1 = Q^T), so S = QΛQ^T with Q orthogonal.
Symmetric Eigen Decomposition

If S is a symmetric matrix:

Theorem: there exists a (unique) eigen decomposition S = QΛQ^T, where Q is orthogonal:
- Q^-1 = Q^T
- the columns of Q are normalized eigenvectors
- the columns are orthogonal
- (everything is real)
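A numpy sketch tying the last few slides together for the worked example (using the reconstructed U above): U diagonalizes S, and dividing its columns by √2 gives an orthogonal Q with S = QΛQ^T.

```python
import numpy as np

S   = np.array([[2.0, 1.0], [1.0, 2.0]])
U   = np.array([[1.0, 1.0], [-1.0, 1.0]])   # columns: eigenvectors for lambda = 1 and 3
Lam = np.diag([1.0, 3.0])

print(np.allclose(U @ Lam @ np.linalg.inv(U), S))   # S = U Lam U^-1

Q = U / np.sqrt(2.0)                                # normalized eigenvectors
print(np.allclose(Q @ Q.T, np.eye(2)))              # Q is orthogonal
print(np.allclose(Q @ Lam @ Q.T, S))                # S = Q Lam Q^T
```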
Exercise

Examine the symmetric eigen decomposition, if any, for each of the following matrices:

[0 1; 1 0],   [0 1; −1 0],   [1 2; 2 3],   [2 2; 2 4]
Singular Value Decomposition

For an m×n matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) as follows:

A = UΣV^T,   where U is m×m, Σ is m×n, and V is n×n.

- The columns of U are orthogonal eigenvectors of AA^T.
- The columns of V are orthogonal eigenvectors of A^T A.
- The eigenvalues λ1 ... λr of AA^T are the eigenvalues of A^T A.
- Σ = diag(σ1, ..., σr) with σi = √λi; the σi are the singular values.
Singular Value Decomposition
Illustration of SVD dimensions and sparseness
SVD example

Let A = [1 −1
         0  1
         1  0]

Thus m = 3, n = 2. Its SVD is

A = [0 2/√6 1/√3; 1/√2 −1/√6 1/√3; 1/√2 1/√6 −1/√3] · [1 0; 0 √3; 0 0] · [1/√2 1/√2; 1/√2 −1/√2]

Typically, the singular values are arranged in decreasing order.
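A quick numpy check of this example (not from the slides). numpy returns the singular values in decreasing order and may flip the signs of corresponding singular-vector pairs, so its U and V can differ from the slide's while still reproducing A:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 3x3, Vt is 2x2
print(s)                               # [1.732..., 1.0], i.e. sqrt(3) and 1

Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))  # True: A = U Sigma V^T
```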
Low-rank Approximation

SVD can be used to compute optimal low-rank approximations.

Approximation problem: find A_k of rank k such that

A_k = argmin_{X: rank(X) = k} ||A − X||_F

where ||·||_F is the Frobenius norm (the entry-wise 2-norm).

A_k and X are both m×n matrices.

Typically, want k << r.
Low-rank Approximation – solution via SVD

Set the smallest r − k singular values to zero:

A_k = U diag(σ1, ..., σk, 0, ..., 0) V^T

Column notation: sum of rank-1 matrices

A_k = Σ_{i=1..k} σi u_i v_i^T
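A short numpy sketch of this construction (the helper name rank_k_approx is mine, not from the slides): zero out all but the k largest singular values and rebuild the matrix.

```python
import numpy as np

def rank_k_approx(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # numpy already sorts the singular values in decreasing order.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Sanity check on random data: the error equals the norm of the dropped singular values.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
_, s, _ = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = rank_k_approx(A, k)
print(np.linalg.norm(A - A_k, "fro"))   # equals ...
print(np.sqrt(np.sum(s[k:] ** 2)))      # ... this quantity
```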
Approximation error

How good (bad) is this approximation?

It's the best possible, measured by the Frobenius norm of the error (it can be shown that):

min_{X: rank(X) = k} ||A − X||_F = ||A − A_k||_F = √(σ_{k+1}² + ... + σ_r²)

where the σi are ordered such that σi ≥ σi+1. (In the 2-norm, the error is just σ_{k+1}.)

Suggests why the Frobenius error drops as k is increased.
SVD Application
Image compression
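A minimal sketch of the Project 2 idea (hedged: the use of Pillow for image I/O and the helper name compress_image are my assumptions, not specified in the slides): treat a grayscale image as a matrix, keep the top k singular values, and reconstruct.

```python
import numpy as np
from PIL import Image   # assumption: Pillow is available for image I/O

def compress_image(path: str, k: int) -> np.ndarray:
    """Rank-k SVD reconstruction of a grayscale image (illustrative sketch)."""
    img = np.asarray(Image.open(path).convert("L"), dtype=float)   # m x n matrix
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return np.clip(approx, 0, 255)

# Storage drops from m*n values to roughly k*(m + n + 1), at the cost of fine detail.
```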
SVD example

Eigenvalues of A^T A: ?   Eigenvectors: ?   Matrix U: ?

SVD example – action of A on a unit circle

(Figure: the unit circle is mapped by A = UΣV^T in three steps – first apply V^T, then Σ, then U.)
PCA
Principal Components Analysis
Data Presentation
Example: 53 Blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32 non-alcoholics).
Matrix Format
Spectral Format
      H-WBC   H-RBC   H-Hgb   H-Hct   H-MCV   H-MCH   H-MCHC
A1    8.00    4.82    14.10   41.00    85.00   29.00   34.00
A2    7.30    5.02    14.70   43.00    86.00   29.00   34.00
A3    4.30    4.48    14.10   41.00    91.00   32.00   35.00
A4    7.50    4.47    14.90   45.00   101.00   33.00   33.00
A5    7.30    5.52    15.40   46.00    84.00   28.00   33.00
A6    6.90    4.86    16.00   47.00    97.00   33.00   34.00
A7    7.80    4.68    14.70   43.00    92.00   31.00   34.00
A8    8.60    4.82    15.80   42.00    88.00   33.00   37.00
A9    5.10    4.71    14.00   43.00    92.00   30.00   32.00
Data Presentation

(Figures: univariate plot of the values of each measurement; bivariate scatter of C-LDH vs. C-Triglycerides; trivariate scatter of M-EPI against C-Triglycerides and C-LDH; plus H-Bands plotted per person.)
Data Presentation
Better presentation than ordinate axes?

Do we need a 53-dimensional space to view the data?

How to find the 'best' low-dimensional space that conveys maximum useful information?

One answer: find "Principal Components".
Principal Components
First PC is direction of maximum variance from origin
Subsequent PCs are orthogonal to 1st PC and describe maximum residual variance
(Figures: data in the Wavelength 1 vs. Wavelength 2 plane; PC 1 points along the direction of maximum variance and PC 2 is orthogonal to it.)
The Goal

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.
Applications

Uses:
- Data Visualization
- Data Reduction
- Data Classification
- Trend Analysis
- Factor Analysis
- Noise Reduction

Examples:
- How many unique "sub-sets" are in the sample?
- How are they similar / different?
- What are the underlying factors that influence the samples?
- Which time / temporal trends are (anti)correlated?
- Which measurements are needed to differentiate?
- How to best present what is "interesting"?
- To which "sub-set" does this new sample rightfully belong?
Trick: Rotate Coordinate Axes

This is accomplished by rotating the axes.

(Figure: the original X1, X2 axes and a rotated pair of axes aligned with the data's directions of greatest variability.)
Suppose we have a population measured on p random variables X1,…,Xp. Note that these random variables represent the p-axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:
PCA: General
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
yk's are uncorrelated (orthogonal)
y1 explains as much as possible of original variance in data set
y2 explains as much as possible of remaining variance
etc.
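A compact numpy sketch of this construction (illustrative; the helper pca_via_cov and the random data are mine): the coefficient vectors a_i are the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue, and the resulting scores are uncorrelated.

```python
import numpy as np

def pca_via_cov(X: np.ndarray):
    """Eigenvalues, principal directions and scores of data X (rows = samples)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    lam, V = np.linalg.eigh(S)               # eigh: S is symmetric
    order = np.argsort(lam)[::-1]            # sort by decreasing variance
    lam, V = lam[order], V[:, order]
    return lam, V, Xc @ V                    # scores y = Xc V

# The score covariance is diagonal, i.e. the y's are uncorrelated.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) @ np.array([[2.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 0.2]])
lam, V, Y = pca_via_cov(X)
print(np.round(np.cov(Y, rowvar=False), 6))  # approximately diag(lam)
```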
(Figure: a two-dimensional data cloud with the 1st Principal Component, y1, and the 2nd Principal Component, y2, drawn through it.)

PCA Scores

(Figure: a data point (xi1, xi2) and its scores yi,1 and yi,2 – its coordinates along the principal components.)

PCA Eigenvalues

(Figure: the eigenvalues λ1 and λ2 correspond to the spread of the data along the 1st and 2nd principal components.)
PCA: Another Explanation
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
yk's are uncorrelated (orthogonal)
y1 explains as much as possible of original variance in data set
y2 explains as much as possible of remaining variance
etc.
yk's are the Principal Components
Example

B:            Apple   Samsung   Nokia
student1        1        2        1
student2        4        2       13
student3        7        8        1
student4        8        4        5
Mean            5        4        5

B − mean:     Apple   Samsung   Nokia
student1       -4       -2       -4
student2       -1       -2        8
student3        2        4       -4
student4        3        0        0

Sample co-variance matrix (of the mean-centered data): S = B'B/(n−1) =

   10    6    0
    6    8   -8
    0   -8   32

PCA of B: find eigenvalues and eigenvectors of S: S = VDV'

V =  0.5686   0.8193  -0.0740       D =  1.6057    0         0
    -0.7955   0.5247  -0.3030            0        13.8430    0
    -0.2094   0.2312   0.9501            0         0        34.5513

1st principal component: [-0.0740, -0.3030, 0.9501]',  y1 = -0.0740A - 0.3030S + 0.9501N
2nd principal component: [ 0.8193,  0.5247,  0.2312]',  y2 = ...
3rd principal component: [ 0.5686, -0.7955, -0.2094]',  y3 = ...

Total variance = tr(D) = 50 = tr(S)
New view of the data: y1 = -0.0740A - 0.3030S + 0.9501N, ...   C = BV =

C           y3        y2        y1
student1   0.1542   -5.2514   -2.8984
student2  -0.6528   -0.0191    8.2808
student3  -1.2072    2.8126   -5.1604
student4   1.7058    2.4579   -0.2220

(Figure: the original x1, x2 axes and the rotated principal axes y1, y2.)
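A numpy sketch that should reproduce this example's numbers (not from the slides; numpy's eigh returns eigenvalues in increasing order, which matches the y3, y2, y1 column order of the table, and eigenvector signs may be flipped):

```python
import numpy as np

B = np.array([[1.0, 2.0,  1.0],    # rows: students; columns: Apple, Samsung, Nokia
              [4.0, 2.0, 13.0],
              [7.0, 8.0,  1.0],
              [8.0, 4.0,  5.0]])

Bc = B - B.mean(axis=0)            # subtract the column means (5, 4, 5)
S = Bc.T @ Bc / (B.shape[0] - 1)   # sample covariance: [[10,6,0],[6,8,-8],[0,-8,32]]

D, V = np.linalg.eigh(S)           # S = V diag(D) V'
print(D)                           # approx [1.6057, 13.8430, 34.5513]

C = Bc @ V                         # scores (columns y3, y2, y1)
print(np.round(C, 4))
print(np.isclose(D.sum(), np.trace(S)))   # total variance = tr(D) = tr(S) = 50
```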
Another look of the data – make all variances = 1

Var^-1 * B:     A        S        N
student1      1.7321   3.4641   1.7321
student2      0.6827   0.3413   2.2186
student3      1.8489   2.1131   0.2641
student4      3.8431   1.9215   2.4019

(B is the same Apple/Samsung/Nokia table as before.)

− mean:         A          S          N
student1     -0.2946     1.5041     0.077925
student2     -1.3440    -1.6187     0.564425
student3     -0.1778     0.1531    -1.39008
student4      1.8164    -0.0385     0.747725

S = VDV',   λ = [0.5873; 1.4916; 2.2370]

C           y3        y2        y1
student1   0.8888    0.9583   -0.8043
student2   0.2289   -0.5312    2.1001
student3  -0.9428    1.0480   -0.0100
student4  -0.1749   -1.4751   -1.2858
(Figure: the standardized data plotted in the rotated y1, y2 coordinates.)
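A hedged numpy sketch of this last variant (my reading of the slide: each student's row is rescaled by its own standard deviation before the same covariance/eigenvector analysis; the eigenvalues should then match the λ values above up to round-off, with possible sign flips in the eigenvectors):

```python
import numpy as np

B = np.array([[1.0, 2.0,  1.0],
              [4.0, 2.0, 13.0],
              [7.0, 8.0,  1.0],
              [8.0, 4.0,  5.0]])

# Rescale each row by its own sample standard deviation ("make all variances = 1").
B2 = B / B.std(axis=1, ddof=1, keepdims=True)

Bc = B2 - B2.mean(axis=0)               # center each column
S = Bc.T @ Bc / (B.shape[0] - 1)        # sample covariance of the rescaled data

lam, V = np.linalg.eigh(S)
print(lam)                              # approx [0.5873, 1.4916, 2.2370]
C = Bc @ V                              # scores in the rotated y-coordinates
print(np.round(C, 4))
```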