TRANSCRIPT
The Singular Value Decomposition
COMPSCI 527 — Computer Vision
COMPSCI 527 — Computer Vision The Singular Value Decomposition 1 / 21
Outline
1 Math Corners and the SVD: Motivation
2 Orthogonal Matrices
3 Orthogonal Projection
4 The Singular Value Decomposition
5 Principal Component Analysis
Math Corners and the SVD: Motivation
• A few math installments to get ready for later technical topics are sprinkled throughout the course
• The Singular Value Decomposition (SVD) gives the most complete geometric picture of a linear mapping
• SVD yields orthonormal vector bases for the null space, the row space, the range, and the left null space of a matrix
• SVD leads to the pseudo-inverse, a way to give a linear system a unique and stable approximate solution
• SVD leads to principal component analysis, a technique to reduce the dimensionality of a set of vector data while retaining as much information as possible
• Dimensionality reduction improves the ability of machine learning methods to generalize
Orthogonal Matrices
Why Orthonormal Bases are Useful
• n-dimensional linear subspace S ⊆ R^m (so n ≤ m)
• p = [p1, …, pm]^T ∈ S (coordinates in the standard basis)
• v1, …, vn: an orthonormal basis for S:
  v_i^T v_j = δ_ij (Kronecker delta)
• Then there exists q = [q1, …, qn]^T such that p = q1 v1 + … + qn vn
• Matrix form: p = Vq where V = [v1, …, vn] ∈ R^{m×n}
• v_i^T p = v_i^T (q1 v1 + … + qn vn) = q_i
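The coordinate formula q_i = v_i^T p can be checked numerically. A minimal NumPy sketch (not from the slides; the subspace, point, and seed are made up for illustration): build an orthonormal basis with a QR factorization, pick a point in the subspace with known coefficients, and recover them with V^T p.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal basis V for a random 3-dimensional subspace of R^5.
# The Q factor of a QR factorization has orthonormal columns.
m, n = 5, 3
V, _ = np.linalg.qr(rng.standard_normal((m, n)))

# A point p in S = span(V) with known coefficients q_true.
q_true = np.array([2.0, -1.0, 0.5])
p = V @ q_true

# Finding the coefficients is easy: q_i = v_i^T p, i.e. q = V^T p.
q = V.T @ p
```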
Orthogonal Matrices
The Left Inverse of an Orthogonal Matrix
q_i = v_i^T p   (finding the coefficients q_i is easy!)
• Rewrite v_i^T v_j = δ_ij in matrix form, with V = [v1, …, vn] ∈ R^{m×n}:
  V^T V = I   (the n × n identity)
• LV = I with L = V^T
Orthogonal Matrices
Left and Right Inverse of an Orthogonal Matrix
• LV = I with L = V T
• Can we have R such that VR = I? That would be a right inverse
• What if m = n? Then V is square, and V^T V = I implies V V^T = I as well, so V^T is also a right inverse; when m > n, the n columns of V cannot span R^m and no right inverse exists
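The left/right inverse asymmetry is easy to see numerically. A small sketch under assumed dimensions (m = 4, n = 2; the matrix and seed are invented for illustration): for a tall V with orthonormal columns, V^T V is the n × n identity, but V V^T is only an m × m projection.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tall matrix V (m > n) with orthonormal columns, via QR.
m, n = 4, 2
V, _ = np.linalg.qr(rng.standard_normal((m, n)))

# Left inverse: V^T V = I_n always holds.
left = V.T @ V    # n x n identity

# But V V^T is a rank-n projection, not I_m, so V has no
# right inverse when m > n.
right = V @ V.T
```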
Orthogonal Matrices
Orthogonal Transformations Preserve Norm (m ≥ n)
y = Vx : Rn → Rm
‖y‖² = ‖Vx‖² = x^T V^T V x = x^T x = ‖x‖²
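The chain ‖Vx‖² = x^T V^T V x = ‖x‖² can be verified directly. A tiny sketch with assumed dimensions and random data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# V with orthonormal columns maps R^n into R^m without changing
# lengths: ||Vx||^2 = x^T V^T V x = x^T x = ||x||^2.
m, n = 6, 3
V, _ = np.linalg.qr(rng.standard_normal((m, n)))
x = rng.standard_normal(n)
y = V @ x
```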
Orthogonal Projection
Projection Onto a Subspace (n ≤ m)
• The projection of b ∈ R^m onto a subspace S ⊆ R^m is the point p ∈ S closest to b
• Let V ∈ R^{m×n} be a matrix whose columns form an orthonormal basis for S (that is, V is an orthogonal matrix)
• b − p ⊥ v_i for i = 1, …, n; that is, V^T (b − p) = 0
• The projection of b ∈ R^m onto S is p = V V^T b
(optional proofs in an Appendix)
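The formula p = V V^T b and the orthogonality of the residual can be checked in a few lines. A minimal sketch with an invented subspace and point (dimensions and seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Orthonormal basis V for a 2-dimensional subspace S of R^4.
m, n = 4, 2
V, _ = np.linalg.qr(rng.standard_normal((m, n)))

b = rng.standard_normal(m)
p = V @ (V.T @ b)   # projection of b onto S
r = b - p           # residual, orthogonal to every v_i
```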
The Singular Value Decomposition
Linear Mappings
b = Ax : R^n → R^m. Example:

A = (1/√2) · [ √3  √3  0
               −3   3  0
                1   1  0 ]    (m = n = 3)

range(A) ↔ rowspace(A)
The Singular Value Decomposition
The Singular Value Decomposition: Geometry
b = Ax where

A = (1/√2) · [ √3  √3  0
               −3   3  0
                1   1  0 ]
The Singular Value Decomposition
The Singular Value Decomposition: Algebra
Av1 = σ1 u1
Av2 = σ2 u2
σ1 ≥ σ2 > σ3 = 0
u_i^T u_j = δ_ij and v_i^T v_j = δ_ij for i, j = 1, 2, 3
(that is, u1, u2, u3 and v1, v2, v3 are each orthonormal)
The Singular Value Decomposition
The Singular Value Decomposition: General
For any real m × n matrix A there exist orthogonal matrices
U = [u1 · · · um] ∈ R^{m×m}
V = [v1 · · · vn] ∈ R^{n×n}
such that
U^T A V = Σ = diag(σ1, …, σp) ∈ R^{m×n}
where p = min(m,n) and σ1 ≥ . . . ≥ σp ≥ 0. Equivalently,
A = UΣV T .
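The theorem can be checked numerically with `numpy.linalg.svd`, which returns U, the singular values, and V^T. A minimal sketch on a random matrix (dimensions and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

m, n = 5, 3
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh = V^T is n x n, and s holds the
# p = min(m, n) singular values in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and check that A = U Sigma V^T.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
A_rebuilt = U @ Sigma @ Vh
```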
The Singular Value Decomposition
Rank and the Four Subspaces
A = U Σ V^T = [u1, …, ur, ur+1, …, um] ·
    [ diag(σ1, …, σr)  0
      0                0 ] ·
    [v1, …, vr, vr+1, …, vn]^T

where σ1 ≥ … ≥ σr > 0 are the nonzero singular values, so rank(A) = r.
The first r columns of U span the range of A and the last m − r columns span the left null space; the first r columns of V span the row space and the last n − r columns span the null space.

[drawn for m > n]
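The rank and two of the four subspaces can be read off a computed SVD. A small sketch (the matrix, tolerance, and seed are assumptions for illustration): build a rank-2 matrix as a product of thin factors, then extract null-space and left-null-space bases from V and U.

```python
import numpy as np

rng = np.random.default_rng(5)

# A 4 x 3 matrix of rank r = 2, built as a product of thin factors.
m, n, r = 4, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vh = np.linalg.svd(A, full_matrices=True)
tol = 1e-10
rank = int(np.sum(s > tol))

# Columns r..n-1 of V span the null space of A;
# columns r..m-1 of U span the left null space.
null_basis = Vh[rank:, :].T      # n x (n - r)
left_null_basis = U[:, rank:]    # m x (m - r)
```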
Principal Component Analysis
Principal Component Analysis
• We used the SVD to view a matrix as a map
• We can also view a matrix as a data set
• A is an m × n matrix with n data points in R^m
Principal Component Analysis
Principal Component Analysis
• Let k ≤ m be some “smaller dimensionality”
• How to approximate the m-dimensional data in A with points in k dimensions? (Data compression, dimensionality reduction)
• The columns of A are a cloud of points around the mean µ(A) = (1/n) A 1_n
• Center the matrix: A_c = A − µ(A) 1_n^T
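The mean and centering formulas translate directly to code. A minimal sketch on invented data (dimensions, offset, and seed are assumptions): µ(A) = (1/n) A 1_n is the mean of the columns, and subtracting µ(A) 1_n^T makes every row sum to zero.

```python
import numpy as np

rng = np.random.default_rng(6)

# n data points in R^m stored as the columns of A, shifted off the origin.
m, n = 3, 10
A = rng.standard_normal((m, n)) + 5.0

ones = np.ones(n)
mu = (A @ ones) / n            # mean mu(A) = (1/n) A 1_n, an m-vector
Ac = A - np.outer(mu, ones)    # centered matrix A_c = A - mu(A) 1_n^T
```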
Principal Component Analysis
Principal Component Analysis
• How to approximate the m-dimensional centered cloud A_c in k ≪ m dimensions?
• Full SVD: A_c = U Σ V^T = [u1, …, uk, uk+1, …, um] · diag(σ1, …, σk, σk+1, …, σ_min(m,n)) · [v1, …, vn]^T
• Keep only the k largest singular values:
  A_c ≈ U_k Σ_k V_k^T = [u1, …, uk] · diag(σ1, …, σk) · [v1, …, vk]^T
Principal Component Analysis
Principal Component Analysis
• A_c ≈ U_k Σ_k V_k^T = [u1, …, uk] · diag(σ1, …, σk) · [v1, …, vk]^T
• A_c ≈ U_k B with B = Σ_k V_k^T = [b1, …, bn]
• B = U_k^T A_c
• B = U_k^T [A − µ(A) 1_n^T]   (PCA)
• B is k × n and captures most of the variance in A
• See notes for a statistical interpretation (optional)
• Reconstruct approximate original data: A = A_c + µ(A) 1_n^T
• A ≈ U_k B + µ(A) 1_n^T
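The whole PCA pipeline above fits in a few lines. A sketch on synthetic data (the low-rank-plus-noise construction, dimensions, and seed are all invented for illustration): center, take the SVD, form B = U_k^T A_c, and reconstruct with U_k B + µ(A) 1_n^T.

```python
import numpy as np

rng = np.random.default_rng(8)

# n points in R^m, spread mostly along k directions plus small noise.
m, n, k = 5, 40, 2
A = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
     + 0.01 * rng.standard_normal((m, n)) + 3.0)

mu = A.mean(axis=1, keepdims=True)
Ac = A - mu                                   # center the data

U, s, Vh = np.linalg.svd(Ac, full_matrices=False)
Uk = U[:, :k]

B = Uk.T @ Ac                                 # k x n compressed data (PCA)
A_hat = Uk @ B + mu                           # approximate reconstruction
```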
Principal Component Analysis
PCA Example
Image compression: each column of the image is viewed as a data point.

[Figure: the m × n = 685 × 1024 original image; a log-scale plot of its singular values (spanning roughly 10^0 to 10^5); reconstructions with k = 10 dimensions and k = 50 dimensions]
Principal Component Analysis
Encoding/Decoding New Points
• Given the PCA parameters µ(A), U_k for an m × n matrix A, the compressed points are B = U_k^T [A − µ(A) 1_n^T] (a k × n matrix with k ≪ m)
• The original points can be approximately reconstructed as A ≈ U_k B + µ(A) 1_n^T
• Given a new point a ∈ R^m, it can be encoded as b = U_k^T [a − µ(A)] (without incorporating a into the PCA); b is a short, k-dimensional vector
• The original a can be approximately reconstructed as a ≈ U_k b + µ(A)
• The parameters µ(A), U_k are a code, used for encoding and approximate decoding (compression/decompression)
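Encoding and decoding a new point with an already-fit code (µ(A), U_k) looks like this. A sketch on synthetic data (the construction, dimensions, and seed are invented; the new point is deliberately chosen to lie in the learned subspace so the round trip is near-exact):

```python
import numpy as np

rng = np.random.default_rng(9)

# Fit the PCA code (mu, Uk) on an m x n data matrix A.
m, n, k = 4, 30, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) + 1.0

mu = A.mean(axis=1)
U, s, Vh = np.linalg.svd(A - mu[:, None], full_matrices=False)
Uk = U[:, :k]

# A "new" point that happens to lie in the learned affine subspace.
a = A[:, 0]

b = Uk.T @ (a - mu)     # encode: a short k-dimensional vector
a_hat = Uk @ b + mu     # decode: approximate reconstruction
```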
Principal Component Analysis
PCA is Not the Final Answer
[Figure: a scatter plot of two classes of points, marked ‘+’ and ‘−’]