Matrix Decompositions for Data Analysis
Haesun Park
hpark@cc.gatech.edu
Division of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332, USA
KAIST, Korea, June - July 2007
Schedule and Outline
Lecture 1 June 19: Introduction to Matrix Decompositions
Lecture 2 June 21: Dimension Reduction for Undersampled High-Dimensional Data
Lecture 3 June 26: Adaptive Methods for Linear Discriminant Analysis and Kernelized Discriminant Analysis
Lecture 4 June 28: Nonnegative Matrix Factorization
Lecture 5 July 3: Nonnegative Matrix Factorization and its Applications
Lecture 1, Recommended Text Books:
Matrix Computations, 3/e, by Golub & Van Loan, Johns Hopkins, 1996
Applied Numerical Linear Algebra by J. W. Demmel, SIAM, 1997
Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, SIAM, 1997
Matrix Decompositions for Data Analysis – p.1/30
Dimension Reduction
Matrix Decompositions play important roles in Dimension Reduction Algorithms
Unsupervised Dimension Reduction: SVD (LSI, PCA), Manifold Learning, Distance Preserving Dimension Reduction (DPDR)
Dimension Reduction for Non-negative Data: Non-negative Matrix Factorization (NMF)
Dimension Reduction for Clustered Data and Classification: Linear Discriminant Analysis (LDA/GSVD), Regularized LDA, ..., Orthogonal Centroid Method, Centroid-based Method
What is Numerical Analysis?
Three great branches of science: Theory, Experiment, and Computation.
"The purpose of computing is insight, not numbers" (Hamming, 1961). "The purpose of computing numbers is not yet in sight" (Hamming, 1997).
Numerical Analysis is the study of algorithms for the problems of continuous mathematics. Ex.: Newton's method, Lagrange interpolation polynomial, Gaussian elimination, Euler's method, ...
Computational mathematics is mainly based on two ideas (an extreme simplification): Taylor series and linear algebra.
Role of Computers in Numerical Computing: computers certainly play a part in numerical computing, but even if rounding error vanished, 95% of numerical analysis would remain. Most mathematical problems cannot be solved by a finite sequence of elementary operations. Need: fast algorithms that converge to "approximate" answers accurate to many digits of precision, for science and engineering applications.
Different Types of Problems in Numerical Computing
Problem F: can be solved in a finite sequence of elementary operations:
Roots of a polynomial of degree ≤ 4: a closed-form formula exists (Ferrari, 1540)
Solving linear equations
Linear programming
Problem I: cannot be solved in a finite sequence of elementary operations:
Roots of a polynomial of degree 5 and higher: no closed-form formula exists (Ruffini and Abel, around 1800)
Finding eigenvalues of an n × n matrix with n ≥ 5
Minimizing a function of several variables
Evaluating an integral
Solving an ODE
Solving a PDE
Problem F is not necessarily easier than Problem I. When the problem dimension is very high, one often gives up the exact solution and uses approximate, fast methods instead.
*** World's largest matrix computation as of April 2007: Google's PageRank - the eigenvector of a matrix of order 2.7 billion.
Gauss (1777-1855) and Numerical Computing
least squares data fitting (1795)
systems of linear equations (1809)
numerical quadrature (1814)
fast Fourier transform (1805) - not well known until it was rediscovered by Cooley and Tukey (1965)
Numerical Linear Algebra
square linear system solving
least squares problems
eigenvalue problem
Often, Algorithms = Matrix Factorizations
Square Linear System Solving
When does the solution exist?
When is it easy to solve? if the matrix is triangular or diagonal
Diagonalization: expensive
Make it triangular: A = LU: lower - upper triangular factors
Gaussian elimination: make the problem into triangular system solving
May break down: e.g.,
  A = [ 0  1
        1  0 ]
has no LU factorization, since the first pivot is zero.
Even for matrices with the LU factorization, it can be unstable.
Pivoting: by interchanging rows, stability can be achieved. Gaussian elimination with pivoting:
  P A = L U,  where P is a permutation matrix.
Discovery of pivoting was easy but its theoretical analysis has been hard. For most matrices it is stable, but in 1960 Wilkinson and others found that for certain exceptional matrices, Gaussian elimination with pivoting is unstable.
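As a hedged illustration of P A = L U (not part of the original slides; the function name and the 2×2 test matrix are my own), partial pivoting can be sketched in a few lines:

```python
import numpy as np

def lu_partial_pivoting(A):
    """Gaussian elimination with partial pivoting: returns P, L, U with P @ A = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    for k in range(n - 1):
        # pick the row with the largest pivot in column k
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:                       # interchange rows (pivoting)
            A[[k, p], :] = A[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):        # eliminate below the pivot
            L[i, k] = A[i, k] / A[k, k]
            A[i, k:] -= L[i, k] * A[k, k:]
    return P, L, np.triu(A)

# The matrix [[0, 1], [1, 0]] has no LU factorization without pivoting,
# but a row interchange fixes it:
A = np.array([[0.0, 1.0], [1.0, 0.0]])
P, L, U = lu_partial_pivoting(A)
assert np.allclose(P @ A, L @ U)
```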
Orthogonal (Unitary) Transformations
Use of orthogonal matrices was introduced in the late 1950s.
Q ∈ R^{m×m} is orthogonal if Q^T Q = I; Q ∈ C^{m×m} is unitary if Q^H Q = I. Orthogonal transformations preserve lengths: ||Qx||_2 = ||x||_2.
QR factorization: for any matrix A ∈ R^{m×n}, m ≥ n, a QR factorization of A exists: A = Q R, where Q ∈ R^{m×m} has orthonormal columns and R ∈ R^{m×n} is upper triangular. Reduced QRD: A = Q_1 R_1, where Q_1 ∈ R^{m×n} has orthonormal columns and R_1 ∈ R^{n×n} is upper triangular.
Gram (1883) - Schmidt (1907) orthogonalization: the columns of Q are obtained one at a time, and R is obtained as a by-product, in a process of triangular orthogonalization.
Modified Gram-Schmidt (Laplace 1816, Rice 1966)
Householder method (1958, Householder reflector; Turnbull and Aitken 1932): A is reduced to an upper triangular matrix R via orthogonal operations. More stable numerically, because orthogonal operations preserve the 2-norm and Frobenius norm and thus do not amplify the rounding errors introduced at each step:
  ||Q A||_2 = ||A||_2,  ||Q A||_F = ||A||_F.
Givens method: built from 2×2 plane rotations
  G(θ) = [  cos θ  sin θ
           -sin θ  cos θ ].
Important matrix computation algorithms in the 1960s
Based on the QR factorization:
to solve least squares problems
to construct orthonormal bases
used at the core of other algorithms, especially in EVD and SVD algorithms
Least Squares
Overdetermined system solving: min_x ||A x - b||_2, where A ∈ R^{m×n} with m > n.
If it were a square system, we would know how to solve it; for least squares: normal equations, or the QRD.
Reduced QR Decomposition: a distance-preserving dimension reduction method.
QRD: efficient updating and downdating methods exist.
Rank deficiency: Q_1 is not a basis for range(A) if rank(A) is not full.
A pivoted QR decomposition, a rank-revealing QR decomposition, or the SVD is needed if rank(A) is not full.
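A minimal sketch (my own random example, not from the slides) of solving the overdetermined system via the reduced QRD: factor A = Q_1 R_1 and solve the triangular system R_1 x = Q_1^T b:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # overdetermined: m = 8 > n = 3
b = rng.standard_normal(8)

# Reduced QRD: A = Q1 @ R1 with Q1 (8x3), R1 (3x3) upper triangular
Q1, R1 = np.linalg.qr(A, mode="reduced")
x_qr = np.linalg.solve(R1, Q1.T @ b)          # least squares solution via QRD

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # reference solution
assert np.allclose(x_qr, x_ref)
```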
Householder Transformation
  P = I - 2 u u^T / (u^T u),  u ≠ 0,
with P = P^T, P^T P = I, P^2 = I.
Given x = (x_1, ..., x_m)^T ≠ 0, we can find u so that
  P x = ∓ ||x||_2 e_1 = (∓||x||_2, 0, ..., 0)^T,
by choosing u = x ± ||x||_2 e_1.
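A small numerical check of the reflector formula (the test vector is an illustrative choice of mine):

```python
import numpy as np

def householder(x):
    """Returns u such that P = I - 2*u*u^T/(u^T u) maps x to a multiple of e1."""
    u = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0      # sign chosen to avoid cancellation
    u[0] += sign * np.linalg.norm(x)
    return u

x = np.array([3.0, 4.0])
u = householder(x)
P = np.eye(2) - 2.0 * np.outer(u, u) / (u @ u)
# P is symmetric and orthogonal, and P @ x = (-||x||_2, 0)^T = (-5, 0)^T
assert np.allclose(P @ x, [-5.0, 0.0])
assert np.allclose(P @ P, np.eye(2))
```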
Examples
[Two worked numerical examples of Householder reflectors appear here; the matrix entries did not survive the transcript.]
Which sign to choose in u = x ± ||x||_2 e_1? Choose + if x_1 ≥ 0 and choose - if x_1 < 0, to avoid cancellation error.
Cancellation error occurs when two numbers of very close value are involved in a subtraction: the leading digits cancel, and few significant digits remain.
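The cancellation phenomenon, and why the sign rule avoids it, can be seen directly in double-precision arithmetic (an illustrative sketch, not from the slides):

```python
import numpy as np

# Cancellation: subtracting nearly equal numbers destroys significant digits.
# (1 + eps/2) already rounds to 1 in double precision, so the subtraction
# returns 0 even though the exact answer is eps/2.
eps = np.finfo(np.float64).eps
assert (1.0 + eps / 2) - 1.0 == 0.0

# The Householder sign rule avoids this: with x1 > 0, computing x1 - ||x||_2
# cancels badly when x is nearly parallel to e1, while x1 + ||x||_2 cannot.
x = np.array([1.0, 1e-9])
good = x[0] + np.linalg.norm(x)   # ~2.0, no cancellation
bad = x[0] - np.linalg.norm(x)    # true value ~ -5e-19, but every digit is lost
assert abs(bad) < 1e-15 < good
```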
Householder QRD
Let A ∈ R^{m×n}. Find a Householder matrix P_1 ∈ R^{m×m} so that
  P_1 a_1 = (r_11, 0, ..., 0)^T,
where a_1 is the first column of A. Then
  P_1 A = [ r_11  r_12 ... r_1n
            0
            :         A_2
            0              ].
Find a Householder matrix P̂_2 ∈ R^{(m-1)×(m-1)} so that
  P̂_2 â_2 = (r_22, 0, ..., 0)^T,
where â_2 is the first column of A_2. Define
  P_2 = [ 1   0
          0  P̂_2 ].
Householder QRD - cont'd
  P_2 P_1 A = [ r_11  r_12 ... r_1n
                0     r_22 ... r_2n
                0     0
                :     :     A_3
                0     0          ]
continue...
Letting Q = P_1 P_2 ... P_n, we have A = Q R.
In general, we need to find n Householder matrices to find the QRD of A ∈ R^{m×n} for m > n:
  P_n P_{n-1} ... P_1 A = R,
and Q = P_1 P_2 ... P_n is orthogonal, since a product of orthogonal matrices is orthogonal.
Householder QRD Algorithm
R := A
for k = 1 : n
    determine a Householder matrix P̂_k of order m - k + 1 so that
        P̂_k (r_kk, ..., r_mk)^T = (r_kk, 0, ..., 0)^T
    R(k:m, k:n) := P̂_k R(k:m, k:n)
    Q := Q diag(I_{k-1}, P̂_k)        % if Q is needed
end
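The algorithm above can be sketched as follows; this is an illustrative implementation (function name mine), applying each reflector to the trailing submatrix and accumulating Q:

```python
import numpy as np

def householder_qr(A):
    """Householder QRD sketch: returns Q (m x m), R (m x n) with A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        u = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0
        u[0] += sign * np.linalg.norm(x)   # sign chosen to avoid cancellation
        unorm = u @ u
        if unorm == 0.0:
            continue                        # column already zero below the diagonal
        # Apply P = I - 2 u u^T / (u^T u) to the trailing submatrix, and to Q
        R[k:, k:] -= (2.0 / unorm) * np.outer(u, u @ R[k:, k:])
        Q[:, k:] -= (2.0 / unorm) * np.outer(Q[:, k:] @ u, u)
    return Q, np.triu(R)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(5))
```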
Givens Rotation
  G(θ) = [  cos θ  sin θ
           -sin θ  cos θ ],   G(θ)^T G(θ) = I.
n-dim Givens rotation G(i, k, θ): the n×n identity matrix with entries (i,i), (i,k), (k,i), (k,k) replaced by cos θ, sin θ, -sin θ, cos θ. Applying it to a vector changes only the i-th and k-th components, i.e.
  [  c  s ] [ x_i ]   [ r ]
  [ -s  c ] [ x_k ] = [ 0 ],
with r = sqrt(x_i^2 + x_k^2), c = x_i / r, s = x_k / r.
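A quick check of the rotation formulas (test values of my choosing):

```python
import numpy as np

def givens(xi, xk):
    """Rotation coefficients c, s so that [[c, s], [-s, c]] @ [xi, xk] = [r, 0]."""
    r = np.hypot(xi, xk)
    return (1.0, 0.0) if r == 0.0 else (xi / r, xk / r)

x = np.array([3.0, 4.0])
c, s = givens(x[0], x[1])
G = np.array([[c, s], [-s, c]])
assert np.allclose(G @ x, [5.0, 0.0])   # zeroes the second component
assert np.allclose(G.T @ G, np.eye(2))  # G is orthogonal
```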
Givens Method for QRD
A ∈ R^{m×n} with m > n, e.g. m = 4, n = 3. Zeros are introduced below the diagonal one entry at a time, column by column; each × denotes a (possibly) nonzero entry.
  A = [ × × ×
        × × ×
        × × ×
        × × × ]
1. Rotations acting on the rows of column 1:
     [ × × ×
       0 × ×
       0 × ×
       0 × × ]
2. Rotations acting on the rows of column 2:
     [ × × ×
       0 × ×
       0 0 ×
       0 0 × ]
3. A rotation acting on the rows of column 3:
     [ × × ×
       0 × ×
       0 0 ×
       0 0 0 ] = R,
so G_t ... G_2 G_1 A = R and Q = G_1^T G_2^T ... G_t^T.
QRD by Gram-Schmidt
In the k-th step, the k-th column of Q and the k-th column of R are computed.
  A = (a_1, a_2, ..., a_n) = (q_1, q_2, ..., q_n) [ r_11 r_12 ... r_1n
                                                         r_22 ... r_2n
                                                               .    :
                                                                  r_nn ]
so a_k = r_1k q_1 + r_2k q_2 + ... + r_kk q_k. Classical Gram-Schmidt (CGS):
  r_ik = q_i^T a_k,  i = 1, ..., k-1
  q̃_k = a_k - (r_1k q_1 + ... + r_{k-1,k} q_{k-1})
  r_kk = ||q̃_k||_2,  q_k = q̃_k / r_kk.
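The CGS step can be sketched as a reduced QRD (illustrative implementation, assumes A has full column rank):

```python
import numpy as np

def cgs_qr(A):
    """Classical Gram-Schmidt sketch: reduced QRD, A = Q @ R (Q m x n, R n x n)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        # k-th step: the k-th column of R, then the k-th column of Q
        R[:k, k] = Q[:, :k].T @ A[:, k]      # r_ik = q_i^T a_k
        q = A[:, k] - Q[:, :k] @ R[:k, k]    # subtract projections onto q_1..q_{k-1}
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Q, R = cgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(4))
```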
QRD by Modified Gram-Schmidt
In the k-th step, the k-th column of Q and the k-th row of R are computed.
After step k-1, the remaining columns a_k^(k), ..., a_n^(k) have already been orthogonalized against q_1, ..., q_{k-1}. Then
  r_kk = ||a_k^(k)||_2,  q_k = a_k^(k) / r_kk,
  r_kj = q_k^T a_j^(k),  a_j^(k+1) = a_j^(k) - r_kj q_k,  j = k+1, ..., n,
where (r_kk, ..., r_kn) is the k-th row of R.
With CGS & MGS, we can only get the reduced QRD, no full QRD.
For n = 2, MGS = CGS.
MGS is numerically more stable than CGS.
Comparison between QRD Algorithms
Householder and Givens methods generate Q that is numerically more orthogonal than the Q generated by MGS and CGS:
  ||I - Q_H^T Q_H|| = O(u)  and  ||I - Q_G^T Q_G|| = O(u),
while
  ||I - Q_M^T Q_M|| = O(u κ_2(A))  and  ||I - Q_C^T Q_C|| can be much larger (it grows roughly like u κ_2(A)^2),
where Q_H: Householder, Q_G: Givens, Q_C: CGS, Q_M: MGS, u is the unit roundoff, and κ_2(A) is the condition number.
Householder and Givens are more expensive if we want Q and R explicitly.
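The stability gap can be observed numerically. The sketch below (my own construction, not from the lecture) builds a matrix with condition number about 10^8 and compares the loss of orthogonality ||I - Q^T Q|| for CGS and MGS:

```python
import numpy as np

def gram_schmidt(A, modified):
    """Reduced-QRD Q by classical (modified=False) or modified (True) Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    V = A.astype(float).copy()
    for k in range(n):
        if modified:
            q = V[:, k]    # already orthogonalized against q_1..q_{k-1} in earlier steps
        else:
            q = V[:, k] - Q[:, :k] @ (Q[:, :k].T @ V[:, k])
        q = q / np.linalg.norm(q)
        if modified:       # orthogonalize the remaining columns immediately
            V[:, k + 1:] -= np.outer(q, q @ V[:, k + 1:])
        Q[:, k] = q
    return Q

# Ill-conditioned test matrix: singular values from 1 down to 1e-8
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((30, 10)))
W, _ = np.linalg.qr(rng.standard_normal((10, 10)))
A = U @ np.diag(np.logspace(0, -8, 10)) @ W.T

Q_cgs = gram_schmidt(A, modified=False)
Q_mgs = gram_schmidt(A, modified=True)
err_cgs = np.linalg.norm(np.eye(10) - Q_cgs.T @ Q_cgs)
err_mgs = np.linalg.norm(np.eye(10) - Q_mgs.T @ Q_mgs)
assert err_mgs < err_cgs   # MGS stays far closer to orthogonal than CGS
```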
Singular Value Decomposition (SVD)
Beltrami, Jordan, Sylvester, in the late 19th century; made well known by Golub (1965).
The SVD: any matrix A ∈ R^{m×n} (assume m ≥ n, but this is not necessary) can be decomposed into
  A = U Σ V^T,
where U ∈ R^{m×m} is unitary, V ∈ R^{n×n} is unitary, and Σ ∈ R^{m×n} is diagonal, Σ = diag(σ_1, ..., σ_n) with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. If rank(A) = r, then σ_r > 0 and σ_{r+1} = ... = σ_n = 0.
  A_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T,  k ≤ r,
is the best rank-k approximation of A.
Singular values of A are the nonnegative square roots of the eigenvalues of A^T A; the columns of V are eigenvectors of A^T A, and the columns of U are eigenvectors of A A^T.
Latent Semantic Indexing: lower rank approximation of the term-document matrix.
Principal Component Analysis: let C = A - c e^T be the centered data matrix, where c is the mean of the columns of A. Then the leading singular vectors of C are the PCA solutions: from the SVD C = U Σ V^T we get C C^T = U (Σ Σ^T) U^T, so the leading k columns of U are the k principal vectors.
But the SVD is expensive.
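The best rank-k approximation property (Eckart-Young) can be checked with numpy's SVD (illustrative random data of my own):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Eckart-Young: the 2-norm error of the best rank-k approximation is sigma_{k+1}
assert np.linalg.matrix_rank(A_k) == k
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```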
Properties of SVD
Suppose the matrix A ∈ R^{m×n} with rank(A) = r has the SVD
  A = U Σ V^T = σ_1 u_1 v_1^T + ... + σ_r u_r v_r^T,
where Σ = diag(σ_1, ..., σ_n) with σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0.
  A^T A = V (Σ^T Σ) V^T, where Σ^T Σ = diag(σ_1^2, ..., σ_n^2)
  A A^T = U (Σ Σ^T) U^T, where Σ Σ^T = diag(σ_1^2, ..., σ_n^2, 0, ..., 0) ∈ R^{m×m}
With U = (u_1, ..., u_m) and V = (v_1, ..., v_n):
  range(A) = span{u_1, ..., u_r},   null(A) = span{v_{r+1}, ..., v_n},
  range(A^T) = span{v_1, ..., v_r}, null(A^T) = span{u_{r+1}, ..., u_m}.
Why QRD with Column Pivoting?
Ex. [a small rank-deficient example matrix appears here; its entries did not survive the transcript.]
When rank(A) = r < n, the plain QRD A = Q R gives an R with some zero (or tiny) diagonal entries r_ii, but they need not appear in the trailing positions, so the leading r columns of Q need not span range(A).
QRD with C.P. can help us to maintain |r_11| ≥ |r_22| ≥ ... ≥ |r_nn| in the rank-deficient case, pushing the small diagonal entries to the bottom and revealing the rank.
For any A ∈ R^{m×n}, QRD with Column Pivoting computes
  A P = Q R = Q [ R_11  R_12
                   0     0  ],
where R_11 ∈ R^{r×r} is upper triangular and nonsingular. Then range(A) = range(Q(:, 1:r)).
Q ∈ R^{m×m}: orthogonal, and P ∈ R^{n×n}: permutation.
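A sketch of rank detection via pivoted QR, using scipy.linalg.qr with pivoting=True (the rank-2 test matrix is my own construction):

```python
import numpy as np
from scipy.linalg import qr

# A rank-2 matrix: the third column is the sum of the first two
rng = np.random.default_rng(5)
B = rng.standard_normal((6, 2))
A = np.column_stack([B, B @ [1.0, 1.0]])

Q, R, piv = qr(A, pivoting=True)      # A[:, piv] = Q @ R
assert np.allclose(A[:, piv], Q @ R)

# |r_11| >= |r_22| >= |r_33|, and the tiny trailing r_33 reveals rank(A) = 2
d = np.abs(np.diag(R))
assert d[0] >= d[1] >= d[2]
assert d[2] < 1e-10 * d[0]
```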
Computing QRD with C.P.
Q: computed by Householder matrices; P: a permutation.
Assume we are at the k-th stage; we have
  H_{k-1} ... H_1 A P_1 ... P_{k-1} = [ R_11  R_12
                                         0    Ã  ],
where R_11 ∈ R^{(k-1)×(k-1)} is upper triangular and the columns ã_k, ..., ã_n of Ã are not yet triangularized.
Determine the index p, k ≤ p ≤ n, s.t.
  ||ã_p||_2 = max( ||ã_k||_2, ..., ||ã_n||_2 ).
If ||ã_p||_2 = 0, then rank(A) = k - 1, done; else P_k is the permutation to interchange column p and column k.
H_k is the Householder matrix s.t. H_k ... H_1 A P_1 ... P_k has zeros below the diagonal in column k.
Since Householder transformations preserve column norms, we only need to compute ||a_1||_2, ..., ||a_n||_2 at the beginning, which takes O(mn) flops; afterwards the norms can be downdated cheaply.
QRD with Column Pivoting Algorithm
piv(j) := j, j = 1 : n             % for permutation
c(j) := ||a_j||_2^2, j = 1 : n     % column norm squared
for k = 1 : n
    determine p, k ≤ p ≤ n, so that c(p) = max( c(k), ..., c(n) )
    if c(p) = 0, then quit
    interchange a_k and a_p
    interchange piv(k) and piv(p)
    interchange c(k) and c(p)
    define a Householder matrix H_k so that
        H_k A(k:m, k) = (r_kk, 0, ..., 0)^T
    A(k:m, k:n) := H_k A(k:m, k:n)
    Q := Q diag(I_{k-1}, H_k)      % if Q is needed
    c(j) := c(j) - A(k, j)^2, j = k+1 : n   % downdate column norms
end
Eigenvalue problem
Symmetric vs. Non-symmetric
For which matrices are eigenvalues easy to find? Diagonal or triangular.
What transformations are allowed? Similarity transformations. Two matrices A and B are similar if B = X^{-1} A X for a nonsingular matrix X. Then the characteristic polynomials of A and B are the same.
Schur Decomposition Theorem: for A ∈ C^{n×n}, there is a unitary matrix Q ∈ C^{n×n} s.t. Q^H A Q = R, where R is upper triangular.
A matrix is normal iff A^H A = A A^H.
Corollary: A ∈ C^{n×n} is normal iff there is a unitary Q s.t. Q^H A Q = diag(λ_1, ..., λ_n).
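The (complex) Schur decomposition is available as scipy.linalg.schur; a quick check of the theorem's claims on random data (illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

T, Q = schur(A, output="complex")       # Q^H A Q = T, with T upper triangular
assert np.allclose(Q @ T @ Q.conj().T, A)
assert np.allclose(np.tril(T, -1), 0)                # T is upper triangular
assert np.allclose(Q.conj().T @ Q, np.eye(4))        # Q is unitary
# The eigenvalues of A appear on the diagonal of T
for ev in np.linalg.eigvals(A):
    assert np.isclose(np.diag(T), ev).any()
```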
Eigenvalue problem - cont'd
Real Schur Decomposition Theorem: if A ∈ R^{n×n}, there is an orthogonal matrix Q ∈ R^{n×n} s.t.
  Q^T A Q = [ R_11  R_12 ... R_1m
                    R_22 ... R_2m
                          .    :
                              R_mm ],
where each diagonal block R_ii is either 1×1 or 2×2 (a 2×2 block corresponds to a complex conjugate pair of eigenvalues).
Corollary: if A = A^T ∈ R^{n×n}, then Q^T A Q is also symmetric, so in the real Schur decomposition R is diagonal.
If A = A^T, then all eigenvalues of A are real.
Algorithms for Symmetric Eigenvalue Problems
Jacobi algorithm
QR algorithm (1960): one of the matrix factorization algorithms with the greatest impact. Francis, Kublanovskaya, Wilkinson. Iterative; in the symmetric case, it typically converges cubically. Implemented in EISPACK, LAPACK, NAG, IMSL, Numerical Recipes, MATLAB, Maple, Mathematica.
Power method: to find the leading eigenvalue and eigenvector; PageRank.
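A minimal power-method sketch (function name and test matrix are mine); the Rayleigh quotient of the converged vector gives the leading eigenvalue:

```python
import numpy as np

def power_method(A, iters=200):
    """Power method sketch: leading eigenvalue and eigenvector of A."""
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x = A @ x                  # amplify the dominant eigendirection
        x /= np.linalg.norm(x)     # renormalize to avoid overflow
    return x @ A @ x, x            # Rayleigh quotient, eigenvector

# Symmetric example with a well-separated dominant eigenvalue
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, x = power_method(A)
assert np.allclose(A @ x, lam * x)
w = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
assert np.isclose(lam, w[-1])      # lam matches the largest eigenvalue
```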
Generalized Singular Value Decomposition
Symmetric-definite pencils: A - λB, where A ∈ R^{n×n} is symmetric and B ∈ R^{n×n} is symmetric positive definite.
The property is preserved under congruence transformations: A - λB is symmetric-definite iff X^T A X - λ X^T B X is symmetric-definite for a nonsingular X.
GSVD (Van Loan '76): for matrices A ∈ R^{m×n} with m ≥ n and B ∈ R^{p×n}, there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{p×p} and a nonsingular matrix X ∈ R^{n×n} s.t.
  U^T A X = diag(α_1, ..., α_n), α_i ≥ 0, and V^T B X = diag(β_1, ..., β_q), β_i ≥ 0,
where q = min(p, n).
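For symmetric-definite pencils, scipy.linalg.eigh(A, B) solves A x = λ B x; the eigenvector matrix X it returns is a congruence transformation that simultaneously diagonalizes A and B (illustrative random data of my own):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M + M.T                          # symmetric
N = rng.standard_normal((4, 4))
B = N @ N.T + 4 * np.eye(4)          # symmetric positive definite

# Eigenvalues of the symmetric-definite pencil A - lambda*B: A x = lambda B x
lam, X = eigh(A, B)
assert np.allclose(A @ X, B @ X @ np.diag(lam))
# X is a congruence transformation diagonalizing both A and B at once
assert np.allclose(X.T @ B @ X, np.eye(4))
assert np.allclose(X.T @ A @ X, np.diag(lam))
```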
GSVD - Paige and Saunders
Suppose A ∈ R^{m×n} and B ∈ R^{p×n} are given. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{p×p}, and a nonsingular matrix X ∈ R^{n×n}, such that
  U^T A X = Σ_A and V^T B X = Σ_B,
where
  Σ_A = [ I_A          ]        Σ_B = [ O_B          ]
        [      D_A     ]              [      D_B     ]
        [          O_A ],             [          I_B ],
I_A and I_B are identity matrices, O_A and O_B are zero matrices with possibly no rows or no columns, and
  D_A = diag(α_{r+1}, ..., α_{r+s}) and D_B = diag(β_{r+1}, ..., β_{r+s})   (1)
satisfy
  1 > α_{r+1} ≥ ... ≥ α_{r+s} > 0,  0 < β_{r+1} ≤ ... ≤ β_{r+s} < 1,   (2)
and α_i^2 + β_i^2 = 1 for i = r+1, ..., r+s.
G. Singular Values and Eigenvalues
From the GSVD, A^T A = X^{-T} (Σ_A^T Σ_A) X^{-1} and B^T B = X^{-T} (Σ_B^T Σ_B) X^{-1}.
Defining α_i = 1, β_i = 0 for the leading indices covered by I_A, and α_i = 0, β_i = 1 for the trailing indices covered by I_B, we have
  β_i^2 A^T A x_i = α_i^2 B^T B x_i for each column x_i of X,
so each pair (α_i, β_i) defines a generalized singular value α_i / β_i:

  α_i     β_i     α_i / β_i        x_i belongs to
  1       0       ∞                null(B^T B), not null(A^T A)
  (0,1)   (0,1)   finite, nonzero  neither null space
  0       1       0                null(A^T A), not null(B^T B)
  any     any     any value        null(A^T A) ∩ null(B^T B)