Matrix Decompositions for Data Analysis
Haesun Park
hpark@cc.gatech.edu
Division of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332, USA
KAIST, Korea, June - July 2007
Schedule and Outline
Lecture 1 June 19: Introduction to Matrix Decompositions
Lecture 2 June 21: Dimension Reduction for Undersampled High-Dimensional Data
Lecture 3 June 26: Adaptive Methods for Linear Discriminant Analysis and Kernelized Discriminant Analysis
Lecture 4 June 28: Nonnegative Matrix Factorization
Lecture 5 July 3: Nonnegative Matrix Factorization and its Applications
Lecture 1, Recommended Text Books:
Matrix Computations, 3/e, by Golub & Van Loan, Johns Hopkins, 1996
Applied Numerical Linear Algebra by J. W. Demmel, SIAM, 1997
Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, SIAM, 1997
Matrix Decompositions for Data Analysis – p.1/30
Dimension Reduction
Matrix Decompositions play important roles in Dimension Reduction Algorithms
Unsupervised Dimension Reduction: SVD (LSI, PCA), Manifold Learning, Distance Preserving Dimension Reduction (DPDR)
Dimension Reduction for Non-negative Data: Non-negative Matrix Factorization (NMF)
Dimension Reduction for Clustered Data and Classification: Linear Discriminant Analysis (LDA/GSVD), Regularized LDA, ..., Orthogonal Centroid Method, Centroid-based Method
What is Numerical Analysis?
Three great branches of science: Theory, Experiment, and Computation.
"The purpose of computing is insight, not numbers" (Hamming, 1961). "The purpose of computing numbers is not yet in sight" (Hamming, 1997).
Numerical Analysis is the study of algorithms for the problems of continuous mathematics. Ex.: Newton's method, Lagrange interpolation polynomial, Gaussian elimination, Euler's method, ...
Computational mathematics is mainly based on two ideas (an extreme simplification): Taylor series and linear algebra.
Role of Computers in Numerical Computing: computers certainly play a part in numerical computing, but even if rounding error vanished, 95% of numerical analysis would remain. Most mathematical problems cannot be solved by a finite sequence of elementary operations. Need: fast algorithms that converge to "approximate" answers accurate to many digits of precision, for science and engineering applications.
Different Types of Problems in Numerical Computing
Problem F: can be solved in a finite sequence of elementary operations:
Roots of a polynomial of degree ≤ 4: a closed-form formula exists (Ferrari, 1540)
Solving linear equations
Linear programming
Problem I: cannot be solved in a finite sequence of elementary operations:
Roots of a polynomial of degree 5 and higher: no closed-form formula exists (Ruffini and Abel, around 1800)
Finding eigenvalues of an n × n matrix with n ≥ 5
Minimizing a function of several variables
Evaluating an integral
Solving an ODE
Solving a PDE
Problem F is not necessarily easier than Problem I. When the problem dimension is very high, one often gives up the exact solution and uses approximate, fast methods instead.
*** World's largest matrix computation as of April 2007: Google's PageRank - the eigenvector of a matrix of order 2.7 billion.
Gauss (1777-1855) and Numerical Computing
least squares data fitting (1795)
systems of linear equations (1809)
numerical quadrature (1814)
fast Fourier transform (1805) - not well known until it was rediscovered by Cooley and Tukey (1965)
Numerical Linear Algebra
square linear system solving
least squares problems
eigenvalue problem
Often, Algorithms = Matrix Factorizations
Square Linear System Solving
When does the solution exist?
When is it easy to solve? if the matrix is triangular or diagonal
Diagonalization: expensive
Make it triangular: A = LU: lower - upper triangular factors
Gaussian elimination: make the problem into triangular system solving
May break down: e.g.,
  A = [ 0  1
        1  0 ]
has no LU factorization, since the first pivot is zero.
Even for matrices with the LU factorization, it can be unstable.
Pivoting: by interchanging rows, stability can be achieved. Gaussian elimination with pivoting:
  P A = L U,  where P is a permutation matrix.
Discovery of pivoting was easy but its theoretical analysis has been hard. For most matrices it is stable, but in 1960 Wilkinson and others found that for certain exceptional matrices, Gaussian elimination with pivoting is unstable.
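As a hedged illustration of P A = L U (not part of the original slides; the function name and the 2×2 test matrix are my own), partial pivoting can be sketched in a few lines:

```python
import numpy as np

def lu_partial_pivoting(A):
    """Gaussian elimination with partial pivoting: returns P, L, U with P @ A = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    for k in range(n - 1):
        # pick the row with the largest pivot in column k
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:                       # interchange rows (pivoting)
            A[[k, p], :] = A[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):        # eliminate below the pivot
            L[i, k] = A[i, k] / A[k, k]
            A[i, k:] -= L[i, k] * A[k, k:]
    return P, L, np.triu(A)

# The matrix [[0, 1], [1, 0]] has no LU factorization without pivoting,
# but a row interchange fixes it:
A = np.array([[0.0, 1.0], [1.0, 0.0]])
P, L, U = lu_partial_pivoting(A)
assert np.allclose(P @ A, L @ U)
```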
Orthogonal (Unitary) Transformations
Use of orthogonal matrices was introduced in the late 1950s.
Q ∈ R^{m×m} is orthogonal if Q^T Q = I; Q ∈ C^{m×m} is unitary if Q^H Q = I. Orthogonal transformations preserve lengths: ||Qx||_2 = ||x||_2.
QR factorization: for any matrix A ∈ R^{m×n}, m ≥ n, a QR factorization of A exists: A = Q R, where Q ∈ R^{m×m} has orthonormal columns and R ∈ R^{m×n} is upper triangular. Reduced QRD: A = Q_1 R_1, where Q_1 ∈ R^{m×n} has orthonormal columns and R_1 ∈ R^{n×n} is upper triangular.
Gram (1883) - Schmidt (1907) orthogonalization: the columns of Q are obtained one at a time, and R is obtained as a by-product, in a process of triangular orthogonalization.
Modified Gram-Schmidt (Laplace 1816, Rice 1966)
Householder method (1958, Householder reflector; Turnbull and Aitken 1932): A is reduced to an upper triangular matrix R via orthogonal operations. More stable numerically, because orthogonal operations preserve the 2-norm and Frobenius norm and thus do not amplify the rounding errors introduced at each step:
  ||Q A||_2 = ||A||_2,  ||Q A||_F = ||A||_F.
Givens method: built from 2×2 plane rotations
  G(θ) = [  cos θ  sin θ
           -sin θ  cos θ ].
Important matrix computation algorithms in the 1960s
Based on the QR factorization:
to solve least squares problems
to construct orthonormal bases
used at the core of other algorithms, especially in EVD and SVD algorithms
Least Squares
Overdetermined system solving: min_x ||A x - b||_2, where A ∈ R^{m×n} with m > n.
If it were a square system, we would know how to solve it; for least squares: normal equations, or the QRD.
Reduced QR Decomposition: a distance-preserving dimension reduction method.
QRD: efficient updating and downdating methods exist.
Rank deficiency: Q_1 is not a basis for range(A) if rank(A) is not full.
A pivoted QR decomposition, a rank-revealing QR decomposition, or the SVD is needed if rank(A) is not full.
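A minimal sketch (my own random example, not from the slides) of solving the overdetermined system via the reduced QRD: factor A = Q_1 R_1 and solve the triangular system R_1 x = Q_1^T b:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # overdetermined: m = 8 > n = 3
b = rng.standard_normal(8)

# Reduced QRD: A = Q1 @ R1 with Q1 (8x3), R1 (3x3) upper triangular
Q1, R1 = np.linalg.qr(A, mode="reduced")
x_qr = np.linalg.solve(R1, Q1.T @ b)          # least squares solution via QRD

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # reference solution
assert np.allclose(x_qr, x_ref)
```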
Householder Transformation
  P = I - 2 u u^T / (u^T u),  u ≠ 0,
with P = P^T, P^T P = I, P^2 = I.
Given x = (x_1, ..., x_m)^T ≠ 0, we can find u so that
  P x = ∓ ||x||_2 e_1 = (∓||x||_2, 0, ..., 0)^T,
by choosing u = x ± ||x||_2 e_1.
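A small numerical check of the reflector formula (the test vector is an illustrative choice of mine):

```python
import numpy as np

def householder(x):
    """Returns u such that P = I - 2*u*u^T/(u^T u) maps x to a multiple of e1."""
    u = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0      # sign chosen to avoid cancellation
    u[0] += sign * np.linalg.norm(x)
    return u

x = np.array([3.0, 4.0])
u = householder(x)
P = np.eye(2) - 2.0 * np.outer(u, u) / (u @ u)
# P is symmetric and orthogonal, and P @ x = (-||x||_2, 0)^T = (-5, 0)^T
assert np.allclose(P @ x, [-5.0, 0.0])
assert np.allclose(P @ P, np.eye(2))
```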
Examples
[Two worked numerical examples of Householder reflectors appear here; the matrix entries did not survive the transcript.]
Which sign to choose in u = x ± ||x||_2 e_1? Choose + if x_1 ≥ 0 and choose - if x_1 < 0, to avoid cancellation error.
Cancellation error occurs when two numbers of very close value are involved in a subtraction: the leading digits cancel, and few significant digits remain.
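The cancellation phenomenon, and why the sign rule avoids it, can be seen directly in double-precision arithmetic (an illustrative sketch, not from the slides):

```python
import numpy as np

# Cancellation: subtracting nearly equal numbers destroys significant digits.
# (1 + eps/2) already rounds to 1 in double precision, so the subtraction
# returns 0 even though the exact answer is eps/2.
eps = np.finfo(np.float64).eps
assert (1.0 + eps / 2) - 1.0 == 0.0

# The Householder sign rule avoids this: with x1 > 0, computing x1 - ||x||_2
# cancels badly when x is nearly parallel to e1, while x1 + ||x||_2 cannot.
x = np.array([1.0, 1e-9])
good = x[0] + np.linalg.norm(x)   # ~2.0, no cancellation
bad = x[0] - np.linalg.norm(x)    # true value ~ -5e-19, but every digit is lost
assert abs(bad) < 1e-15 < good
```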
Householder QRD
Let A ∈ R^{m×n}. Find a Householder matrix P_1 ∈ R^{m×m} so that
  P_1 a_1 = (r_11, 0, ..., 0)^T,
where a_1 is the first column of A. Then
  P_1 A = [ r_11  r_12 ... r_1n
            0
            :         A_2
            0              ].
Find a Householder matrix P̂_2 ∈ R^{(m-1)×(m-1)} so that
  P̂_2 â_2 = (r_22, 0, ..., 0)^T,
where â_2 is the first column of A_2. Define
  P_2 = [ 1   0
          0  P̂_2 ].
Householder QRD - cont'd
  P_2 P_1 A = [ r_11  r_12 ... r_1n
                0     r_22 ... r_2n
                0     0
                :     :     A_3
                0     0          ]
continue...
Letting Q = P_1 P_2 ... P_n, we have A = Q R.
In general, we need to find n Householder matrices to find the QRD of A ∈ R^{m×n} for m > n:
  P_n P_{n-1} ... P_1 A = R,
and Q = P_1 P_2 ... P_n is orthogonal, since a product of orthogonal matrices is orthogonal.
Householder QRD Algorithm
R := A
for k = 1 : n
    determine a Householder matrix P̂_k of order m - k + 1 so that
        P̂_k (r_kk, ..., r_mk)^T = (r_kk, 0, ..., 0)^T
    R(k:m, k:n) := P̂_k R(k:m, k:n)
    Q := Q diag(I_{k-1}, P̂_k)        % if Q is needed
end
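The algorithm above can be sketched as follows; this is an illustrative implementation (function name mine), applying each reflector to the trailing submatrix and accumulating Q:

```python
import numpy as np

def householder_qr(A):
    """Householder QRD sketch: returns Q (m x m), R (m x n) with A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        u = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0
        u[0] += sign * np.linalg.norm(x)   # sign chosen to avoid cancellation
        unorm = u @ u
        if unorm == 0.0:
            continue                        # column already zero below the diagonal
        # Apply P = I - 2 u u^T / (u^T u) to the trailing submatrix, and to Q
        R[k:, k:] -= (2.0 / unorm) * np.outer(u, u @ R[k:, k:])
        Q[:, k:] -= (2.0 / unorm) * np.outer(Q[:, k:] @ u, u)
    return Q, np.triu(R)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(5))
```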
Givens Rotation
  G(θ) = [  cos θ  sin θ
           -sin θ  cos θ ],   G(θ)^T G(θ) = I.
n-dim Givens rotation G(i, k, θ): the n×n identity matrix with entries (i,i), (i,k), (k,i), (k,k) replaced by cos θ, sin θ, -sin θ, cos θ. Applying it to a vector changes only the i-th and k-th components, i.e.
  [  c  s ] [ x_i ]   [ r ]
  [ -s  c ] [ x_k ] = [ 0 ],
with r = sqrt(x_i^2 + x_k^2), c = x_i / r, s = x_k / r.
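A quick check of the rotation formulas (test values of my choosing):

```python
import numpy as np

def givens(xi, xk):
    """Rotation coefficients c, s so that [[c, s], [-s, c]] @ [xi, xk] = [r, 0]."""
    r = np.hypot(xi, xk)
    return (1.0, 0.0) if r == 0.0 else (xi / r, xk / r)

x = np.array([3.0, 4.0])
c, s = givens(x[0], x[1])
G = np.array([[c, s], [-s, c]])
assert np.allclose(G @ x, [5.0, 0.0])   # zeroes the second component
assert np.allclose(G.T @ G, np.eye(2))  # G is orthogonal
```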
Givens Method for QRD
A ∈ R^{m×n} with m > n, e.g. m = 4, n = 3. Zeros are introduced below the diagonal one entry at a time, column by column; each × denotes a (possibly) nonzero entry.
  A = [ × × ×
        × × ×
        × × ×
        × × × ]
1. Rotations acting on the rows of column 1:
     [ × × ×
       0 × ×
       0 × ×
       0 × × ]
2. Rotations acting on the rows of column 2:
     [ × × ×
       0 × ×
       0 0 ×
       0 0 × ]
3. A rotation acting on the rows of column 3:
     [ × × ×
       0 × ×
       0 0 ×
       0 0 0 ] = R,
so G_t ... G_2 G_1 A = R and Q = G_1^T G_2^T ... G_t^T.
QRD by Gram-Schmidt
In the k-th step, the k-th column of Q and the k-th column of R are computed.
  A = (a_1, a_2, ..., a_n) = (q_1, q_2, ..., q_n) [ r_11 r_12 ... r_1n
                                                         r_22 ... r_2n
                                                               .    :
                                                                  r_nn ]
so a_k = r_1k q_1 + r_2k q_2 + ... + r_kk q_k. Classical Gram-Schmidt (CGS):
  r_ik = q_i^T a_k,  i = 1, ..., k-1
  q̃_k = a_k - (r_1k q_1 + ... + r_{k-1,k} q_{k-1})
  r_kk = ||q̃_k||_2,  q_k = q̃_k / r_kk.
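The CGS step can be sketched as a reduced QRD (illustrative implementation, assumes A has full column rank):

```python
import numpy as np

def cgs_qr(A):
    """Classical Gram-Schmidt sketch: reduced QRD, A = Q @ R (Q m x n, R n x n)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        # k-th step: the k-th column of R, then the k-th column of Q
        R[:k, k] = Q[:, :k].T @ A[:, k]      # r_ik = q_i^T a_k
        q = A[:, k] - Q[:, :k] @ R[:k, k]    # subtract projections onto q_1..q_{k-1}
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Q, R = cgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(4))
```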
QRD by Modified Gram-Schmidt
In the k-th step, the k-th column of Q and the k-th row of R are computed.
After step k-1, the remaining columns a_k^(k), ..., a_n^(k) have already been orthogonalized against q_1, ..., q_{k-1}. Then
  r_kk = ||a_k^(k)||_2,  q_k = a_k^(k) / r_kk,
  r_kj = q_k^T a_j^(k),  a_j^(k+1) = a_j^(k) - r_kj q_k,  j = k+1, ..., n,
where (r_kk, ..., r_kn) is the k-th row of R.
With CGS & MGS, we can only get the reduced QRD, no full QRD.
For n = 2, MGS = CGS.
MGS is numerically more stable than CGS.
Comparison between QRD Algorithms
Householder and Givens methods generate Q that is numerically more orthogonal than the Q generated by MGS and CGS:
  ||I - Q_H^T Q_H|| = O(u)  and  ||I - Q_G^T Q_G|| = O(u),
while
  ||I - Q_M^T Q_M|| = O(u κ_2(A))  and  ||I - Q_C^T Q_C|| can be much larger (it grows roughly like u κ_2(A)^2),
where Q_H: Householder, Q_G: Givens, Q_C: CGS, Q_M: MGS, u is the unit roundoff, and κ_2(A) is the condition number.
Householder and Givens are more expensive if we want Q and R explicitly.
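The stability gap can be observed numerically. The sketch below (my own construction, not from the lecture) builds a matrix with condition number about 10^8 and compares the loss of orthogonality ||I - Q^T Q|| for CGS and MGS:

```python
import numpy as np

def gram_schmidt(A, modified):
    """Reduced-QRD Q by classical (modified=False) or modified (True) Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    V = A.astype(float).copy()
    for k in range(n):
        if modified:
            q = V[:, k]    # already orthogonalized against q_1..q_{k-1} in earlier steps
        else:
            q = V[:, k] - Q[:, :k] @ (Q[:, :k].T @ V[:, k])
        q = q / np.linalg.norm(q)
        if modified:       # orthogonalize the remaining columns immediately
            V[:, k + 1:] -= np.outer(q, q @ V[:, k + 1:])
        Q[:, k] = q
    return Q

# Ill-conditioned test matrix: singular values from 1 down to 1e-8
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((30, 10)))
W, _ = np.linalg.qr(rng.standard_normal((10, 10)))
A = U @ np.diag(np.logspace(0, -8, 10)) @ W.T

Q_cgs = gram_schmidt(A, modified=False)
Q_mgs = gram_schmidt(A, modified=True)
err_cgs = np.linalg.norm(np.eye(10) - Q_cgs.T @ Q_cgs)
err_mgs = np.linalg.norm(np.eye(10) - Q_mgs.T @ Q_mgs)
assert err_mgs < err_cgs   # MGS stays far closer to orthogonal than CGS
```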
Singular Value Decomposition (SVD)
Beltrami, Jordan, Sylvester, in the late 19th century; made well known by Golub (1965).
The SVD: any matrix A ∈ R^{m×n} (assume m ≥ n, but this is not necessary) can be decomposed into
  A = U Σ V^T,
where U ∈ R^{m×m} is unitary, V ∈ R^{n×n} is unitary, and Σ ∈ R^{m×n} is diagonal, Σ = diag(σ_1, ..., σ_n) with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. If rank(A) = r, then σ_r > 0 and σ_{r+1} = ... = σ_n = 0.
  A_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T,  k ≤ r,
is the best rank-k approximation of A.
Singular values of A are the nonnegative square roots of the eigenvalues of A^T A; the columns of V are eigenvectors of A^T A, and the columns of U are eigenvectors of A A^T.
Latent Semantic Indexing: lower rank approximation of the term-document matrix.
Principal Component Analysis: let C = A - c e^T be the centered data matrix, where c is the mean of the columns of A. Then the leading singular vectors of C are the PCA solutions: from the SVD C = U Σ V^T we get C C^T = U (Σ Σ^T) U^T, so the leading k columns of U are the k principal vectors.
But the SVD is expensive.
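The best rank-k approximation property (Eckart-Young) can be checked with numpy's SVD (illustrative random data of my own):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Eckart-Young: the 2-norm error of the best rank-k approximation is sigma_{k+1}
assert np.linalg.matrix_rank(A_k) == k
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```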
Properties of SVD
Suppose the matrix A ∈ R^{m×n} with rank(A) = r has the SVD
  A = U Σ V^T = σ_1 u_1 v_1^T + ... + σ_r u_r v_r^T,
where Σ = diag(σ_1, ..., σ_n) with σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0.
  A^T A = V (Σ^T Σ) V^T, where Σ^T Σ = diag(σ_1^2, ..., σ_n^2)
  A A^T = U (Σ Σ^T) U^T, where Σ Σ^T = diag(σ_1^2, ..., σ_n^2, 0, ..., 0) ∈ R^{m×m}
With U = (u_1, ..., u_m) and V = (v_1, ..., v_n):
  range(A) = span{u_1, ..., u_r},   null(A) = span{v_{r+1}, ..., v_n},
  range(A^T) = span{v_1, ..., v_r}, null(A^T) = span{u_{r+1}, ..., u_m}.
Why QRD with Column Pivoting?
Ex. [a small rank-deficient example matrix appears here; its entries did not survive the transcript.]
When rank(A) = r < n, the plain QRD A = Q R gives an R with some zero (or tiny) diagonal entries r_ii, but they need not appear in the trailing positions, so the leading r columns of Q need not span range(A).
QRD with C.P. can help us to maintain |r_11| ≥ |r_22| ≥ ... ≥ |r_nn| in the rank-deficient case, pushing the small diagonal entries to the bottom and revealing the rank.
For any A ∈ R^{m×n}, QRD with Column Pivoting computes
  A P = Q R = Q [ R_11  R_12
                   0     0  ],
where R_11 ∈ R^{r×r} is upper triangular and nonsingular. Then range(A) = range(Q(:, 1:r)).
Q ∈ R^{m×m}: orthogonal, and P ∈ R^{n×n}: permutation.
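A sketch of rank detection via pivoted QR, using scipy.linalg.qr with pivoting=True (the rank-2 test matrix is my own construction):

```python
import numpy as np
from scipy.linalg import qr

# A rank-2 matrix: the third column is the sum of the first two
rng = np.random.default_rng(5)
B = rng.standard_normal((6, 2))
A = np.column_stack([B, B @ [1.0, 1.0]])

Q, R, piv = qr(A, pivoting=True)      # A[:, piv] = Q @ R
assert np.allclose(A[:, piv], Q @ R)

# |r_11| >= |r_22| >= |r_33|, and the tiny trailing r_33 reveals rank(A) = 2
d = np.abs(np.diag(R))
assert d[0] >= d[1] >= d[2]
assert d[2] < 1e-10 * d[0]
```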
Computing QRD with C.P.
Q: computed by Householder matrices; P: a permutation.
Assume we are at the k-th stage; we have
  H_{k-1} ... H_1 A P_1 ... P_{k-1} = [ R_11  R_12
                                         0    Ã  ],
where R_11 ∈ R^{(k-1)×(k-1)} is upper triangular and the columns ã_k, ..., ã_n of Ã are not yet triangularized.
Determine the index p, k ≤ p ≤ n, s.t.
  ||ã_p||_2 = max( ||ã_k||_2, ..., ||ã_n||_2 ).
If ||ã_p||_2 = 0, then rank(A) = k - 1, done; else P_k is the permutation to interchange column p and column k.
H_k is the Householder matrix s.t. H_k ... H_1 A P_1 ... P_k has zeros below the diagonal in column k.
Since Householder transformations preserve column norms, we only need to compute ||a_1||_2, ..., ||a_n||_2 at the beginning, which takes O(mn) flops; afterwards the norms can be downdated cheaply.
QRD with Column Pivoting Algorithm
piv(j) := j, j = 1 : n             % for permutation
c(j) := ||a_j||_2^2, j = 1 : n     % column norm squared
for k = 1 : n
    determine p, k ≤ p ≤ n, so that c(p) = max( c(k), ..., c(n) )
    if c(p) = 0, then quit
    interchange a_k and a_p
    interchange piv(k) and piv(p)
    interchange c(k) and c(p)
    define a Householder matrix H_k so that
        H_k A(k:m, k) = (r_kk, 0, ..., 0)^T
    A(k:m, k:n) := H_k A(k:m, k:n)
    Q := Q diag(I_{k-1}, H_k)      % if Q is needed
    c(j) := c(j) - A(k, j)^2, j = k+1 : n   % downdate column norms
end
Eigenvalue problem
Symmetric vs. Non-symmetric
For which matrices are eigenvalues easy to find? Diagonal or triangular.
What transformations are allowed? Similarity transformations. Two matrices A and B are similar if B = X^{-1} A X for a nonsingular matrix X. Then the characteristic polynomials of A and B are the same.
Schur Decomposition Theorem: for A ∈ C^{n×n}, there is a unitary matrix Q ∈ C^{n×n} s.t. Q^H A Q = R, where R is upper triangular.
A matrix is normal iff A^H A = A A^H.
Corollary: A ∈ C^{n×n} is normal iff there is a unitary Q s.t. Q^H A Q = diag(λ_1, ..., λ_n).
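The (complex) Schur decomposition is available as scipy.linalg.schur; a quick check of the theorem's claims on random data (illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

T, Q = schur(A, output="complex")       # Q^H A Q = T, with T upper triangular
assert np.allclose(Q @ T @ Q.conj().T, A)
assert np.allclose(np.tril(T, -1), 0)                # T is upper triangular
assert np.allclose(Q.conj().T @ Q, np.eye(4))        # Q is unitary
# The eigenvalues of A appear on the diagonal of T
for ev in np.linalg.eigvals(A):
    assert np.isclose(np.diag(T), ev).any()
```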
Eigenvalue problem - cont'd
Real Schur Decomposition Theorem: if A ∈ R^{n×n}, there is an orthogonal matrix Q ∈ R^{n×n} s.t.
  Q^T A Q = [ R_11  R_12 ... R_1m
                    R_22 ... R_2m
                          .    :
                              R_mm ],
where each diagonal block R_ii is either 1×1 or 2×2 (a 2×2 block corresponds to a complex conjugate pair of eigenvalues).
Corollary: if A = A^T ∈ R^{n×n}, then Q^T A Q is also symmetric, so in the real Schur decomposition R is diagonal.
If A = A^T, then all eigenvalues of A are real.
Algorithms for Symmetric Eigenvalue Problems
Jacobi algorithm
QR algorithm (1960): one of the matrix factorization algorithms with the greatest impact. Francis, Kublanovskaya, Wilkinson. Iterative; in the symmetric case, it typically converges cubically. Implemented in EISPACK, LAPACK, NAG, IMSL, Numerical Recipes, MATLAB, Maple, Mathematica.
Power method: to find the leading eigenvalue and eigenvector; PageRank.
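A minimal power-method sketch (function name and test matrix are mine); the Rayleigh quotient of the converged vector gives the leading eigenvalue:

```python
import numpy as np

def power_method(A, iters=200):
    """Power method sketch: leading eigenvalue and eigenvector of A."""
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x = A @ x                  # amplify the dominant eigendirection
        x /= np.linalg.norm(x)     # renormalize to avoid overflow
    return x @ A @ x, x            # Rayleigh quotient, eigenvector

# Symmetric example with a well-separated dominant eigenvalue
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, x = power_method(A)
assert np.allclose(A @ x, lam * x)
w = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
assert np.isclose(lam, w[-1])      # lam matches the largest eigenvalue
```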
Generalized Singular Value Decomposition
Symmetric-definite pencils: A - λB, where A ∈ R^{n×n} is symmetric and B ∈ R^{n×n} is symmetric positive definite.
The property is preserved under congruence transformations: A - λB is symmetric-definite iff X^T A X - λ X^T B X is symmetric-definite for a nonsingular X.
GSVD (Van Loan '76): for matrices A ∈ R^{m×n} with m ≥ n and B ∈ R^{p×n}, there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{p×p} and a nonsingular matrix X ∈ R^{n×n} s.t.
  U^T A X = diag(α_1, ..., α_n), α_i ≥ 0, and V^T B X = diag(β_1, ..., β_q), β_i ≥ 0,
where q = min(p, n).
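For symmetric-definite pencils, scipy.linalg.eigh(A, B) solves A x = λ B x; the eigenvector matrix X it returns is a congruence transformation that simultaneously diagonalizes A and B (illustrative random data of my own):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M + M.T                          # symmetric
N = rng.standard_normal((4, 4))
B = N @ N.T + 4 * np.eye(4)          # symmetric positive definite

# Eigenvalues of the symmetric-definite pencil A - lambda*B: A x = lambda B x
lam, X = eigh(A, B)
assert np.allclose(A @ X, B @ X @ np.diag(lam))
# X is a congruence transformation diagonalizing both A and B at once
assert np.allclose(X.T @ B @ X, np.eye(4))
assert np.allclose(X.T @ A @ X, np.diag(lam))
```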
GSVD - Paige and Saunders
Suppose A ∈ R^{m×n} and B ∈ R^{p×n} are given. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{p×p}, and a nonsingular matrix X ∈ R^{n×n}, such that
  U^T A X = Σ_A and V^T B X = Σ_B,
where
  Σ_A = [ I_A          ]        Σ_B = [ O_B          ]
        [      D_A     ]              [      D_B     ]
        [          O_A ],             [          I_B ],
I_A and I_B are identity matrices, O_A and O_B are zero matrices with possibly no rows or no columns, and
  D_A = diag(α_{r+1}, ..., α_{r+s}) and D_B = diag(β_{r+1}, ..., β_{r+s})   (1)
satisfy
  1 > α_{r+1} ≥ ... ≥ α_{r+s} > 0,  0 < β_{r+1} ≤ ... ≤ β_{r+s} < 1,   (2)
and α_i^2 + β_i^2 = 1 for i = r+1, ..., r+s.
G. Singular Values and Eigenvalues
From the GSVD, A^T A = X^{-T} (Σ_A^T Σ_A) X^{-1} and B^T B = X^{-T} (Σ_B^T Σ_B) X^{-1}.
Defining α_i = 1, β_i = 0 for the leading indices covered by I_A, and α_i = 0, β_i = 1 for the trailing indices covered by I_B, we have
  β_i^2 A^T A x_i = α_i^2 B^T B x_i for each column x_i of X,
so each pair (α_i, β_i) defines a generalized singular value α_i / β_i:

  α_i     β_i     α_i / β_i        x_i belongs to
  1       0       ∞                null(B^T B), not null(A^T A)
  (0,1)   (0,1)   finite, nonzero  neither null space
  0       1       0                null(A^T A), not null(B^T B)
  any     any     any value        null(A^T A) ∩ null(B^T B)