Linear Algebra
Shan-Hung Wu (shwu@cs.nthu.edu.tw)
Department of Computer Science, National Tsing Hua University, Taiwan
Large-Scale ML, Fall 2016

Outline

1. Span & Linear Dependence
2. Norms
3. Eigendecomposition
4. Singular Value Decomposition
5. Traces and Determinant


Matrix Representation of Linear Functions

A linear function (or map, or transformation) f: ℝⁿ → ℝᵐ can be represented by a matrix A ∈ ℝ^{m×n} such that

f(x) = Ax = y, ∀x ∈ ℝⁿ, y ∈ ℝᵐ

span(A:,1, …, A:,n) is called the column space of A.
rank(A) = dim(span(A:,1, …, A:,n))
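As a minimal numerical sketch of these definitions (using NumPy; the matrix here is a hypothetical example, not from the slides), the rank equals the dimension of the column space:

```python
import numpy as np

# A hypothetical 3x2 matrix with two linearly independent columns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# rank(A) = dim(span(A[:, 0], A[:, 1]))
print(np.linalg.matrix_rank(A))  # 2

# Applying the linear map f(x) = Ax sends x in R^2 to y in R^3
x = np.array([2.0, 3.0])
y = A @ x
print(y)  # [2. 3. 5.]
```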

System of Linear Equations

Given A and y, solve for x in Ax = y.

What kind of A makes Ax = y always have a solution? Since Ax = Σᵢ xᵢ A:,i, the column space of A must contain ℝᵐ, i.e.,

ℝᵐ ⊆ span(A:,1, …, A:,n)

This implies n ≥ m.

When does Ax = y always have exactly one solution? A must have at most m columns; otherwise there is more than one x parametrizing some y. This implies n = m and that the columns of A are linearly independent. In this case A⁻¹ exists, and x = A⁻¹y.
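The unique-solution case (n = m, linearly independent columns) can be sketched with NumPy's solver; the matrix below is a made-up example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # square, columns linearly independent
y = np.array([5.0, 10.0])

# Solves Ax = y directly, without forming A^{-1} explicitly
x = np.linalg.solve(A, y)
print(x)                     # [1. 3.]
print(np.allclose(A @ x, y)) # True
```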



Vector Norms

A norm on vectors is a function ‖·‖ that maps vectors to non-negative values and satisfies:

‖x‖ = 0 ⇒ x = 0
‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality)
‖cx‖ = |c|·‖x‖, ∀c ∈ ℝ

E.g., the Lᵖ norm:

‖x‖ₚ = (Σᵢ |xᵢ|ᵖ)^{1/p}

L² (Euclidean) norm: ‖x‖ = (xᵀx)^{1/2}
L¹ norm: ‖x‖₁ = Σᵢ |xᵢ|
Max norm: ‖x‖∞ = maxᵢ |xᵢ|

xᵀy = ‖x‖‖y‖cos θ, where θ is the angle between x and y.
x and y are orthonormal iff xᵀy = 0 (orthogonal) and ‖x‖ = ‖y‖ = 1 (unit vectors).
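These norms can be sketched with NumPy's `norm` (the vectors are illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)            # Euclidean: sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, 1)         # L1: |3| + |-4| = 7
linf = np.linalg.norm(x, np.inf)  # max norm: max(3, 4) = 4
print(l2, l1, linf)               # 5.0 7.0 4.0

# x^T y = 0 means x and y are orthogonal
y = np.array([4.0, 3.0])
print(x @ y)  # 0.0
```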


Matrix Norms

Frobenius norm:

‖A‖_F = √(Σᵢ,ⱼ A²ᵢ,ⱼ)

Analogous to the L² norm of a vector.

An orthogonal matrix is a square matrix whose columns (resp. rows) are mutually orthonormal, i.e.,

AᵀA = I = AAᵀ

This implies A⁻¹ = Aᵀ.
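A quick NumPy check of both facts, using a made-up matrix and a rotation matrix as the orthogonal example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')                 # sqrt(1 + 4 + 9 + 16)
print(np.isclose(fro, np.sqrt(30.0)))          # True

# Rotation matrices are orthogonal: Q^T Q = I, hence Q^{-1} = Q^T
t = np.pi / 4
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(np.allclose(Q.T @ Q, np.eye(2)))         # True
print(np.allclose(np.linalg.inv(Q), Q.T))      # True
```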


Decomposition

Integers can be decomposed into prime factors, e.g., 12 = 2 × 2 × 3. This helps identify useful properties, e.g., that 12 is not divisible by 5.

Can we similarly decompose matrices to identify information about their functional properties more easily?


Eigenvectors and Eigenvalues

An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:

Av = λv,

where λ ∈ ℝ is called the eigenvalue corresponding to this eigenvector.

If v is an eigenvector, so is any scaling cv, c ∈ ℝ, c ≠ 0, and cv has the same eigenvalue. Thus, we usually look only for unit eigenvectors.
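A minimal NumPy illustration of Av = λv (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(A)   # columns of vecs are unit eigenvectors

v = vecs[:, 0]
lam = vals[0]
print(np.allclose(A @ v, lam * v))              # True: Av = lambda v

# Any non-zero scaling of v is also an eigenvector with the same eigenvalue
print(np.allclose(A @ (5 * v), lam * (5 * v)))  # True
```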


Eigendecomposition I

Every real symmetric matrix A ∈ ℝ^{n×n} can be decomposed into

A = Q diag(λ) Qᵀ

λ ∈ ℝⁿ consists of real-valued eigenvalues (usually sorted in descending order).
Q = [v⁽¹⁾, …, v⁽ⁿ⁾] is an orthogonal matrix whose columns are the corresponding eigenvectors.

The eigendecomposition may not be unique: when two or more eigenvectors share the same eigenvalue, any set of orthogonal vectors lying in their span are also eigenvectors with that eigenvalue.

What can we tell after decomposition?
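The decomposition can be verified numerically with NumPy's `eigh`, which is specialized for symmetric matrices (note it returns eigenvalues in ascending rather than descending order; the matrix below is a made-up example):

```python
import numpy as np

# A hypothetical real symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(A)                     # eigenvalues ascending
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True: A = Q diag(lam) Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))         # True: Q is orthogonal
```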


Eigendecomposition II

Because Q = [v⁽¹⁾, …, v⁽ⁿ⁾] is an orthogonal matrix, we can think of A as scaling space by λᵢ in direction v⁽ⁱ⁾.

Rayleigh's Quotient

Theorem (Rayleigh's Quotient)
Given a symmetric matrix A ∈ ℝ^{n×n}, then ∀x ∈ ℝⁿ,

λ_min ≤ (xᵀAx)/(xᵀx) ≤ λ_max,

where λ_min and λ_max are the smallest and largest eigenvalues of A.

(xᵀAx)/(xᵀx) = λᵢ when x is the eigenvector corresponding to λᵢ.
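A sketch of the theorem on a random symmetric matrix (NumPy; the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                # symmetrize

lam = np.linalg.eigvalsh(A)      # eigenvalues, sorted ascending
x = rng.standard_normal(4)
r = (x @ A @ x) / (x @ x)        # Rayleigh quotient

print(lam[0] <= r <= lam[-1])    # True
```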

Singularity

Suppose A = Q diag(λ) Qᵀ; then A⁻¹ = Q diag(λ)⁻¹ Qᵀ.

A is non-singular (invertible) iff none of its eigenvalues is zero.


Positive Definite Matrices I

A is positive semidefinite (denoted A ⪰ O) iff its eigenvalues are all non-negative; then xᵀAx ≥ 0 for any x.

A is positive definite (denoted A ≻ O) iff its eigenvalues are all positive; this further ensures that xᵀAx = 0 ⇒ x = 0.

Why do these matter?

Positive Definite Matrices II

A function f is quadratic iff it can be written as f(x) = ½ xᵀAx − bᵀx + c, where A is symmetric. xᵀAx is called the quadratic form.

Figure: Graph of a quadratic form when A is a) a positive definite; b) negative definite; c) positive semidefinite (singular); d) indefinite matrix.
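One standard way such matrices arise is as Gram matrices BᵀB, which are always positive semidefinite since xᵀ(BᵀB)x = ‖Bx‖² ≥ 0. A NumPy sketch (random sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
A = B.T @ B                      # Gram matrix: positive semidefinite

# All eigenvalues are non-negative (up to round-off)
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))  # True

# The quadratic form is non-negative for any x
x = rng.standard_normal(3)
print(x @ A @ x >= 0)            # True: equals ||Bx||^2
```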



Singular Value Decomposition (SVD)

Eigendecomposition requires square matrices. What if A is not square?

Every real matrix A ∈ ℝ^{m×n} has a singular value decomposition:

A = UDVᵀ,

where U ∈ ℝ^{m×m}, D ∈ ℝ^{m×n}, and V ∈ ℝ^{n×n}.

U and V are orthogonal matrices, and their columns are called the left- and right-singular vectors, respectively. Elements along the diagonal of D are called the singular values.

Left-singular vectors of A are eigenvectors of AAᵀ.
Right-singular vectors of A are eigenvectors of AᵀA.
Non-zero singular values of A are the square roots of the eigenvalues of AAᵀ (or AᵀA).
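These properties can be checked with NumPy's `svd` (which returns Vᵀ rather than V, and the singular values as a vector rather than the full D; sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))          # not square

U, s, Vt = np.linalg.svd(A)              # s holds the singular values
D = np.zeros((4, 3))
D[:3, :3] = np.diag(s)
print(np.allclose(U @ D @ Vt, A))        # True: A = U D V^T

# Singular values are square roots of the eigenvalues of A^T A
lam = np.linalg.eigvalsh(A.T @ A)[::-1]  # descending order
lam = np.maximum(lam, 0.0)               # guard against tiny negative round-off
print(np.allclose(np.sqrt(lam), s))      # True
```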


Moore-Penrose Pseudoinverse I

Matrix inversion is not defined for matrices that are not square.

Suppose we want a left-inverse B ∈ ℝ^{n×m} of a matrix A ∈ ℝ^{m×n} so that we can solve a linear equation Ax = y by left-multiplying each side to obtain x = By.

If m > n, it is possible that no such B exists; if m < n, there could be multiple B's.

By letting B = A†, the Moore-Penrose pseudoinverse, we can make headway in these cases:

When m = n and A⁻¹ exists, A† degenerates to A⁻¹.
When m > n, A† returns the x for which Ax is closest to y in terms of the Euclidean norm ‖Ax − y‖.
When m < n, A† returns the solution x = A†y with minimal Euclidean norm ‖x‖ among all possible solutions.
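Both the m > n (least-squares) and m < n (minimum-norm) behaviors can be sketched with `np.linalg.pinv` (the systems below are made-up examples):

```python
import numpy as np

# Overdetermined case (m > n): pinv gives the least-squares solution
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 1.0, 3.0])            # no exact solution exists
x = np.linalg.pinv(A) @ y
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x, x_ls))              # True: minimizes ||Ax - y||

# Underdetermined case (m < n): pinv gives the minimum-norm solution
B = np.array([[1.0, 1.0]])
z = np.linalg.pinv(B) @ np.array([2.0])
print(z)  # [1. 1.] -- the solution of z1 + z2 = 2 with smallest ||z||
```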


Moore-Penrose Pseudoinverse II

The Moore-Penrose pseudoinverse is defined as:

A† = lim_{α→0⁺} (AᵀA + αIₙ)⁻¹ Aᵀ

A†A = I (when the columns of A are linearly independent).

In practice, it is computed by A† = VD†Uᵀ, where UDVᵀ = A. D† ∈ ℝ^{n×m} is obtained by taking the reciprocals of the non-zero elements of D and then taking the transpose.
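The practical recipe A† = VD†Uᵀ can be sketched directly from the SVD and compared against NumPy's built-in `pinv` (a random full-rank matrix is assumed, so all singular values are non-zero):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))          # full rank almost surely

U, s, Vt = np.linalg.svd(A)
# D^+ is n x m: reciprocals of the non-zero singular values, transposed shape
D_pinv = np.zeros((3, 4))
D_pinv[:3, :3] = np.diag(1.0 / s)
A_pinv = Vt.T @ D_pinv @ U.T             # A^+ = V D^+ U^T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```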



Traces

tr(A) = Σᵢ Aᵢ,ᵢ
tr(A) = tr(Aᵀ)
tr(aA + bB) = a·tr(A) + b·tr(B)
‖A‖²_F = tr(AAᵀ) = tr(AᵀA)
tr(ABC) = tr(BCA) = tr(CAB)

The cyclic property holds even if the individual factors have different shapes, as long as each cyclic product is defined.
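The cyclic property and the Frobenius-norm identity can be checked numerically (NumPy; the random shapes are chosen so that each cyclic product is defined but has a different size):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Cyclic property: the products are 2x2, 3x3, and 4x4, yet traces agree
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # True

# Frobenius norm via the trace
print(np.isclose(np.linalg.norm(A, 'fro')**2, np.trace(A @ A.T)))  # True
```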


Determinant I

The determinant det(·) is a function that maps a square matrix A ∈ ℝ^{n×n} to a real value:

det(A) = Σᵢ (−1)^{i+1} A₁,ᵢ det(A₋₁,₋ᵢ),

where A₋₁,₋ᵢ is the (n−1)×(n−1) matrix obtained by deleting the first row and the i-th column.

det(Aᵀ) = det(A)
det(A⁻¹) = 1/det(A)
det(AB) = det(A)det(B)
det(A) = ∏ᵢ λᵢ

What does it mean? det(A) can also be regarded as the signed area of the image of the "unit square".
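Two of these identities checked with NumPy (the matrices are illustrative; B is a 90° rotation):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3

# det(A) equals the product of the eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A))))  # True
print(np.isclose(np.linalg.det(A), 3.0))  # True: 2*2 - 1*1 = 3

# det(AB) = det(A) det(B)
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True
```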


Determinant II

Let A = [a b; c d]. We have [1, 0]A = [a, b], [0, 1]A = [c, d], and

det(A) = ad − bc

Figure: The area of the parallelogram is the absolute value of the determinant of the matrix formed by the images of the standard basis vectors representing the parallelogram's sides.

Determinant III

The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.

If det(A) = 0, then space is contracted completely along at least one dimension; A is invertible iff det(A) ≠ 0.
If |det(A)| = 1, then the transformation is volume-preserving.
