
Elements of Mathematics: an embarrassingly simple (but practical) introduction to algebra

Jordi Villà i Freixa ([email protected]), Pau Rué

MAT: 2011-31035-T1, MSc Bioinformatics for Health Sciences

November 23, 2011

Contents

1 Introduction
2 Sets
3 Groups and fields
4 Matrices
   4.1 Basic operations
       4.1.1 Sum
       4.1.2 Transposition
       4.1.3 Product
       4.1.4 Determinant of a matrix
       4.1.5 Rank of a matrix
   4.2 Orthogonal/orthonormal matrices
5 Systems of linear equations
   5.1 Elementary matrices and inverse
6 Vector spaces
   6.1 Basis change
   6.2 The vector space L(V, W)
   6.3 Rangespace and nullspace (Kernel)
   6.4 Composition and inverse
   6.5 Linear transforms and matrices
   6.6 Composition and matrix product
7 Projection
   7.1 Orthogonal Projection Into a Line
   7.2 Gram-Schmidt Orthogonalization
   7.3 Projection Into a Subspace
8 Diagonalization
   8.1 Diagonalizability
   8.2 Eigenvalues and Eigenvectors
9 Singular value decomposition (SVD) and principal component analysis (PCA)
   9.1 Spectral decomposition of a square matrix
   9.2 Singular Value Decomposition
   9.3 Properties of a data matrix - first and second moments
   9.4 Principal component analysis (PCA)
   9.5 PCA by SVD
   9.6 PCA by SVD in Octave
   9.7 More samples than variables
   9.8 Number of Principal Directions
   9.9 Similar Methods for Dimensionality Reduction


Summary. Playing around with matrices and their properties. Some examples of solving systems of linear equations.

1 Introduction

This is a non-exhaustive review of matrices and their properties. The practical part can be performed with the help of Octave (http://www.octave.org). There are versions of the program for Cygwin and Linux.

Some on-line additional sources of information can be found at:

http://joshua.smcvt.edu/linalg.html

http://www.math.unl.edu/~tshores/linalgtext.html

http://archives.math.utk.edu/tutorials.html

http://www.cs.unr.edu/~bebis/MathMethods/

http://en.wikibooks.org/wiki/Linear_Algebra

http://nptel.iitm.ac.in/courses/Webcourse-contents/IIT-KANPUR/mathematics-2/book.html

For help on octave and linear algebra:

http://math.iu-bremen.de/oliver/teaching/iub/resources/octave/octave-intro/octave-intro.html

http://www2.math.uic.edu/~hanson/Octave/OctaveLinearAlgebra.html

Check also [1, 2, 3].

2 Sets

Any collection of objects, for example the points of a given segment, the collection of all integer numbers between 0 and 10, the students in a classroom, etc., is called a set. The objects inside the set (the points, the numbers and the students) are called elements of the set. In algebra it is common to represent sets using uppercase letters and elements using lowercase letters. The elements of a set are specified between curly brackets. For example A = {a, b, c, d} represents a set formed of 4 elements.

A set can be specified either in an extensive way, as in the case of A = {a, b, c, d}, or in an intensive way, where there is no need to specify all the elements belonging to it but only the properties they satisfy. As an example, the set, A, of all integer numbers between 0 and 10 can be specified as A = {x | x ∈ Z, 0 ≤ x ≤ 10}.

There is a huge amount of literature describing formally what a set is. We will just stick to the idea that a set is a collection of elements, none of them equal to another. The following is a list of basic properties and definitions concerning sets:

(Notation: x ∈ A is the mathematical way of representing that the element x belongs to the set A.)


• A set A is said to be included within a set B (or A is a subset of the set B, or B contains A) if and only if all the elements of A belong to B. In this case we will write A ⊂ B. So in a strictly mathematical way we would write

A ⊂ B if and only if ∀x ∈ A ⇒ x ∈ B

• Hence, two sets, A and B, are said to be equal, A = B, if and only if both conditions A ⊂ B and B ⊂ A are fulfilled.

Example
The usual numeric sets N = {1, 2, 3, · · ·} (the natural numbers), Z = {0, 1, −1, 2, −2, · · ·} (the integer numbers), Q (the rational numbers), R (the real numbers) and C (the complex numbers) are related in the following way:

N ⊂ Z ⊂ Q ⊂ R ⊂ C.

• There is exactly one set that contains no elements. It is called the empty set and it is denoted by ∅.

• In addition, the universe of a given problem is the reference set, U, that contains all the sets used in that particular problem.

• Given a universe U and a subset A, the complement of A in U, written Ā, is the set of all elements in U that do not belong to A. Formally,

Ā = {x ∈ U | x ∉ A}.

• Given a universe U and two sets A and B we can define the following operations:

1. The union of A and B is the set having all the elements from both sets A and B and no other element.

A ∪ B = {x ∈ U | x ∈ A or x ∈ B}.

Notice that this operation is commutative: A ∪ B = B ∪ A, and associative: A ∪ (B ∪ C) = (A ∪ B) ∪ C.

Example
Let A = {1, 2, 3, 4, 5} and B = {2, 3, 7, 8}, then

A ∪ B = {1, 2, 3, 4, 5, 7, 8}.

2. The intersection of A and B is the set having all the elements common to A and B and no other element.

A ∩ B = {x ∈ U | x ∈ A and x ∈ B}.

This operation is also commutative: A ∩ B = B ∩ A, and associative: A ∩ (B ∩ C) = (A ∩ B) ∩ C.

(Notation: ∀ is a mathematical symbol meaning "for all", as in ∀x ∈ A, "for all elements x in the set A".)


Example
Let A = {1, 2, a, b} and B = {2, 3, a, c}, then

A ∩ B = {2, a}.

3. The set difference of A and B is the set having all the elements in A that are not found in B and no other element.

A \ B = {x ∈ U | x ∈ A and x ∉ B}.

This operation is neither commutative nor associative. Notice also that we can write A \ B = A ∩ B̄.

4. Given two elements a and b, we call an ordered pair the collection of these two elements in a given order. We denote the ordered pair with a being the first coordinate and b the second coordinate as (a, b). Notice that with this definition order matters (i.e. (a, b) ≠ (b, a)). The cartesian product of A and B, A × B, is then defined as the set of all ordered pairs of elements where the element in the first coordinate belongs to A, and the element in the second coordinate belongs to B.

Example
Let A = {1, 2, 3} and B = {a, b}, then

A × B = {(1, a), (2, a), (3, a), (1, b), (2, b), (3, b)}

and B^2 = B × B = {(a, a), (a, b), (b, a), (b, b)}.

In the same manner, starting from the set of real numbers R, also known as the real line, we can generate the set known as the real plane

R^2 = R × R = {(x, y) | x, y ∈ R}

and the real space

R^3 = R × R × R = {(x, y, z) | x, y, z ∈ R}

• A binary operation on a set is a calculation involving two operands (elements of the set) whose result is an element of the set. Let ⋆ be an operation in a set A. We write:

⋆ : A × A → A
(a, b) ↦ c = a ⋆ b

which means that given two elements a, b ∈ A, the result of operating a and b is an element c = a ⋆ b which also belongs to A.

– The property that the result of operating two elements of the set A is also an element of the set A is called the closure property.

Example


∗ The normal sum (+) of natural numbers (N) and real numbers (R) is an operation that fulfills the closure property.

∗ The normal subtraction (−) of natural numbers is an operation that does not fulfill the closure property (2 − 4 = −2 ∉ N), while in the case of real numbers it is fulfilled.

∗ The product of rational numbers (Q) is an operation that fulfills the closure property.

– Another property an operation can have is associativity. An operation ⋆ in a set A is said to be associative if for all elements a, b and c in A it holds:

a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c.

Example
Usual sum and product in the natural, integer, rational and real numbers are associative operations. On the other hand, subtraction and division are not. Take as examples these cases: 3 − (2 − 1) ≠ (3 − 2) − 1 and 3/(5/2) ≠ (3/5)/2.

– Notice that as the definition of operation is based on the cartesian product (A × A), the order of the operands does matter in principle. An operation where the order of the operands does not matter is said to be commutative. Formally, an operation ⋆ is commutative if for all a, b ∈ A it holds a ⋆ b = b ⋆ a.

Example
Again, normal addition and multiplication in the natural, integer, rational and real numbers are commutative operations while division and subtraction are not.

Another operation that is non-commutative is the product of matrices. For instance, if

M = ( 1 2 ; 3 4 )  and  N = ( 3 4 ; 0 2 )

then M · N = ( 3 8 ; 9 20 ) while N · M = ( 15 22 ; 6 8 ), where ( a b ; c d ) denotes the 2 × 2 matrix with rows (a b) and (c d).

– We say that an operation ⋆ in A has an identity (also called neutral) element, e, if there exists an element e ∈ A such that for all elements a ∈ A it holds

a ⋆ e = e ⋆ a = a.

Example
The normal addition and multiplication in the integer, rational and real numbers have identity elements 0 and 1 respectively. Notice that 0 ∉ N.

– Given an operation ⋆ in A and a ∈ A, let e be the identity element of ⋆ in A. An element b ∈ A is said to be an inverse of a if a ⋆ b = b ⋆ a = e. It can be easily shown that if this exists it is unique (no other element can be the inverse of a). We will write the inverse of a as −a or a^{-1}.


Example
In the normal addition in Z, Q and R all numbers have an inverse. This is not the case for the natural numbers. For the multiplication in Q and R all numbers but 0 have an inverse.

– Given two operations ⋆ and ◦ in A, we say that ⋆ is distributive over ◦ if a ⋆ (b ◦ c) = (a ⋆ b) ◦ (a ⋆ c) and (b ◦ c) ⋆ a = (b ⋆ a) ◦ (c ⋆ a) for all a, b, c ∈ A.

Example
Normal multiplication in the natural, integer, rational and real numbers is distributive over the normal addition.


3 Groups and fields

In this section we will introduce the notions of group and field. Both concepts are fundamental in all fields of mathematics. A group is nothing other than a set of elements together with an operation that combines any two of its elements to form a third element, plus a few requirements on the operation behavior which naturally lead to the concept of subtraction. Many basic mathematical structures are groups (say Z, Q and R with the usual addition, for instance).

On the other hand, a field is a set with two operations, designated as addition and multiplication, with some properties that lead naturally to the operations of subtraction and division.

Definition 1. A group is a set, G, together with an operation ⋆

⋆ : G × G → G
(a, b) ↦ c = a ⋆ b

which satisfies the following axioms:

1. a ⋆ b ∈ G ∀a, b ∈ G (closure)

2. a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c ∈ G ∀a, b, c ∈ G (associativity)

3. ∃e ∈ G such that a ⋆ e = e ⋆ a = a ∀a ∈ G (identity element)

4. ∀a ∈ G ∃b ∈ G such that a ⋆ b = e and b ⋆ a = e (inverse element).

We denote the group as (G, ⋆).

As a remark, the associativity property is the one that allows us to get rid of the parentheses when summing or multiplying several numbers. That is, we usually write a · b · c · d instead of a · (b · (c · d)) or (a · b) · (c · d) or ((a · b) · c) · d, even though the multiplication is defined as a binary operation. It is correct to write it without parentheses because the multiplication is associative.

Example

• (Z,+) is a group.

• (N,+) is not a group, as there is no identity element for the sum.

• (Z,−) is not a group, as the associativity property is not fulfilled.

• (Z, ·) is not a group, as there are no inverse elements for the elements 2, 3, 4, etc. (i.e. there is no integer x such that 2 · x = 1).

• (Q,+) and (R,+) are also groups.

• (Q, ·) and (R, ·) are not groups. The only property that is violated is the inverse element. The element 0 does not have an inverse (i.e. there is no number x such that x · 0 = 1). If we remove 0 from these sets it is easy to see that (Q \ {0}, ·) and (R \ {0}, ·) are groups.


• The set of polynomials of degree at most n with coefficients in Z, {a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 | a_n, · · · , a_0 ∈ Z}, is a group with the addition of polynomials. The identity element is the constant polynomial 0. Given a polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0, its inverse is q(x) = −a_n x^n + (−a_{n−1}) x^{n−1} + · · · + (−a_1) x + (−a_0).

Notice that the commutativity property is not required in the definition of a group. There are groups that are not commutative.

Definition 2. A field is a set, F , with two operations, + and ·, such that

1. a+ b ∈ F ∀a, b ∈ F (closure for +).

2. a+ (b+ c) = (a+ b) + c (associativity for +).

3. ∃e_+ ∈ F such that a + e_+ = e_+ + a = a ∀a ∈ F (neutral element for +). We will denote this element as 0.

4. ∀a ∈ F ∃b such that a + b = b + a = 0 (inverse element for +). We will denote this element as −a.

5. a+ b = b+ a ∀a, b ∈ F (commutativity for +).

6. a · b ∈ F ∀a, b ∈ F (closure for ·).

7. a · (b · c) = (a · b) · c (associativity for ·).

8. ∃e_· ∈ F such that a · e_· = e_· · a = a ∀a ∈ F (neutral element for ·). We will denote this element as 1.

9. ∀a ∈ F \ {0} ∃b such that a · b = b · a = 1 (inverse element for · for all elements but 0). We will denote this element as a^{-1}.

10. a · b = b · a ∀a, b ∈ F (commutativity for ·).

11. a · (b+ c) = a · b+ a · c ∀a, b, c ∈ F (distributive property of · with respect to +).

12. 1 ≠ 0 (nontriviality; the neutral element for + and the neutral element for · must be different).

We will write (F,+, ·).

From this definition one can easily see that (F, +) and (F \ {0}, ·) are commutative groups. Recall that Q and R satisfy this. In fact, (Q, +, ·) and (R, +, ·) are fields, but not (N, +, ·) and (Z, +, ·).

The complex numbers C = {a + ib | a, b ∈ R and i^2 = −1} with the complex addition and multiplication:

• (a+ ib) + (c+ id) = (a+ c) + i(b+ d)

• (a+ ib) · (c+ id) = (ac− bd) + i(bc+ ad)


are a field.

Not all fields have infinitely many elements. This might seem counterintuitive if one takes into account that the addition and multiplication operations are closed. The idea behind finite fields is that these operations are adapted in such a way that the closure property is fulfilled together with all the other required properties. For instance, let's consider the set Z_2 = {0, 1}, and let's define the operations

+ | 0 1          · | 0 1
0 | 0 1          0 | 0 0
1 | 1 0          1 | 0 1

Then (Z_2, +, ·) is a field.
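The two tables are just addition and multiplication modulo 2; a minimal Octave sketch that reproduces them (the particular variable names are only illustrative):

a = 0:1;
[A, B] = meshgrid(a, a);
mod(A + B, 2)     # the + table of Z_2 above
mod(A .* B, 2)    # the · table of Z_2 above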


4 Matrices

Definition 3. Let F be a field (e.g. Q, R or C). A matrix A with coefficients in F of order n × m (n, m ∈ N) is a collection of n × m ordered elements of F.

A = ( a_11 a_12 · · · a_1m ; a_21 a_22 · · · a_2m ; . . . ; a_n1 a_n2 · · · a_nm ) = (a_ij), 1 ≤ i ≤ n, 1 ≤ j ≤ m

The first index refers to the row number and the second to the column number. A 1 × n matrix is called a row vector whereas an m × 1 matrix is called a column vector. In the case of n = m the matrix is said to be square of order n.

Example

• (3 4 2) is a row vector while ( 3 ; 5 ; 1 ) is a column vector.

• The matrix ( 1.3 4 ; 6/5 5 ; 7 1/7 ) is a 3 × 2 matrix in Q and the matrix ( π^2 e^2 7 ; 0 3π/5 1/e ) is a 2 × 3 matrix with coefficients in R.

We refer to the set of all n × m matrices with coefficients in F by M_{n×m}(F).

Definition 4.

• The main diagonal of a matrix A = (a_ij) is the set of coefficients a_ij such that i = j.

• A zero matrix, 0, is a matrix all of whose elements are equal to zero.

• The unit matrix or identity matrix of order n, written I_n, is the square matrix of order n in which all the coefficients in the main diagonal are equal to one and all other elements are 0.

I_n = ( 1 0 · · · 0 ; 0 1 · · · 0 ; . . . ; 0 0 · · · 1 )

Definition 5. For any square matrix, the trace is evaluated by:

tr(A) = a_11 + a_22 + · · · + a_nn

with properties:

tr(A^T) = tr(A)


tr(A ± B) = tr(A) ± tr(B)

tr(AB) = tr(BA)

tr(AB) ≠ tr(A) tr(B) (in general)
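A quick Octave check of these properties on two arbitrary 2 × 2 matrices (the particular values are only an illustration):

A = [1 2; 3 4];  B = [0 1; 5 2];
trace(A * B)            # 21
trace(B * A)            # 21, the same value
trace(A) * trace(B)     # 10, different in general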

4.1 Basic operations

4.1.1 Sum

Definition 6. Let A, B ∈ M_{n×m}(F). The sum of matrices A and B is defined as the matrix C ∈ M_{n×m}(F) such that

c_ij = a_ij + b_ij  ∀ 1 ≤ i ≤ n, 1 ≤ j ≤ m.

We write then C = A + B.

The sum of matrices is a commutative operation, hence for any two matrices A and B we have A + B = B + A.

Example

( 2 1 5 ; 4 2 1 ; 0 3 2 ) + ( 2 5 0 ; 1 1 3 ; 2 0 0 ) = ( 4 6 5 ; 5 3 4 ; 2 3 2 )

4.1.2 Transposition

Definition 7. Let A = (a_ij) ∈ M_{n×m}(F). The transpose of A, A^T ∈ M_{m×n}(F), is the matrix of order m × n

A^T = (b_ij), 1 ≤ i ≤ m, 1 ≤ j ≤ n, where b_ij = a_ji

Example

( 4 2 1 ; 0 3 2 )^T = ( 4 0 ; 2 3 ; 1 2 )

Definition 8. Let A ∈ M_n(F) be a square matrix. A is said to be symmetric if A = A^T and antisymmetric if A = −A^T. Antisymmetric matrices have zero diagonal (a_ii = −a_ii ⇒ a_ii = 0 ∀i).


4.1.3 Product

Definition 9. Let A = (a_ij) ∈ M_{n×p}(F) and B = (b_ij) ∈ M_{p×m}(F). The product of A and B is the matrix C = A · B ∈ M_{n×m}(F), where

c_ij = Σ_{k=1}^{p} a_ik b_kj.

Notice that the number of columns of the first (left) factor must be the same as the number of rows of the second (right) factor. The resulting matrix has the same number of rows as the left factor and the same number of columns as the right factor. Notice also that, even when square matrices can be multiplied in either order (swapping the matrices' order), the product is not commutative.

Example

Let A = ( 4 2 1 ; 0 3 2 ) and B = ( 7 3 ; 0 5 ). Then

• A · B is not defined, as the number of columns of A and the number of rows of B is not the same.

• On the other hand, B · A = ( 28 23 13 ; 0 15 10 ).

• B · B = ( 49 36 ; 0 25 ) and A^T · B = ( 28 12 ; 14 21 ; 7 13 ).

• A^T · A = ( 16 8 4 ; 8 13 8 ; 4 8 5 ) and A · A^T = ( 21 8 ; 8 13 ).

octave:1> A=[1,2;4,0;3,-2;5,1]

A =

1 2

4 0

3 -2

5 1

octave:2> B=[1,2,0;5,-1,3]

B =

1 2 0

5 -1 3

octave:3> C=A*B

C =


11 0 6

4 8 0

-7 8 -6

10 9 3

Definition 10. Let A ∈ M_n(F) be a square matrix. A square matrix B is the inverse of A if A · B = I_n and B · A = I_n. A matrix A is called invertible if it has an inverse.

If the inverse of a matrix exists, it is unique. Hence, the inverse of a matrix A can be written as A^{-1}.

Example

( 1 2 ; 3 4 )^{-1} = ( −2 1 ; 3/2 −1/2 )

If A is not square, then (provided A^T A is invertible, i.e. A has full column rank) we define the pseudo-inverse A^+ as:

A^+ = (A^T A)^{-1} A^T

and it can easily be shown that A^+ A = I.
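A short Octave sketch of this formula on the tall matrix from the session above; for a full-column-rank matrix, Octave's built-in pinv gives the same result:

A = [1 2; 4 0; 3 -2; 5 1];      # 4x2, full column rank
Aplus = inv(A' * A) * A';       # pseudo-inverse by the formula above
Aplus * A                       # the 2x2 identity (up to rounding)
norm(Aplus - pinv(A))           # ~ 0: agrees with Octave's pinv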

4.1.4 Determinant of a matrix

Definition 11. The determinant is an operation from the set of all square matrices of order n with coefficients in a field F to the field F. That is, for any matrix A, det(A) is an element of F. The determinant can be defined in many ways. The definition which leads to the simplest way of computing determinants is the Laplace expansion (cofactor expansion). We write det(A) = |A|.

Invertible matrices are precisely those matrices with a nonzero determinant.

In the case of square matrices of order two, the determinant can be computed in the following way:

det( a b ; c d ) = | a b ; c d | = ad − bc

And in the case of order 3 matrices:

det( a_11 a_12 a_13 ; a_21 a_22 a_23 ; a_31 a_32 a_33 ) = a_11 a_22 a_33 + a_12 a_23 a_31 + a_13 a_21 a_32 − a_13 a_22 a_31 − a_12 a_21 a_33 − a_11 a_23 a_32.

Let A ∈ M_n(F). The Laplace expansion of det(A) is a way to express it as a sum of n determinants of (n−1) × (n−1) sub-matrices of A. Let's define the i, j-minor of A, A_ij ∈ M_{n−1}(F), as the matrix


obtained by removing the i-th row and j-th column of A. Fix any index j; then the determinant of A can be defined recursively as

det(A) = Σ_{i=1}^{n} (−1)^{i+j} a_ij det(A_ij).

The same result is also true by fixing a row and summing over the columns. Notice that there are 2n different possible expansions; the result does not depend on the chosen column/row.

The following is a list of the most important properties of the determinants:

• If a matrix has two columns or two rows equal (or proportional) then the determinant is zero.

• If A is a matrix such that det(A) = 0 then there is a nontrivial linear combination of rows (columns) of A equal to zero.

• If one column/row of a matrix is a linear combination of other columns/rows then its determinant is zero.

• det(A^T) = det(A).

• det(A · B) = det(A) · det(B)

Proposition 1. Let A ∈ M_n. A is invertible if and only if det(A) ≠ 0.

There are different ways of computing the inverse of a matrix.

4.1.5 Rank of a matrix

Definition 12. Let A ∈ M_{n×m}. A minor of order r of A is a square submatrix of A of order r. That is, a submatrix of A obtained by removing n − r rows and m − r columns.

Definition 13. The rank of a matrix A ∈ M_{n×m}(F) is r if any minor of order > r has a determinant of zero and there exists a minor of order r with a non-zero determinant.

Example

Let A = ( 3 1 5 ; 2 1 4 ; 5 0 5 ). Then, as det(A) = 0 but det( 3 1 ; 2 1 ) = 1 ≠ 0, the rank of A is rk(A) = 2.

octave:4> D=[1,2,2,3;4,0,8,-1;3,-2,6,0;5,1,10,-8]

D =

1 2 2 3

4 0 8 -1

3 -2 6 0

5 1 10 -8


octave:5> det(D)

ans = 0

octave:9> E=[1,2,3;4,0,-1;3,-2,0]

E =

1 2 3

4 0 -1

3 -2 0

octave:10> det(E)

ans = -32

In the above example, matrix D has rank 3. It is also the maximum number of linearly independent columns or rows of D.

There are several properties of matrices based on the rank:

• rank(A_{m×n}) ≤ min(m, n).

• rank(A_{n×n}) = n if and only if A is nonsingular (invertible).

• rank(A_{n×n}) = n if and only if det(A) ≠ 0.

• rank(A_{n×n}) < n if and only if A is singular.

(These facts can be checked on the matrix D above with the Octave sketch below.)
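A minimal Octave check, reusing the matrix D from the session above (the column selection D(1:3,[1 2 4]) is exactly the submatrix E):

D = [1,2,2,3; 4,0,8,-1; 3,-2,6,0; 5,1,10,-8];
rank(D)                  # 3: det(D) = 0, so D is singular and rank(D) < 4
rank(D(1:3, [1 2 4]))    # 3: this 3x3 submatrix is E, with det(E) = -32 != 0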

4.2 Orthogonal/orthonormal matrices

Consider the matrix A:

A = ( a_{1,1} a_{1,2} . . . a_{1,m} ; a_{2,1} a_{2,2} . . . a_{2,m} ; . . . ; a_{n,1} a_{n,2} . . . a_{n,m} )

Let's take the vectors formed by the rows (or columns) of matrix A:

u_1^T = (a_11, a_12, . . . , a_1m)
u_2^T = (a_21, a_22, . . . , a_2m)
...
u_n^T = (a_n1, a_n2, . . . , a_nm)

Let us consider the properties:

1. u_k · u_k = 1, or ‖u_k‖ = 1, for every k

2. u_j · u_k = 0, for every j ≠ k


A is orthonormal if both conditions are satisfied. A is orthogonal if only condition 2 is satisfied.

If A is orthonormal (and square), then:

A A^T = A^T A = I

or, what is the same,

A^{-1} = A^T

and also ‖Av‖ = ‖v‖.
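For instance, a rotation matrix is orthonormal; a quick Octave check (the angle pi/6 and the vector are arbitrary choices):

theta = pi / 6;
A = [cos(theta), -sin(theta); sin(theta), cos(theta)];
A' * A                    # the 2x2 identity (up to rounding): A^{-1} = A^T
v = [3; 4];
[norm(A * v), norm(v)]    # both 5: orthonormal matrices preserve length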


5 Systems of linear equations

Example
Let's consider the reaction of glucose (C6H12O6) oxidation, in which carbon dioxide and water are obtained. Suppose we don't know the stoichiometric coefficients of the reaction, which we will designate by the unknowns x, y, z and t as shown in:

xC6H12O6 + yO2 −→ zCO2 + tH2O

The number of atoms of each element must be the same on each side of the reaction, hence we can establish the following relations:

6x = z

12x = 2t

6x+ 2y = 2z + t

We will see that this system is compatible indeterminate, meaning that it accepts infinitely many solutions. Setting x = 1 we get only one solution, which is (x, y, z, t) = (1, 6, 6, 6).
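A small Octave sketch of this computation: fixing x = 1 turns the three balance equations into a 3 × 3 system for (y, z, t), which the backslash operator solves directly (the matrix below simply encodes those equations with x = 1):

# With x = 1:  z = 6,  2t = 12,  2y - 2z - t = -6
A = [0 1 0; 0 0 2; 2 -2 -1];   # coefficients of (y, z, t)
b = [6; 12; -6];
A \ b                           # (6; 6; 6): y = z = t = 6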

Definition 14. A system of linear equations is a collection of linear equations

a_11 x_1 + a_12 x_2 + · · · + a_1m x_m = b_1
a_21 x_1 + a_22 x_2 + · · · + a_2m x_m = b_2
...
a_n1 x_1 + a_n2 x_2 + · · · + a_nm x_m = b_n

where the numbers a_ij ∈ R are the coefficients and the b_i are the independent or constant terms. This system can be represented in the matrix form

Ax = b,

where A = (a_ij), 1 ≤ i ≤ n, 1 ≤ j ≤ m, is the matrix of the system, x = (x_1, · · · , x_m)^T is the variable vector and b = (b_1, · · · , b_n)^T is the vector of independent terms. A column vector s = (s_1, · · · , s_m)^T ∈ R^m is a solution of the system if substituting x by s gives a true statement

As = b.

That is, s = (s_1, · · · , s_m)^T is a solution of all the equations in the system.

• Not all systems have a unique solution (e.g. 2x + 4y = 0 accepts infinitely many solutions).

• There are systems with no solutions (e.g. −2x + 4y = 1, x − 2y = 3 has no solutions).

Therefore, we need a criterion to decide whether a given system of linear equations has a solution or not. One of the most used criteria is the Rouché-Frobenius criterion, which is based on the rank of the system's matrix A and the augmented matrix A|b (which is A with the column vector b appended); a quick way of computing these ranks in Octave is sketched right after the list. It says:


• If rk(A) = rk(A|b), the system is said to be compatible and it accepts solutions.

– If rk(A) = m (the number of variables), the system is said to be determinate and it has one unique solution.

– Otherwise, the system is said to be indeterminate and it has an infinite number of solutions.

• If rk(A) ≠ rk(A|b), the system is said to be incompatible and there are no solutions to it.
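A minimal Octave sketch of the criterion, applied to the glucose-oxidation system above written in the unknowns (x, y, z, t):

A = [6 0 -1 0; 12 0 0 -2; 6 2 -2 -1];   # 6x = z, 12x = 2t, 6x + 2y = 2z + t
b = [0; 0; 0];
rank(A)        # 3
rank([A b])    # 3: equal ranks and 3 < 4 unknowns, so compatible indeterminate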

Once we know whether a system has solutions or not, we have to solve it (in the former case). Although there are many methods for solving systems of linear equations using computers, the standard method of resolution by hand is Gaussian elimination. This is based on replacing equations in the system by linear combinations of other equations in such a way that the obtained system is equivalent to the original one, in the sense that they share the same solutions, but the new system is upper triangular (and hence can be trivially solved). This method is based on the following result:

Theorem 1. If a system of linear equations is changed to another by one of these transformations:

1. an equation is swapped with another equation

2. an equation has both sides multiplied by a nonzero constant

3. an equation is replaced by the sum of itself and a multiple of another

then the two systems have the same set of solutions

Example
Given the system

( −3 2 −6 ; 5 7 −5 ; 1 4 −2 ) · ( x ; y ; z ) = ( 6 ; 6 ; 8 )

We know that the system is compatible determinate and hence it only has one unique solution.

A|b = ( −3 2 −6 | 6 ; 5 7 −5 | 6 ; 1 4 −2 | 8 )
  → (ρ1 + 3ρ3)     ( 0 14 −12 | 30 ; 5 7 −5 | 6 ; 1 4 −2 | 8 )
  → ((1/2)ρ1)      ( 0 7 −6 | 15 ; 5 7 −5 | 6 ; 1 4 −2 | 8 )
  → (ρ2 − 5ρ3)     ( 0 7 −6 | 15 ; 0 −13 5 | −34 ; 1 4 −2 | 8 )
  → (ρ2 + 2ρ1)     ( 0 7 −6 | 15 ; 0 1 −7 | −4 ; 1 4 −2 | 8 )
  → (ρ1 − 7ρ2)     ( 0 0 43 | 43 ; 0 1 −7 | −4 ; 1 4 −2 | 8 )
  → ((1/43)ρ1)     ( 0 0 1 | 1 ; 0 1 −7 | −4 ; 1 4 −2 | 8 )
  → (reorder rows) ( 1 4 −2 | 8 ; 0 1 −7 | −4 ; 0 0 1 | 1 )

Hence z = 1, y − 7 = −4 ⇒ y = 3 and x + 12 − 2 = 8 ⇒ x = −2.
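The same solution in Octave, using the backslash operator:

A = [-3 2 -6; 5 7 -5; 1 4 -2];
b = [6; 6; 8];
A \ b          # (-2; 3; 1), the (x, y, z) found by elimination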

Summarizing:


If A is invertible, Ax = b has exactly one solution:

x = A^{-1} b

The following statements are equivalent:

1. A is invertible

2. Ax = 0 has only the trivial solution

3. det(A) ≠ 0

4. b is in the column space of A:

( a_11 ; a_21 ; . . . ; a_m1 ) x_1 + ( a_12 ; a_22 ; . . . ; a_m2 ) x_2 + · · · + ( a_1n ; a_2n ; . . . ; a_mn ) x_n = ( b_1 ; b_2 ; . . . ; b_m )

5. rank(A|b) = rank(A) and rank(A) = n

6. The column/row vectors of A are linearly independent

7. The column/row vectors of A span R^n

The system has no solution if rank(A|b) > rank(A). The system has infinitely many solutions if rank(A|b) = rank(A) < n.

5.1 Elementary matrices and inverse

The same Gauss-Jordan method can be applied to obtain the inverse matrix. Let us first define the above transformation steps in a more precise way. Indeed, there are just three types of transformations, and each can be associated with the product by a so-called elementary matrix:

1. Switching two rows in the matrix. For example, switching rows 2 and 3 in a given 3 × m matrix A is equivalent to doing

( 1 0 0 ; 0 0 1 ; 0 1 0 ) A = E_23 A

2. Multiplying a row by a given value. For example,

( c 0 0 ; 0 1 0 ; 0 0 1 ) A = E_1(c) A

3. Adding to one row a multiple of another. This is:

( 1 0 0 ; 0 1 c ; 0 0 1 ) A = E_23(c) A


It is easy to see that we can build an inverse matrix making use of elementary transformations. Let A be an invertible n × n matrix. Suppose that a sequence of elementary row operations reduces A to the identity matrix. Then the same sequence of elementary row operations, when applied to the identity matrix, yields A^{-1}. To see how this is the case, let E_1, E_2, . . . , E_k be a sequence of elementary row operations such that E_1 E_2 · · · E_k A = I_n. Then E_1 E_2 · · · E_k I_n = A^{-1}, which, in turn, implies A^{-1} = E_1 E_2 · · · E_k.
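A short Octave sketch of the three elementary matrices listed above (the 3 × 3 sizes match the list; the anonymous functions just leave the constant c as a parameter):

E23  = [1 0 0; 0 0 1; 0 1 0];        # swap rows 2 and 3
E1   = @(c) [c 0 0; 0 1 0; 0 0 1];   # multiply row 1 by c
E23c = @(c) [1 0 0; 0 1 c; 0 0 1];   # add c times row 3 to row 2
A = [-3 2 -6; 5 7 -5; 1 4 -2];
E23 * A        # A with rows 2 and 3 swapped
E1(2) * A      # row 1 multiplied by 2
E23c(2) * A    # row 2 replaced by row 2 + 2 * row 3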


6 Vector spaces

Vector spaces are the mathematical structures most often found in Bioinformatics. The real numbers R, the real plane R^2 and the real space R^3 are the most common vector spaces. The idea behind a vector space is that its elements, the vectors, can be added to each other and also scaled by real numbers.

Definition 15. A vector space over R consists of a set V along with two operations + and · suchthat:

1. If ~v, ~w ∈ V then their vector sum ~v + ~w ∈ V and

• ~v + ~w = ~w + ~v (commutative)

• ~v + (~w + ~u) = (~v + ~w) + ~u for ~u ∈ V (associative)

• there is a zero vector ~0 ∈ V such that ~0 + ~v = ~v

• ∀~v ∈ V ∃~w such that ~w + ~v = ~0 (additive inverse)

2. If r, s ∈ R (scalars) and ~v, ~w ∈ V , then r~v ∈ V and

• (r + s)~v = r~v + s~v

• r(~v + ~w) = r~v + r ~w

• (rs)~v = r(s~v)

• 1~v = ~v

Observe that we are using two kinds of additions, the real-number addition and the vector addition in V:

(r + s)~v  (real-number addition on the left)  =  r~v + s~v  (vector addition on the right)

Example

• The set R^2 is a vector space if the operations + and · have their usual meaning:

( x_1 ; y_1 ) + ( x_2 ; y_2 ) = ( x_1 + x_2 ; y_1 + y_2 )      r · ( x_1 ; y_1 ) = ( r · x_1 ; r · y_1 ).

The zero vector of this vector space is ~0 = ( 0 ; 0 ). In fact R^n is a vector space, for any n > 0.

• P = {( x ; y ; z ) ∈ R^3 | x + y + z = 0} is a vector space: if v = ( x ; y ; z ) ∈ P, then for any r ∈ R, r · v = ( rx ; ry ; rz ) and rx + ry + rz = r · (x + y + z) = 0, hence r · v ∈ P (closure under the sum is checked in the same way).


• The set with only one element, the zero vector, is a vector space called the trivial vector space: {~0}.

• The set of polynomials of degree 3 or less with real coefficients, P_3(R) = {a_0 + a_1 x + a_2 x^2 + a_3 x^3 | a_0, a_1, a_2, a_3 ∈ R}, is a vector space with the usual polynomial sum and product by a constant. In fact, P_n(R) is a vector space for any n > 0.

• The set of solutions of a homogeneous system of linear equations S = {v ∈ R^m | Av = 0}, A ∈ M_{n×m}(R), is also a vector space:

v, w ∈ S ⇒ A(v + w) = Av + Aw = 0

v ∈ S, r ∈ R ⇒ A(rv) = rAv = 0

Definition 16. For any vector space V , any subset that is itself a vector space is a subspace of V

The linear combination of n vectors in the vector space E over K, with n ∈ N and coefficients α_i ∈ K (i = 1, . . . , n), is defined as

α_1 v_1 + · · · + α_n v_n = Σ_{i=1}^{n} α_i v_i

and we will say that v ∈ E is a linear combination of v_1, . . . , v_n ∈ E if there exists a set of coefficients α_i ∈ K (i = 1, . . . , n) such that

v = α_1 v_1 + · · · + α_n v_n

Given n vectors, the subspace that is formed by all their possible linear combinations is called the subspace "generated" or "spanned" by them, 〈v_1, . . . , v_n〉. The set of vectors {v_1, . . . , v_n} is called a "spanning set" of 〈v_1, . . . , v_n〉.

Let's imagine that we want to obtain the zero vector as a linear combination of vectors of the set {v_1, . . . , v_n}. If this can only be done by the so-called "trivial solution", that is, with all α_i equal to zero, then we will say that {v_1, . . . , v_n} is a set of "linearly independent" vectors, or a "free set". If there exists some way to obtain 0 without all coefficients being 0, then we will say that {v_1, . . . , v_n} is a set of linearly dependent vectors.

A "basis" is a set of vectors that spans the subspace and at the same time is linearly independent.

That is, B = {v_1, . . . , v_n} is a basis of the subspace V if:

• each vector of V is a linear combination of v1, . . . , vn, and

• the vectors v1, . . . , vn are linearly independent.

If so, for each v ∈ V there will exist an ordered list of scalars such that v = α_1 v_1 + · · · + α_n v_n. Thus, once we know the vectors of the basis we know the whole subspace.


Example
Show that in R^4 the set of vectors whose components satisfy

x_1 + x_2 + x_3 + x_4 = 0

form a vector subspace with dimension 3. Find a basis.

We can answer both questions at once. We only need to solve the system of equations describing the vector subspace. Thus, the solutions to:

x1 + x2 + x3 + x4 = 0

have the form:

x_1 = −x_2 − x_3 − x_4 = −a − b − c
x_2 = a
x_3 = b
x_4 = c

or, equivalently:

( x_1 ; x_2 ; x_3 ; x_4 ) = ( −a − b − c ; a ; b ; c ) = a ( −1 ; 1 ; 0 ; 0 ) + b ( −1 ; 0 ; 1 ; 0 ) + c ( −1 ; 0 ; 0 ; 1 )

This is to say, the vector subspace is spanned by these three vectors.
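A quick Octave cross-check: null() returns an orthonormal basis of the solution subspace, so the particular vectors differ from the three above, but the dimension is the same:

A = [1 1 1 1];     # the equation x1 + x2 + x3 + x4 = 0
N = null(A);       # columns: an orthonormal basis of the subspace
size(N, 2)         # 3, the dimension found above
A * N              # a row of (numerical) zeros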

Exercise 1. Find out if, in the vector space P_2[x] of the polynomials of degree less than or equal to 2 over R, the following vectors form a basis:

u_1 = 1 + 2x
u_2 = −1 − 2x^2
u_3 = −2x + 2x^2

Exercise 2. Let P_3[x] be the vector space of the polynomials of degree 3 or less with real coefficients and real variable over the field R. Let G = {(x^2 + x + 2), (x^3 + 3x)} be a set of vectors belonging to P_3[x]. Find a basis of P_3[x] by completing the set G.

Lemma 1. For any nonempty subset W of a vector space V, under the inherited operations the following statements are equivalent:

1. W is a subspace of V .

2. W is closed under linear combinations of pairs of vectors: ∀v_1, v_2 ∈ W and r_1, r_2 ∈ R, r_1 v_1 + r_2 v_2 ∈ W.

3. W is closed under linear combinations of any number of vectors: ∀v_1, · · · , v_n ∈ W and r_1, · · · , r_n ∈ R, r_1 v_1 + · · · + r_n v_n ∈ W.


This last result tells us that to assess if a subset of a known vector space is also a vector space (a subspace), we don't have to check everything, just that it is closed under linear combinations.

Definition 17. The span (or linear closure) of a nonempty subset, W, of a vector space V is the set of all linear combinations of vectors from W:

[W] = {c_1 w_1 + · · · + c_n w_n | w_1, · · · , w_n ∈ W, c_1, · · · , c_n ∈ R}.

Lemma 2. In a vector space, the span of any subset is a subspace (i.e. the span is closed under linear combinations). The converse also holds: any subspace is the span of some set.

Example

• The span of one vector v ∈ V , is: [{v}] = {r · v|r ∈ R}

• Any two linearly independent vectors span R^2. For instance,

[{ ( 1 ; 1 ), ( −1 ; 1 ) }] = R^2

Any vector ( x ; y ) can be written as ((x + y)/2) ( 1 ; 1 ) + ((y − x)/2) ( −1 ; 1 ).

Exercise 3. Check that the set of vectors F = {(x, y, z) ∈ R^3 | x + y + z = 9} is not a vector subspace of R^3.

Definition 18. If in a vector space there exists a basis formed by n elements and m > n, then we can be sure that any set of m vectors is linearly dependent. In any finite-dimensional vector space, all of the bases have the same number of elements. The "dimension" of a vector space is the number of vectors in any of its bases. As a consequence, for the above mentioned vector space:

1. n linearly independent vectors form a basis

2. n spanning vectors form a basis

3. If V is a subspace of E, then V has a basis (if V ≠ {0}), dim V ≤ n, and the equality holds if and only if V = E.

4. If r < n and v_1, . . . , v_r are linearly independent vectors, then there exist n − r vectors v_{r+1}, . . . , v_n such that {v_1, . . . , v_r, v_{r+1}, . . . , v_n} is a basis of E.

Example
We consider, in the vector space R^3 over R, two subspaces E_1 = 〈(1, 1, 1), (1, −1, 1)〉 and E_2 = 〈(1, 2, 0), (3, 1, 3)〉.

• Find the set of vectors that belong to E_1 ∩ E_2.

• Check if it is a subspace of R^3.


• What is the dimension of the subspace E_1 ∩ E_2?

The solution is immediate if we consider a geometrical view. In R^3, a vector subspace with dimension 2 is a plane, and two planes can intersect in a line or can be coincident. In both cases, then, we would have vector subspaces. In this way, both planes can be found in an easy way, yielding E_1 = {(x, y, z) ∈ R^3 | x − z = 0} and E_2 = {(x, y, z) ∈ R^3 | 2x − y − (5/3)z = 0}. Joining these two expressions we will see that they are linearly independent, and in this way we would have three unknown variables for two equations: one degree of freedom, and thus we are describing a line in R^3.

Example
Let us consider the subspace in R^4 defined as:

F = {(x_1, x_2, x_3, x_4) ∈ R^4 | x_3 = 2x_1 + 3x_2, x_4 = 2x_2 − 3x_1}    (1)

Find a basis for the subspace and complete it until obtaining a basis for R^4.

The equations defining the subspace can be written also as:

x1 = x1

x2 = x2

x3 = 2x1 + 3x2

x4 = 2x2 − 3x1

or, equivalently:

( x_1 ; x_2 ; x_3 ; x_4 ) = a ( 1 ; 0 ; 2 ; −3 ) + b ( 0 ; 1 ; 3 ; 2 )

Thus, these two vectors form a basis of the subspace, with dimension 2. To complete the set of vectors until having a basis of R^4, we only need to choose two vectors that, along with the vectors we already have, form a linearly independent set. We could try, for example, the vectors (1, 0, 0, 0) and (0, 1, 0, 0). Putting the four vectors as columns and row-reducing:

( 1 0 1 0 ; 0 1 0 1 ; 2 3 0 0 ; −3 2 0 0 ) → ( 1 0 1 0 ; 0 1 0 1 ; 0 3 −2 0 ; 0 2 3 0 ) → ( 1 0 1 0 ; 0 1 0 1 ; 0 0 −2 −3 ; 0 0 3 −2 ) → ( 1 0 1 0 ; 0 1 0 1 ; 0 0 −2 −3 ; 0 0 0 −13 )

Thus, the 4 chosen vectors are L.I. and therefore form a basis in R4.


6.1 Basis change

The representation of a vector as a column of components depends, obviously, on the basis. For each basis the representation will be different. How do we relate these representations?

Given two different bases of a vector space, B = {v_1, . . . , v_n} and B' = {w_1, . . . , w_n}, and knowing the representation of a vector v in the first of the bases, Rep_B(v) = X = (x_i), 1 ≤ i ≤ n, ∈ K^n, this is:

v = x1v1 + x2v2 + · · ·+ xnvn (2)

To obtain the representation of the vector v in the basis B', it is enough to know the representation of the vectors v_i in B':

v_1 = a_11 w_1 + a_21 w_2 + · · · + a_n1 w_n
v_2 = a_12 w_1 + a_22 w_2 + · · · + a_n2 w_n
...
v_n = a_1n w_1 + a_2n w_2 + · · · + a_nn w_n

By replacing this representation in Eq. 2 we get:

v = x_1 (a_11 w_1 + a_21 w_2 + · · · + a_n1 w_n) + x_2 (a_12 w_1 + a_22 w_2 + · · · + a_n2 w_n) + · · · + x_n (a_1n w_1 + a_2n w_2 + · · · + a_nn w_n)

rearranging:

v = (x_1 a_11 + x_2 a_12 + · · · + x_n a_1n) w_1
  + (x_1 a_21 + x_2 a_22 + · · · + x_n a_2n) w_2
  ...
  + (x_1 a_n1 + x_2 a_n2 + · · · + x_n a_nn) w_n

Writing this in matrix form, and calling P the matrix that represents the vectors of the basis B in the basis B', the representation Y = Rep_{B'}(v), that is, v = y_1 w_1 + y_2 w_2 + · · · + y_n w_n, is given by:

Y = P X    (3)

or, what is the same, Rep_{B'}(v) = P Rep_B(v). The matrix P = (Rep_{B'}(v_1), Rep_{B'}(v_2), . . . , Rep_{B'}(v_n)) is called the "matrix of basis change". This matrix can be inverted, and P^{-1} is the matrix for changing from basis B' into B.

Example
Let B' = {u_1, u_2, u_3} be a basis of R^3 and B = {v_1, v_2, v_3} another basis of the same space, defined as:

v1 = u1 − u3

v2 = u1 + 2u2 + u3

v3 = u2 + 2u3


If w is a vector in R^3 with coordinates (2, 1, −1) with respect to the basis B, calculate the coordinates of w with respect to the basis B'.

The above equations directly yield the transformation matrix:

Rep_{B'}(w) = ( 1 1 0 ; 0 2 1 ; −1 1 2 ) · Rep_B(w) = ( 1 1 0 ; 0 2 1 ; −1 1 2 ) ( 2 ; 1 ; −1 ) = ( 3 ; 1 ; −3 )

Example
Given the vector ( 1 ; 2 ) expressed in the basis

B = { ( 0 ; 1 ), ( 1 ; 1 ) }

what are its coordinates in the basis

B' = { ( −1 ; 0 ), ( 0 ; 2 ) }?

We only need to find the matrix for the basis transformation. This is built with the representations of the vectors of the old basis with respect to the vectors of the new basis. Thus:

( 0 ; 1 ) = α_1 ( −1 ; 0 ) + β_1 ( 0 ; 2 )

from which we can obtain ( α_1 ; β_1 ) = ( 0 ; 1/2 ), and

( 1 ; 1 ) = α_2 ( −1 ; 0 ) + β_2 ( 0 ; 2 )

from which we can obtain ( α_2 ; β_2 ) = ( −1 ; 1/2 ). Finally,

v_{B'} = ( 0 −1 ; 1/2 1/2 ) · v_B

v_{B'} = ( 0 −1 ; 1/2 1/2 ) · ( 1 ; 2 ) = ( −2 ; 3/2 )
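The same computation sketched in Octave (columns of the matrices below are the basis vectors; solving Bprime * P = B gives the basis-change matrix):

B      = [0 1; 1 1];     # old basis vectors as columns
Bprime = [-1 0; 0 2];    # new basis vectors as columns
P = Bprime \ B;          # [0 -1; 0.5 0.5], the basis-change matrix
P * [1; 2]               # (-2; 1.5), as above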


6.2 The vector space L(V,W )

If V and W are two vector spaces over the same field K, a linear map F : V → W is a map that respects the linear operations:

∀v, w ∈ V, F (v + w) = F (v) + F (w)

∀α ∈ K,∀v ∈ V, F (αv) = αF (v)

or, equivalently:

∀α_1, α_2 ∈ K, ∀v_1, v_2 ∈ V,  F(α_1 v_1 + α_2 v_2) = α_1 F(v_1) + α_2 F(v_2)

Let us consider, for example, the matrices with m rows and n columns: A ∈ M_{m×n}(K). These matrices can be used to represent a linear map of K^n into K^m:

F_A : K^n → K^m

that is, F_A(X) = AX, ∀X ∈ K^n.

This map is linear because it follows the above conditions. If we define a second linear map G_A, analogous to F_A, it is simple to prove that the space formed by all possible linear transforms, L(V, W), has the structure of a vector space.

Exercise 1
Discuss whether these transforms are linear or not:

F : R^3 → R;  F(X) = 2x − 3y + 4z

G : R^2 → R^3;  G(X) = (x + 1, 2y, x + y)

6.3 Rangespace and nullspace (Kernel)

The rangespace of a linear transformation is the set of images of all the vectors of V, F(V):

Im F = {w ∈ W | ∃v ∈ V with F(v) = w}

The dimension of the rangespace is the map's rank, rg F = dim Im F. In any linear transformation, the image of any subspace of the starting space is also a subspace of the arriving space; in particular, the whole rangespace of the linear transform is a vector subspace. The nullspace or kernel of a linear map is the inverse image of the zero vector in the arriving space:

NucF = KerF = {v ∈ V |F (v) = 0}

Both the rangespace and the kernel are vector subspaces.

A linear transformation is injective if and only if Nuc F = {0}. If F : V → W is linear with Nuc F = {0} and v_1, . . . , v_n are linearly independent vectors of V, then F(v_1), . . . , F(v_n) are also linearly independent. Thus, for an injective linear transform, the image of a basis of V is a basis of Im F. Some definitions:


• homomorphism is equivalent to linear transform.

• epimorphism is a linear transform that is surjective (exhaustive) onto W.

• isomorphism is a one-to-one linear transform: both injective and surjective, i.e. bijective.

• endomorphism is a linear transform of a vector space into itself (also sometimes called an operator).

• automorphisms are both endomorphisms and isomorphisms.

Rearranging the previous statements, if F is injective, it is also an isomorphism between V and Im F. If V is a vector space of finite dimension and F : V → W is linear, then,

dimV = dim NucF + dim ImF

dim Nuc F is sometimes called the nullity of F and dim Im F its rank.

Exercise 2
Let F : R^5 → R^3 be the linear transform defined as

F(X) = (x + 2y + z − 3s + 4t, 2x + 5y + 4z − 5s + 5t, x + 4y + 5z − s − 2t).

Find a basis and the dimension of the rangespace of F.

Exercise 3
Let F : R^3 → R^3 be the linear transform that has the associated matrix, on the canonical bases,

F = ( 1 2 5 ; 3 5 13 ; −2 −1 −4 )

Find a basis and the dimension of both the rangespace and the kernel.

Exercise 4
Find the kernel of the homomorphism H : M_{2×2} → P_3 defined by:

( a b ; c d ) → (a + b + 2d) + 0x + cx^2 + cx^3

Example
The matrix in exercise 6.1 changes a vector from its representation in the basis B to its representation in the basis B'. Show that it is an automorphism. What would be the transformation matrix from B' to B? It can be shown in an analogous way to what we did in exercise 6.3. In this case the kernel is trivial: obvious! The only vector that gets transformed into 0 when changing basis is 0 itself. A basis transformation is represented by a square matrix. If the kernel is trivial and the matrix is


square, this means that the rank of the associated matrix is 3 in the present case (check it). Or we can equivalently say that the determinant of the matrix is different from zero (check it). Applying:

dimV = dim NucF + dim ImF

we see that the dimension of the origin space is the same as the dimension of the image, which is at the same time the same as that of the whole final space: 3. Thus, we are dealing with an automorphism.

Using simple matrix algebra:

Rep_{B'}(w) = A · Rep_B(w)
A^{-1} · Rep_{B'}(w) = (A^{-1} · A) · Rep_B(w) = I · Rep_B(w)
A^{-1} · Rep_{B'}(w) = Rep_B(w)

Thus, the matrix we are looking for is the inverse. It will always exist, as a basis transformation is always an automorphism.

Example
Given the linear transform (homomorphism) T : R^3 → R^2 defined by:

T (x1, x2, x3) = (x1 + x2, x2 + x3)

1. find the associated matrix

2. find the kernel of the transformation

3. is it an isomorphism? is it an epimorphism?

1. the associated matrix A will be given, in general, by the images of the vectors of the canonical basis of the starting space:

T(1, 0, 0) = (1, 0)
T(0, 1, 0) = (1, 1)
T(0, 0, 1) = (0, 1)

Thus:

A = ( 1 1 0 ; 0 1 1 )

and thus if v ∈ R^3 and w ∈ R^2, we can represent the transformation by:

w = A · v

2. the kernel of the transformation will be formed by the vector subspace of the origin space having the null vector as image:

0 = A · v

or:

( 0 ; 0 ) = ( 1 1 0 ; 0 1 1 ) ( x ; y ; z )


We solve the system:

0 = x + y
0 = y + z

getting:

x = −y = −a
y = a
z = −y = −a

or equivalently:

( x ; y ; z ) = a ( −1 ; 1 ; −1 )

thus, the dimension of the kernel is 1 and the vector (−1, 1, −1) forms a basis of it.

3. it is not injective because the dimension of the kernel is not zero. Also, if V is a vector space with finite dimension and F : V → W is linear, then,

dimV = dim NucF + dim ImF

In our case 3 = 1 + dim Im T. Thus, dim Im T = 2 = dim W, because W is R^2. Thus the transformation is surjective (exhaustive): it is an epimorphism but not an isomorphism. (A quick Octave check is sketched below.)
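The check in Octave, using the associated matrix A of this example:

A = [1 1 0; 0 1 1];
null(A)     # one column, proportional to (-1; 1; -1): dim Nuc T = 1
rank(A)     # 2 = dim Im T, so T maps onto R^2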

Example
Is the map F : V → W, to which the following matrix is associated,

A = ( 1 −2 −4 ; −2 0 4 ; 1 3 1 )

bijective?

The matrix determinant is 0. Thus, the three columns, corresponding to the images of the vectors that form the canonical basis of V, are not linearly independent. The first two columns are linearly independent, for example, and thus dim(Im F) = 2. As dim(V) = 3, then dim(Ker F) = 1. The transform is not injective, as the dimension of the kernel is not zero. The transform is not surjective, as the dimension of the image is different from the dimension of W. The transform is not bijective for either of these two reasons.

6.4 Composition and inverse

If F : V → W and G : W → U are two linear transforms, then G ◦ F : V → U is linear. This composition is associative, but not commutative.


A linear transform F : V → W is invertible if there exists a linear G : W → V such that G ◦ F = id_V and F ◦ G = id_W, which we will call the "inverse". Automorphisms F are one-to-one linear transforms, and in this case F^{-1} is also an automorphism.

6.5 Linear transforms and matrices

We can express linear transforms as matrices. This is to say, once we have set the bases for the starting and arrival spaces, we can establish a one-to-one correspondence between linear transforms and matrices, which will have advantages because we know how to do matrix operations. This correspondence will be an isomorphism, and the matrix corresponding to the linear transform will be formed by the images of the basis of V.

For example:

Let us consider a linear transform h : R^2 → R^3, and let the bases of V = R^2 and W = R^3 be, respectively:

B = 〈 ( 2 ; 0 ), ( 1 ; 4 ) 〉

D = 〈 ( 1 ; 0 ; 0 ), ( 0 ; −2 ; 0 ), ( 1 ; 0 ; 1 ) 〉

The linear transform is defined by its action on the basis vectors in V:

( 2 ; 0 )  ↦  ( 1 ; 1 ; 1 )        ( 1 ; 4 )  ↦  ( 1 ; 2 ; 0 )

In order to evaluate how this linear transform affects any vector in its domain, first we need to express h(b_1) and h(b_2) in the basis of the rangespace:

( 1 ; 1 ; 1 ) = 0 ( 1 ; 0 ; 0 ) − (1/2) ( 0 ; −2 ; 0 ) + 1 ( 1 ; 0 ; 1 )   so   Rep_D(h(b_1)) = ( 0 ; −1/2 ; 1 )_D

and

( 1 ; 2 ; 0 ) = 1 ( 1 ; 0 ; 0 ) − 1 ( 0 ; −2 ; 0 ) + 0 ( 1 ; 0 ; 1 )   so   Rep_D(h(b_2)) = ( 1 ; −1 ; 0 )_D

Now, for each member of the starting space, we can express its image according to h in terms of the images of the basis vectors B:

h(v) = h( c_1 · ( 2 ; 0 ) + c_2 · ( 1 ; 4 ) )
     = c_1 · h( ( 2 ; 0 ) ) + c_2 · h( ( 1 ; 4 ) )
     = c_1 · ( 0 ( 1 ; 0 ; 0 ) − (1/2) ( 0 ; −2 ; 0 ) + 1 ( 1 ; 0 ; 1 ) ) + c_2 · ( 1 ( 1 ; 0 ; 0 ) − 1 ( 0 ; −2 ; 0 ) + 0 ( 1 ; 0 ; 1 ) )
     = (0 c_1 + 1 c_2) · ( 1 ; 0 ; 0 ) + (−(1/2) c_1 − 1 c_2) · ( 0 ; −2 ; 0 ) + (1 c_1 + 0 c_2) · ( 1 ; 0 ; 1 )


Thus,

with Rep_B(~v) = ( c_1 ; c_2 ), then Rep_D(h(~v)) = ( 0 c_1 + 1 c_2 ; −(1/2) c_1 − 1 c_2 ; 1 c_1 + 0 c_2 ).

For example,

with Rep_B( ( 4 ; 8 ) ) = ( 1 ; 2 )_B, then Rep_D( h( ( 4 ; 8 ) ) ) = ( 2 ; −5/2 ; 1 ).

We can express these calculations in matrix form:

( 0 1 ; −1/2 −1 ; 1 0 )_{B,D} ( c_1 ; c_2 )_B = ( 0 c_1 + 1 c_2 ; −(1/2) c_1 − 1 c_2 ; 1 c_1 + 0 c_2 )_D

The interesting part of this expression is that the matrix representing a linear transform is generated, simply, by putting in columns the images of the vectors of the domain basis as a function of the vectors in the basis of the image.

In a more formal way: let us suppose that V and W are vector spaces with dimensions n and m, with bases B and D, and that h : V → W is a linear transform connecting them. If

Rep_D(h(b_1)) = ( h_{1,1} ; h_{2,1} ; . . . ; h_{m,1} )_D    . . .    Rep_D(h(b_n)) = ( h_{1,n} ; h_{2,n} ; . . . ; h_{m,n} )_D

then

Rep_{B,D}(h) = ( h_{1,1} h_{1,2} . . . h_{1,n} ; h_{2,1} h_{2,2} . . . h_{2,n} ; . . . ; h_{m,1} h_{m,2} . . . h_{m,n} )_{B,D}

is the matrix representation of the transformation.
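The running example above, sketched in Octave: the columns of the matrix are Rep_D(h(b_1)) and Rep_D(h(b_2)), and multiplying by Rep_B of a vector gives Rep_D of its image:

H = [0 1; -1/2 -1; 1 0];   # Rep_{B,D}(h)
c = [1; 2];                 # Rep_B of the vector (4; 8)
H * c                       # (2; -2.5; 1) = Rep_D(h((4; 8)))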

Exercise 5
Write the matrix of the linear transform t_θ : R^2 → R^2 which transforms the vectors by rotating them clockwise by any given angle θ.

Every matrix represents a linear transform.


6.6 Composition and matrix product

We already know how to change bases and we know how to represent linear transforms by means of matrices. Now we want to realize the following scheme:

V (basis B)  --h, matrix H-->  W (basis D)
     | id                           | id
V (basis B')  --h, matrix H'-->  W (basis D')

Or, what is identical in matrix representation:

H' = Rep_{D,D'}(id) · H · Rep_{B',B}(id)    (∗)

For example, the matrix

T = ( cos(π/6) −sin(π/6) ; sin(π/6) cos(π/6) ) = ( √3/2 −1/2 ; 1/2 √3/2 )

represents, with respect to E_2, E_2 (the standard basis), the linear transformation t : R^2 → R^2 that rotates the vectors π/6 radians anticlockwise. We can transform this representation with respect to E_2, E_2 to another one with respect to

B = 〈 ( 1 ; 1 ), ( 0 ; 2 ) 〉        D = 〈 ( −1 ; 0 ), ( 2 ; 3 ) 〉

using what we just learnt:

R^2 (basis E_2)  --t, T-->  R^2 (basis E_2)
       | id                       | id
R^2 (basis B)  --t, T'-->  R^2 (basis D)

Rep_{B,D}(t) = Rep_{E_2,D}(id) · T · Rep_{B,E_2}(id)

Rep_{E_2,D}(id) can be written as the inverse of Rep_{D,E_2}(id).

Rep_{B,D}(t) = ( −1 2 ; 0 3 )^{-1} ( √3/2 −1/2 ; 1/2 √3/2 ) ( 1 0 ; 1 2 ) = ( (5 − √3)/6  (3 + 2√3)/3 ; (1 + √3)/6  √3/3 )

Exercise 6
Check that the effect of the new matrix, used with the new bases, is the same as that of the original matrix.


7 Projection

7.1 Orthogonal Projection Into a Line

We first consider orthogonal projection into a line. To orthogonally project a vector ~v into a line ℓ, darken a point on the line if someone on that line and looking straight up or down (from that person's point of view) sees ~v.

The picture shows someone who has walked out on the line until the tip of ~v is straight overhead. That is, where the line is described as the span of some nonzero vector, ℓ = {c · ~s | c ∈ R}, the person has walked out to find the coefficient c_p with the property that ~v − c_p · ~s is orthogonal to c_p · ~s.

We can solve for this coefficient by noting that because ~v − c_p ~s is orthogonal to a scalar multiple of ~s it must be orthogonal to ~s itself, and then the consequent fact that the dot product (~v − c_p ~s) · ~s is zero gives that c_p = (~v · ~s)/(~s · ~s).

The orthogonal projection of ~v into the line spanned by a nonzero ~s is this vector.

proj_[~s](~v) = ((~v · ~s)/(~s · ~s)) · ~s    (4)

The wording of that definition says 'spanned by ~s' instead of the more formal 'the span of the set {~s}'. This casual first phrase is common.
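A tiny Octave sketch of formula (4); the particular vectors are only an illustration:

s = [3; 1];
v = [2; 4];
p = (dot(v, s) / dot(s, s)) * s   # the projection of v into the line spanned by s
dot(v - p, s)                     # 0: the remainder is orthogonal to s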

Exercise 7
In R^3, the orthogonal projection of a general vector

( x ; y ; z )    (5)

into the y-axis is

( ( x ; y ; z ) · ( 0 ; 1 ; 0 ) / ( 0 ; 1 ; 0 ) · ( 0 ; 1 ; 0 ) ) · ( 0 ; 1 ; 0 ) = ( 0 ; y ; 0 )    (6)

which matches our intuitive expectation.

The picture above with the stick figure walking out on the line until ~v's tip is overhead is one way to think of the orthogonal projection of a vector into a line. We finish this subsection with two other ways.

Thus, another way to think of the picture that precedes the definition is that it shows ~v as decomposed into two parts, the part with the line (here, the part with the tracks, ~p), and the part that is orthogonal to the line (shown here lying on the north-south axis). These two are "not interacting" or "independent", in the sense that the east-west car is not at all affected by the north-south part of the wind. So the orthogonal projection of ~v into the line spanned by ~s can be thought of as the part of ~v that lies in the direction of ~s.

This subsection has developed a natural projection map: orthogonal projection into a line. As suggested by the examples, it is often called for in applications. The next subsection shows how the definition of orthogonal projection into a line gives us a way to calculate especially convenient bases for vector spaces, again something that is common in applications. The final subsection completely generalizes projection, orthogonal or not, into any subspace at all.

7.2 Gram-Schmidt Orthogonalization

The prior subsection suggests that projecting into the line spanned by ~s decomposes a vector ~v into two parts

~v = proj_[~s](~v) + (~v − proj_[~s](~v))

that are orthogonal and so are "not interacting". We will now develop that suggestion.

Vectors ~v_1, . . . , ~v_k ∈ R^n are mutually orthogonal when any two are orthogonal: if i ≠ j then the dot product ~v_i · ~v_j is zero.

If the vectors in a set {~v_1, . . . , ~v_k} ⊂ R^n are mutually orthogonal and nonzero then that set is linearly independent.


Exercise 8
The members ~β_1 and ~β_2 of this basis for R^2 are not orthogonal.

B = 〈 ( 4 ; 2 ), ( 1 ; 3 ) 〉

However, we can derive from B a new basis for the same space that does have mutually orthogonal members. For the first member of the new basis we simply use ~β_1:

~κ_1 = ( 4 ; 2 )    (7)

For the second member of the new basis, we take away from ~β_2 its part in the direction of ~κ_1,

~κ_2 = ( 1 ; 3 ) − proj_[~κ_1]( ( 1 ; 3 ) ) = ( 1 ; 3 ) − ( 2 ; 1 ) = ( −1 ; 2 )

which leaves the part, ~κ_2 pictured above, of ~β_2 that is orthogonal to ~κ_1 (it is orthogonal by the definition of the projection into the span of ~κ_1). Note that, by the corollary, {~κ_1, ~κ_2} is a basis for R^2.

An orthogonal basis for a vector space is a basis of mutually orthogonal vectors.

Exercise 9. To turn this basis for R3
\[
\Bigl\langle \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}0\\2\\0\end{pmatrix}, \begin{pmatrix}1\\0\\3\end{pmatrix} \Bigr\rangle \tag{8}
\]
into an orthogonal basis, we take the first vector as it is given.
\[
\vec{\kappa}_1 = \begin{pmatrix}1\\1\\1\end{pmatrix} \tag{9}
\]
We get ~κ2 by starting with the given second vector ~β2 and subtracting away the part of it in the direction of ~κ1.
\[
\vec{\kappa}_2 = \begin{pmatrix}0\\2\\0\end{pmatrix} - \mathrm{proj}_{[\vec{\kappa}_1]}\Bigl(\begin{pmatrix}0\\2\\0\end{pmatrix}\Bigr)
= \begin{pmatrix}0\\2\\0\end{pmatrix} - \begin{pmatrix}2/3\\2/3\\2/3\end{pmatrix}
= \begin{pmatrix}-2/3\\4/3\\-2/3\end{pmatrix} \tag{10}
\]
Finally, we get ~κ3 by taking the third given vector and subtracting the part of it in the direction of ~κ1, and also the part of it in the direction of ~κ2.
\[
\vec{\kappa}_3 = \begin{pmatrix}1\\0\\3\end{pmatrix} - \mathrm{proj}_{[\vec{\kappa}_1]}\Bigl(\begin{pmatrix}1\\0\\3\end{pmatrix}\Bigr) - \mathrm{proj}_{[\vec{\kappa}_2]}\Bigl(\begin{pmatrix}1\\0\\3\end{pmatrix}\Bigr)
= \begin{pmatrix}-1\\0\\1\end{pmatrix} \tag{11}
\]
Again the corollary gives that
\[
\Bigl\langle \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}-2/3\\4/3\\-2/3\end{pmatrix}, \begin{pmatrix}-1\\0\\1\end{pmatrix} \Bigr\rangle \tag{12}
\]
is a basis for the space.

The next result verifies that the process used in those examples works with any basis for any subspace of an Rn (we are restricted to Rn only because we have not given a definition of orthogonality for other vector spaces).

If ⟨~β1, . . . , ~βk⟩ is a basis for a subspace of Rn then, where
\[
\begin{aligned}
\vec{\kappa}_1 &= \vec{\beta}_1 \\
\vec{\kappa}_2 &= \vec{\beta}_2 - \mathrm{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_2) \\
\vec{\kappa}_3 &= \vec{\beta}_3 - \mathrm{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_3) - \mathrm{proj}_{[\vec{\kappa}_2]}(\vec{\beta}_3) \\
&\;\;\vdots \\
\vec{\kappa}_k &= \vec{\beta}_k - \mathrm{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_k) - \cdots - \mathrm{proj}_{[\vec{\kappa}_{k-1}]}(\vec{\beta}_k)
\end{aligned}
\]
the ~κ's form an orthogonal basis for the same subspace.

Beyond having the vectors in the basis be orthogonal, we can do more; we can arrange for each vector to have length one by dividing each by its own length (we can normalize the lengths).

Exercise 10. Find an orthonormal basis for this subspace of R4.
\[
\Bigl\{ \begin{pmatrix}x\\y\\z\\w\end{pmatrix} \Bigm| x - y - z + w = 0 \text{ and } x + z = 0 \Bigr\} \tag{13}
\]

When using octave to obtain an orthonormal basis, the result is not always the one Gram-Schmidt produces, because octave's orth function is based on the singular value decomposition rather than on the Gram-Schmidt process:


octave:24> X=[1,1,1;0,1,1;0,0,1]

X =

1 1 1

0 1 1

0 0 1

octave:25> Q=orth(X)

Q =

-0.73698 -0.59101 -0.32799

-0.59101 0.32799 0.73698

-0.32799 0.73698 -0.59101
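For comparison, the following is a minimal sketch of the Gram-Schmidt recipe itself, a direct transcription of the formulas above rather than a library routine. It produces a different, but equally valid, orthonormal basis for the same column space; for this particular X the result is simply the identity matrix.

% minimal sketch: classical Gram-Schmidt on the columns of X
X = [1,1,1; 0,1,1; 0,0,1];
Q = zeros(size(X));
for j = 1:columns(X)
  v = X(:,j);
  for i = 1:j-1
    v = v - (Q(:,i)'*X(:,j))*Q(:,i);   % subtract the part along each earlier kappa_i
  end
  Q(:,j) = v/norm(v);                  % normalize each vector to length one
end
Q                                      % for this X, Q is the 3x3 identity matrix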

7.3 Projection Into a Subspace

The prior subsections project a vector into a line by decomposing it into two parts: the part in the line, proj[~s ](~v ), and the rest, ~v − proj[~s ](~v ). To generalize projection to arbitrary subspaces, we follow this idea.

For any direct sum V = M ⊕ N and any ~v ∈ V , the projection of ~v into M along N is
\[
\mathrm{proj}_{M,N}(\vec{v}) = \vec{m} \tag{14}
\]
where ~v = ~m + ~n with ~m ∈ M , ~n ∈ N .

This definition doesn't involve a sense of ‘orthogonal’ so we can apply it to spaces other than subspaces of an Rn. (Definitions of orthogonality for other spaces are perfectly possible, but we haven't seen any in this book.)

Exercise 11. The space M2×2 of 2×2 matrices is the direct sum of these two.
\[
M = \Bigl\{ \begin{pmatrix}a&b\\0&0\end{pmatrix} \Bigm| a, b \in \mathbb{R} \Bigr\}
\qquad
N = \Bigl\{ \begin{pmatrix}0&0\\c&d\end{pmatrix} \Bigm| c, d \in \mathbb{R} \Bigr\} \tag{15}
\]

To project
\[
A = \begin{pmatrix}3&1\\0&4\end{pmatrix} \tag{16}
\]

into M along N , we first fix bases for the two subspaces.

\[
B_M = \Bigl\langle \begin{pmatrix}1&0\\0&0\end{pmatrix}, \begin{pmatrix}0&1\\0&0\end{pmatrix} \Bigr\rangle
\qquad
B_N = \Bigl\langle \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}0&0\\0&1\end{pmatrix} \Bigr\rangle \tag{17}
\]

The concatenation of these
\[
B = B_M \frown B_N = \Bigl\langle \begin{pmatrix}1&0\\0&0\end{pmatrix}, \begin{pmatrix}0&1\\0&0\end{pmatrix}, \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}0&0\\0&1\end{pmatrix} \Bigr\rangle \tag{18}
\]

is a basis for the entire space, because the space is the direct sum, so we can use it to represent A.
\[
\begin{pmatrix}3&1\\0&4\end{pmatrix}
= 3\cdot\begin{pmatrix}1&0\\0&0\end{pmatrix}
+ 1\cdot\begin{pmatrix}0&1\\0&0\end{pmatrix}
+ 0\cdot\begin{pmatrix}0&0\\1&0\end{pmatrix}
+ 4\cdot\begin{pmatrix}0&0\\0&1\end{pmatrix} \tag{19}
\]


Now the projection of A into M along N is found by keeping the M part of this sum and dropping the N part.
\[
\mathrm{proj}_{M,N}\Bigl(\begin{pmatrix}3&1\\0&4\end{pmatrix}\Bigr)
= 3\cdot\begin{pmatrix}1&0\\0&0\end{pmatrix}
+ 1\cdot\begin{pmatrix}0&1\\0&0\end{pmatrix}
= \begin{pmatrix}3&1\\0&0\end{pmatrix} \tag{20}
\]

Exercise 12. Both subscripts on proj_{M,N}(~v ) are significant. The first subscript M matters because the result of the projection is an ~m ∈ M , and changing this subspace would change the possible results. For an example showing that the second subscript matters, fix this plane subspace of R3 and its basis
\[
M = \Bigl\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \Bigm| y - 2z = 0 \Bigr\}
\qquad
B_M = \Bigl\langle \begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\2\\1\end{pmatrix} \Bigr\rangle \tag{21}
\]
and compare the projections along two different subspaces.
\[
N = \Bigl\{ k\begin{pmatrix}0\\0\\1\end{pmatrix} \Bigm| k \in \mathbb{R} \Bigr\}
\qquad
\hat{N} = \Bigl\{ k\begin{pmatrix}0\\1\\-2\end{pmatrix} \Bigm| k \in \mathbb{R} \Bigr\} \tag{22}
\]
(Verification that R3 = M ⊕ N and R3 = M ⊕ N̂ is routine.) We will check that these projections are different by checking that they have different effects on this vector.
\[
\vec{v} = \begin{pmatrix}2\\2\\5\end{pmatrix} \tag{23}
\]

For the first one we find a basis for N
\[
B_N = \Bigl\langle \begin{pmatrix}0\\0\\1\end{pmatrix} \Bigr\rangle \tag{24}
\]
and represent ~v with respect to the concatenation B_M \frown B_N.
\[
\begin{pmatrix}2\\2\\5\end{pmatrix}
= 2\cdot\begin{pmatrix}1\\0\\0\end{pmatrix}
+ 1\cdot\begin{pmatrix}0\\2\\1\end{pmatrix}
+ 4\cdot\begin{pmatrix}0\\0\\1\end{pmatrix} \tag{25}
\]

The projection of ~v into M along N is found by keeping the M part and dropping the N part.
\[
\mathrm{proj}_{M,N}(\vec{v}) = 2\cdot\begin{pmatrix}1\\0\\0\end{pmatrix} + 1\cdot\begin{pmatrix}0\\2\\1\end{pmatrix} = \begin{pmatrix}2\\2\\1\end{pmatrix} \tag{26}
\]

For the other subspace N̂ , this basis is natural.
\[
B_{\hat{N}} = \Bigl\langle \begin{pmatrix}0\\1\\-2\end{pmatrix} \Bigr\rangle \tag{27}
\]


Representing ~v with respect to the concatenation B_M \frown B_{\hat N}
\[
\begin{pmatrix}2\\2\\5\end{pmatrix}
= 2\cdot\begin{pmatrix}1\\0\\0\end{pmatrix}
+ (9/5)\cdot\begin{pmatrix}0\\2\\1\end{pmatrix}
- (8/5)\cdot\begin{pmatrix}0\\1\\-2\end{pmatrix} \tag{28}
\]

and then keeping only the M part gives this.
\[
\mathrm{proj}_{M,\hat{N}}(\vec{v}) = 2\cdot\begin{pmatrix}1\\0\\0\end{pmatrix} + (9/5)\cdot\begin{pmatrix}0\\2\\1\end{pmatrix} = \begin{pmatrix}2\\18/5\\9/5\end{pmatrix} \tag{29}
\]

Therefore projection along different subspaces may yield different results.
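A quick octave sketch of these two computations, done by solving for the coordinates with respect to the concatenated basis and keeping only the M part (the matrices below simply restate the bases given above):

% quick sketch: projection of v into M along N, and along N-hat
BM = [1, 0; 0, 2; 0, 1];            % basis of M as columns
N1 = [0; 0; 1];  N2 = [0; 1; -2];   % direction vectors of N and N-hat
v  = [2; 2; 5];
c1 = [BM, N1] \ v;                  % coordinates w.r.t. the concatenation B_M frown B_N
p1 = BM * c1(1:2)                   % [2; 2; 1], as in (26)
c2 = [BM, N2] \ v;
p2 = BM * c2(1:2)                   % [2; 18/5; 9/5], as in (29)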


Comparing the two maps, both projections are indeed ‘into’ the plane M and ‘along’ the corresponding line.


Notice that the projection along N is not orthogonal: some members of the plane M are not orthogonal to the direction of N . But the projection along N̂ is orthogonal.

A natural question is: what is the relationship between the projection operation defined above and the operation of orthogonal projection into a line? The second map above suggests the answer: orthogonal projection into a line is a special case of the projection defined above; it is just projection along a subspace perpendicular to the line.

In addition to pointing out that projection along a subspace is a generalization, this scheme shows how to define orthogonal projection into any subspace of Rn, of any dimension.

The orthogonal complement of a subspace M of Rn is
\[
M^{\perp} = \{ \vec{v} \in \mathbb{R}^n \mid \vec{v} \text{ is perpendicular to all vectors in } M \} \tag{30}
\]
(read "M perp"). The orthogonal projection proj_M(~v ) of a vector is its projection into M along M⊥.

Exercise 13. In R3, to find the orthogonal complement of the plane
\[
P = \Bigl\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \Bigm| 3x + 2y - z = 0 \Bigr\} \tag{31}
\]

we start with a basis for P .
\[
B = \Bigl\langle \begin{pmatrix}1\\0\\3\end{pmatrix}, \begin{pmatrix}0\\1\\2\end{pmatrix} \Bigr\rangle \tag{32}
\]

Any ~v perpendicular to every vector in B is perpendicular to every vector in the span of B. Therefore, the subspace P⊥ consists of the vectors that satisfy these two conditions.
\[
\begin{pmatrix}1\\0\\3\end{pmatrix}\cdot\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = 0
\qquad
\begin{pmatrix}0\\1\\2\end{pmatrix}\cdot\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = 0 \tag{33}
\]

We can express those conditions more compactly as a linear system.
\[
P^{\perp} = \Bigl\{ \begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} \Bigm| \begin{pmatrix}1&0&3\\0&1&2\end{pmatrix}\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Bigr\} \tag{34}
\]


We are thus left with finding the nullspace of the map represented by the matrix, that is, with calculating the solution set of a homogeneous linear system.
\[
P^{\perp} = \Bigl\{ \begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} \Bigm| \begin{matrix}v_1 + 3v_3 = 0\\ v_2 + 2v_3 = 0\end{matrix} \Bigr\}
= \Bigl\{ k\begin{pmatrix}-3\\-2\\1\end{pmatrix} \Bigm| k \in \mathbb{R} \Bigr\} \tag{35}
\]
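The same nullspace can be obtained numerically; octave's null returns an orthonormal basis for it, so the answer is a unit multiple of (−3, −2, 1) (a minimal sketch):

% minimal sketch: an orthonormal basis for the nullspace of the coefficient matrix
A = [1,0,3; 0,1,2];
null(A)        % a unit multiple of (-3,-2,1), possibly with the opposite sign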

Exercise 14. Where M is the xy-plane subspace of R3, what is M⊥? A common first reaction is that M⊥ is the yz-plane, but that's not right. Some vectors from the yz-plane are not perpendicular to every vector in the xy-plane.
\[
\begin{pmatrix}1\\1\\0\end{pmatrix} \not\perp \begin{pmatrix}0\\3\\2\end{pmatrix}
\qquad
\theta = \arccos\Bigl( \frac{1\cdot 0 + 1\cdot 3 + 0\cdot 2}{\sqrt{2}\cdot\sqrt{13}} \Bigr) \approx 0.94\ \text{rad}
\]
Instead M⊥ is the z-axis, since proceeding as in the prior example and taking the natural basis for the xy-plane gives this.
\[
M^{\perp} = \Bigl\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \Bigm| \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Bigr\}
= \Bigl\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \Bigm| x = 0 \text{ and } y = 0 \Bigr\} \tag{36}
\]

The next result justifies the second sentence.

Let M be a subspace of Rn. The orthogonal complement of M is also a subspace. The space is the direct sum of the two, Rn = M ⊕ M⊥. And, for any ~v ∈ Rn, the vector ~v − proj_M(~v ) is perpendicular to every vector in M .

Let ~v be a vector in Rn and let M be a subspace of Rn with basis ⟨~β1, . . . , ~βk⟩. If A is the matrix whose columns are the ~β's then proj_M(~v ) = c1~β1 + · · · + ck~βk, where the coefficients ci are the entries of the vector (A^T A)^{-1} A^T · ~v. That is, proj_M(~v ) = A (A^T A)^{-1} A^T · ~v.

Exercise 15. To orthogonally project this vector into this subspace
\[
\vec{v} = \begin{pmatrix}1\\-1\\1\end{pmatrix}
\qquad
P = \Bigl\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \Bigm| x + z = 0 \Bigr\} \tag{37}
\]


first make a matrix whose columns are a basis for the subspace
\[
A = \begin{pmatrix}0&1\\1&0\\0&-1\end{pmatrix} \tag{38}
\]

and then compute.
\[
A (A^{\mathsf T} A)^{-1} A^{\mathsf T}
= \begin{pmatrix}0&1\\1&0\\0&-1\end{pmatrix}
  \begin{pmatrix}1&0\\0&1/2\end{pmatrix}
  \begin{pmatrix}0&1&0\\1&0&-1\end{pmatrix}
= \begin{pmatrix}1/2&0&-1/2\\0&1&0\\-1/2&0&1/2\end{pmatrix}
\]

With the matrix, calculating the orthogonal projection of any vector into P is easy.
\[
\mathrm{proj}_P(\vec{v}) = \begin{pmatrix}1/2&0&-1/2\\0&1&0\\-1/2&0&1/2\end{pmatrix}\begin{pmatrix}1\\-1\\1\end{pmatrix} = \begin{pmatrix}0\\-1\\0\end{pmatrix} \tag{39}
\]

octave:1> A=[0,1;1,0;0,-1]

A =

0 1

1 0

0 -1

octave:2> A*inv(A’*A)*A’

ans =

0.50000 0.00000 -0.50000

0.00000 1.00000 0.00000

-0.50000 0.00000 0.50000

octave:3> v=[1,-1,1]

v =

1 -1 1

octave:4> A*inv(A’*A)*A’*v’

ans =

0

-1

0


Example. Orthogonally project the vector
\[
\begin{pmatrix}2\\3\end{pmatrix}
\]
into the line y = 2x. We first pick a direction vector for the line. For instance,
\[
\vec{s} = \begin{pmatrix}1\\2\end{pmatrix}
\]
will do. Then the calculation is routine.
\[
\frac{\begin{pmatrix}2\\3\end{pmatrix}\cdot\begin{pmatrix}1\\2\end{pmatrix}}
     {\begin{pmatrix}1\\2\end{pmatrix}\cdot\begin{pmatrix}1\\2\end{pmatrix}}
\cdot\begin{pmatrix}1\\2\end{pmatrix}
= \frac{8}{5}\cdot\begin{pmatrix}1\\2\end{pmatrix}
= \begin{pmatrix}8/5\\16/5\end{pmatrix}
\]

Exercise 16. Get the best linear fit to the data in Table 1 by means of the above procedure; a sketch of one way to set this up in octave is given after the table.

Table 1: Cancer Deaths in Oregon. Source: R. Fadeley, Journal of Environmental Health 27 (1965), pp. 883-897.

County/city    Index   Deaths
Umatilla        2.5     147
Morrow          2.6     130
Gilliam         3.4     130
Sherman         1.3     114
Wasco           1.6     138
Hood River      3.8     162
Portland       11.6     208
Columbia        6.4     178
Clatsop         8.3     210
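A minimal octave sketch of one way to do this. The model deaths ≈ c0 + c1·index is an assumption on my part; octave's backslash operator solves the least-squares problem, which is the same as projecting the data vector into the column space of A:

% minimal sketch: least-squares line deaths ~ c0 + c1*index (assumed model)
index  = [2.5; 2.6; 3.4; 1.3; 1.6; 3.8; 11.6; 6.4; 8.3];
deaths = [147; 130; 130; 114; 138; 162; 208; 178; 210];
A = [ones(size(index)), index];   % columns span the subspace of all lines
c = A \ deaths                    % same as inv(A'*A)*A'*deaths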

8 Diagonalization

8.1 Diagonalizability

A transformation is diagonalizable if it has a diagonal representation with respect to the same basis for the codomain as for the domain. A diagonalizable matrix is one that is similar to a diagonal matrix: T is diagonalizable if there is a nonsingular P such that PTP−1 is diagonal.

Exercise 17


The matrix
\[
\begin{pmatrix}4&-2\\1&1\end{pmatrix} \tag{40}
\]
is diagonalizable.
\[
\begin{pmatrix}2&0\\0&3\end{pmatrix}
= \begin{pmatrix}-1&2\\1&-1\end{pmatrix}
  \begin{pmatrix}4&-2\\1&1\end{pmatrix}
  \begin{pmatrix}-1&2\\1&-1\end{pmatrix}^{-1} \tag{41}
\]
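This similarity can be checked directly in octave (a quick sketch):

% quick check of (41) in octave
T = [4,-2; 1,1];  P = [-1,2; 1,-1];
P*T*inv(P)            % returns [2 0; 0 3], i.e. diag(2, 3)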

Exercise 18. Not every matrix is diagonalizable. The square of
\[
N = \begin{pmatrix}0&0\\1&0\end{pmatrix} \tag{42}
\]
is the zero matrix. Thus, for any map n that N represents (with respect to the same basis for the domain as for the codomain), the composition n ◦ n is the zero map. This implies that no such map n can be diagonally represented (with respect to any B, B) because no power of a nonzero diagonal matrix is zero. That is, there is no diagonal matrix in N's similarity class.

That exercise shows that a diagonal form will not do for a canonical form: we cannot find a diagonal matrix in each matrix similarity class. However, the canonical form that we are developing has the property that if a matrix can be diagonalized then the diagonal matrix is the canonical representative of the similarity class. The next result characterizes which maps can be diagonalized.

A transformation t is diagonalizable if and only if there is a basis B = ⟨~β1, . . . , ~βn⟩ and scalars λ1, . . . , λn such that t(~βi) = λi~βi for each i.

Exercise 19. To diagonalize
\[
T = \begin{pmatrix}3&2\\0&1\end{pmatrix} \tag{43}
\]
we take it as the representation of a transformation with respect to the standard basis, T = Rep_{E2,E2}(t), and we look for a basis B = ⟨~β1, ~β2⟩ such that
\[
\mathrm{Rep}_{B,B}(t) = \begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix} \tag{44}
\]
that is, such that t(~β1) = λ1~β1 and t(~β2) = λ2~β2.
\[
\begin{pmatrix}3&2\\0&1\end{pmatrix}\vec{\beta}_1 = \lambda_1\cdot\vec{\beta}_1
\qquad
\begin{pmatrix}3&2\\0&1\end{pmatrix}\vec{\beta}_2 = \lambda_2\cdot\vec{\beta}_2 \tag{45}
\]
We are looking for scalars x such that this equation
\[
\begin{pmatrix}3&2\\0&1\end{pmatrix}\begin{pmatrix}b_1\\b_2\end{pmatrix} = x\cdot\begin{pmatrix}b_1\\b_2\end{pmatrix} \tag{46}
\]


has solutions b1 and b2 that are not both zero. Rewrite that as a linear system.
\[
\begin{aligned}
(3 - x)\cdot b_1 + 2\cdot b_2 &= 0 \\
(1 - x)\cdot b_2 &= 0
\end{aligned} \tag{$*$}
\]
In the bottom equation the two numbers multiply to give zero only if at least one of them is zero, so there are two possibilities, b2 = 0 and x = 1. In the b2 = 0 possibility, the first equation gives that either b1 = 0 or x = 3. Since the case of both b1 = 0 and b2 = 0 is disallowed, we are left looking at the possibility of x = 3. With it, the first equation in (∗) is 0 · b1 + 2 · b2 = 0, and so associated with 3 are vectors with a second component of zero and a first component that is free.
\[
\begin{pmatrix}3&2\\0&1\end{pmatrix}\begin{pmatrix}b_1\\0\end{pmatrix} = 3\cdot\begin{pmatrix}b_1\\0\end{pmatrix} \tag{47}
\]
That is, one solution to (∗) is λ1 = 3, and we have a first basis vector.
\[
\vec{\beta}_1 = \begin{pmatrix}1\\0\end{pmatrix} \tag{48}
\]

In the x = 1 possibility, the first equation in (∗) is 2 · b1 + 2 · b2 = 0, and so associated with 1 are vectors whose second component is the negative of their first component.
\[
\begin{pmatrix}3&2\\0&1\end{pmatrix}\begin{pmatrix}b_1\\-b_1\end{pmatrix} = 1\cdot\begin{pmatrix}b_1\\-b_1\end{pmatrix} \tag{49}
\]
Thus, another solution is λ2 = 1 and a second basis vector is this.
\[
\vec{\beta}_2 = \begin{pmatrix}1\\-1\end{pmatrix} \tag{50}
\]

To finish, drawing the similarity diagram
\[
\begin{array}{ccc}
\mathbb{R}^2_{\text{w.r.t.\ } E_2} & \xrightarrow[\;T\;]{\;t\;} & \mathbb{R}^2_{\text{w.r.t.\ } E_2} \\[4pt]
\Big\downarrow\,\mathrm{id} & & \Big\downarrow\,\mathrm{id} \\[4pt]
\mathbb{R}^2_{\text{w.r.t.\ } B} & \xrightarrow[\;D\;]{\;t\;} & \mathbb{R}^2_{\text{w.r.t.\ } B}
\end{array} \tag{51}
\]
and noting that the matrix Rep_{B,E2}(id) is easy leads to this diagonalization.
\[
\begin{pmatrix}3&0\\0&1\end{pmatrix}
= \begin{pmatrix}1&1\\0&-1\end{pmatrix}^{-1}
  \begin{pmatrix}3&2\\0&1\end{pmatrix}
  \begin{pmatrix}1&1\\0&-1\end{pmatrix} \tag{52}
\]
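The same diagonalization can be checked numerically; the following is a quick sketch (eig may order the eigenvalues differently and scale the eigenvectors):

% quick sketch: numerical check of the diagonalization of T
T = [3,2; 0,1];
P = [1,1; 0,-1];              % columns are the basis vectors beta_1, beta_2
inv(P)*T*P                    % gives diag(3, 1), matching (52)
[V, D] = eig(T);
diag(D)                       % eig finds the same eigenvalues, possibly in another order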

8.2 Eigenvalues and Eigenvectors

A transformation t : V → V has a scalar eigenvalue λ if there is a nonzero eigenvector ~ζ ∈ V such that t(~ζ) = λ · ~ζ.


Exercise 20. The projection map
\[
\begin{pmatrix}x\\y\\z\end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix}x\\y\\0\end{pmatrix}
\qquad x, y, z \in \mathbb{C} \tag{53}
\]
has an eigenvalue of 1 associated with any eigenvector of the form
\[
\begin{pmatrix}x\\y\\0\end{pmatrix} \tag{54}
\]
where x and y are non-0 scalars. On the other hand, 2 is not an eigenvalue of π since no non-~0 vector is doubled.

That example shows why the ‘non-~0’ appears in the definition. Disallowing ~0 as an eigenvector eliminates trivial eigenvalues.

Exercise 21. The only transformation on the trivial space {~0 } is ~0 ↦ ~0. This map has no eigenvalues because there are no non-~0 vectors ~v mapped to a scalar multiple λ · ~v of themselves.

Exercise 22. Consider the homomorphism t : P1 → P1 given by c0 + c1x ↦ (c0 + c1) + (c0 + c1)x. The range of t is one-dimensional. Thus an application of t to a vector in the range will simply rescale that vector: c + cx ↦ (2c) + (2c)x. That is, t has an eigenvalue of 2 associated with eigenvectors of the form c + cx where c ≠ 0.

This map also has an eigenvalue of 0 associated with eigenvectors of the form c − cx where c ≠ 0.

A square matrix T has a scalar eigenvalue λ associated with the non-~0 eigenvector ~ζ if T~ζ = λ ·~ζ.

Although this extension from maps to matrices is obvious, there is a point that must be made. Eigenvalues of a map are also the eigenvalues of matrices representing that map, and so similar matrices have the same eigenvalues. But the eigenvectors are different: similar matrices need not have the same eigenvectors.

For instance, consider again the transformation t : P1 → P1 given by c0 + c1x ↦ (c0 + c1) + (c0 + c1)x. It has an eigenvalue of 2 associated with eigenvectors of the form c + cx where c ≠ 0. If we represent t with respect to B = ⟨1 + 1x, 1 − 1x⟩
\[
T = \mathrm{Rep}_{B,B}(t) = \begin{pmatrix}2&0\\0&0\end{pmatrix} \tag{55}
\]
then 2 is an eigenvalue of T , associated with these eigenvectors.
\[
\Bigl\{ \begin{pmatrix}c_0\\c_1\end{pmatrix} \Bigm| \begin{pmatrix}2&0\\0&0\end{pmatrix}\begin{pmatrix}c_0\\c_1\end{pmatrix} = \begin{pmatrix}2c_0\\2c_1\end{pmatrix} \Bigr\}
= \Bigl\{ \begin{pmatrix}c_0\\0\end{pmatrix} \Bigm| c_0 \in \mathbb{C},\; c_0 \neq 0 \Bigr\} \tag{56}
\]


On the other hand, representing t with respect to D = ⟨2 + 1x, 1 + 0x⟩ gives
\[
S = \mathrm{Rep}_{D,D}(t) = \begin{pmatrix}3&1\\-3&-1\end{pmatrix} \tag{57}
\]
and the eigenvectors of S associated with the eigenvalue 2 are these.
\[
\Bigl\{ \begin{pmatrix}c_0\\c_1\end{pmatrix} \Bigm| \begin{pmatrix}3&1\\-3&-1\end{pmatrix}\begin{pmatrix}c_0\\c_1\end{pmatrix} = \begin{pmatrix}2c_0\\2c_1\end{pmatrix} \Bigr\}
= \Bigl\{ \begin{pmatrix}-c_1\\c_1\end{pmatrix} \Bigm| c_1 \in \mathbb{C},\; c_1 \neq 0 \Bigr\} \tag{58}
\]

Thus similar matrices can have different eigenvectors.

Here is an informal description of what's happening. The underlying transformation doubles the eigenvectors, ~v ↦ 2 · ~v. But when the matrix representing the transformation is T = Rep_{B,B}(t) then it "assumes" that column vectors are representations with respect to B. In contrast, S = Rep_{D,D}(t) "assumes" that column vectors are representations with respect to D. So the vectors that get doubled by each matrix look different.

The next example illustrates the basic tool for finding eigenvectors and eigenvalues.

Exercise 23. If
\[
S = \begin{pmatrix}\pi&1\\0&3\end{pmatrix} \tag{59}
\]
(here π is not a projection map, it is the number 3.14 . . .) then
\[
\left| \begin{pmatrix}\pi - x&1\\0&3 - x\end{pmatrix} \right| = (x - \pi)(x - 3) \tag{60}
\]
so S has eigenvalues of λ1 = π and λ2 = 3. To find associated eigenvectors, first plug in λ1 for x:
\[
\begin{pmatrix}\pi - \pi&1\\0&3 - \pi\end{pmatrix}\begin{pmatrix}z_1\\z_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix}z_1\\z_2\end{pmatrix} = \begin{pmatrix}a\\0\end{pmatrix} \tag{61}
\]
for a scalar a ≠ 0, and then plug in λ2:
\[
\begin{pmatrix}\pi - 3&1\\0&3 - 3\end{pmatrix}\begin{pmatrix}z_1\\z_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix}z_1\\z_2\end{pmatrix} = \begin{pmatrix}-b/(\pi - 3)\\b\end{pmatrix} \tag{62}
\]
where b ≠ 0.

The characteristic polynomial of a square matrix T is the determinant of the matrix T − xI, where x is a variable. The characteristic equation is |T − xI| = 0. The characteristic polynomial of a transformation t is the characteristic polynomial of any Rep_{B,B}(t).
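In octave, poly applied to a square matrix returns the coefficients of its characteristic polynomial and roots recovers the eigenvalues; a quick sketch using the matrix S from Exercise 23:

% quick sketch: characteristic polynomial and eigenvalues of S from Exercise 23
S = [pi, 1; 0, 3];
p = poly(S)        % coefficients of x^2 - (pi + 3)x + 3*pi, i.e. [1 -6.1416 9.4248]
roots(p)           % pi and 3, the eigenvalues of S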

A linear transformation on a nontrivial vector space has at least one eigenvalue. Notice the familiar form of the sets of eigenvectors in the above examples.

The eigenspace of a transformation t associated with the eigenvalue λ is V_λ = {~ζ | t(~ζ ) = λ~ζ } ∪ {~0 }. The eigenspace of a matrix is defined analogously. An eigenspace is a subspace.


Exercise 24. In the above example the eigenspace associated with the eigenvalue π and the eigenspace associated with the eigenvalue 3 are these.
\[
V_{\pi} = \Bigl\{ \begin{pmatrix}a\\0\end{pmatrix} \Bigm| a \in \mathbb{R} \Bigr\}
\qquad
V_3 = \Bigl\{ \begin{pmatrix}-b/(\pi - 3)\\b\end{pmatrix} \Bigm| b \in \mathbb{R} \Bigr\} \tag{63}
\]

Exercise 25. For the matrix T of the worked example at the end of this subsection, these are the eigenspaces associated with the eigenvalues 0 and 2.
\[
V_0 = \Bigl\{ \begin{pmatrix}a\\-a\\a\end{pmatrix} \Bigm| a \in \mathbb{R} \Bigr\}
\qquad
V_2 = \Bigl\{ \begin{pmatrix}b\\0\\b\end{pmatrix} \Bigm| b \in \mathbb{R} \Bigr\} \tag{64}
\]

The characteristic equation is 0 = x(x − 2)², so in some sense 2 is an eigenvalue "twice". However there are not "twice" as many eigenvectors, in that the dimension of the eigenspace is one, not two. The next example shows a case where a number, 1, is a double root of the characteristic equation and the dimension of the associated eigenspace is two.

Exercise 26. With respect to the standard bases, this matrix
\[
\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix} \tag{65}
\]
represents projection.
\[
\begin{pmatrix}x\\y\\z\end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix}x\\y\\0\end{pmatrix}
\qquad x, y, z \in \mathbb{C} \tag{66}
\]
Its eigenspace associated with the eigenvalue 0 and its eigenspace associated with the eigenvalue 1 are easy to find.
\[
V_0 = \Bigl\{ \begin{pmatrix}0\\0\\c_3\end{pmatrix} \Bigm| c_3 \in \mathbb{C} \Bigr\}
\qquad
V_1 = \Bigl\{ \begin{pmatrix}c_1\\c_2\\0\end{pmatrix} \Bigm| c_1, c_2 \in \mathbb{C} \Bigr\} \tag{67}
\]

If two eigenvectors ~v1 and ~v2 are associated with the same eigenvalue then any (nonzero) linear combination of those two is also an eigenvector associated with that same eigenvalue. But, if two eigenvectors ~v1 and ~v2 are associated with different eigenvalues then the sum ~v1 + ~v2 need not be related to the eigenvalue of either one. In fact, just the opposite: if the eigenvalues are different then the eigenvectors are not linearly related.

For any set of distinct eigenvalues of a map or matrix, a set of associated eigenvectors, one per eigenvalue, is linearly independent.


Exercise 27. The eigenvalues of
\[
\begin{pmatrix}2&-2&2\\0&1&1\\-4&8&3\end{pmatrix} \tag{68}
\]
are distinct: λ1 = 1, λ2 = 2, and λ3 = 3. A set of associated eigenvectors like
\[
\Bigl\{ \begin{pmatrix}2\\1\\0\end{pmatrix}, \begin{pmatrix}9\\4\\4\end{pmatrix}, \begin{pmatrix}2\\1\\2\end{pmatrix} \Bigr\} \tag{69}
\]
is linearly independent.

An n×n matrix with n distinct eigenvalues is diagonalizable.

Example. What are the eigenvalues and eigenvectors of this matrix?
\[
T = \begin{pmatrix}1&2&1\\2&0&-2\\-1&2&3\end{pmatrix}
\]
To find the scalars x such that T~ζ = x~ζ for non-~0 eigenvectors ~ζ, bring everything to the left-hand side
\[
\begin{pmatrix}1&2&1\\2&0&-2\\-1&2&3\end{pmatrix}\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} - x\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \vec{0}
\]
and factor (T − xI)~ζ = ~0. (Note that it says T − xI; the expression T − x doesn't make sense because T is a matrix while x is a scalar.) This homogeneous linear system
\[
\begin{pmatrix}1-x&2&1\\2&0-x&-2\\-1&2&3-x\end{pmatrix}\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}
\]
has a non-~0 solution if and only if the matrix is singular. We can determine when that happens.
\[
\begin{aligned}
0 = |T - xI|
&= \left| \begin{matrix}1-x&2&1\\2&0-x&-2\\-1&2&3-x\end{matrix} \right| \\
&= x^3 - 4x^2 + 4x \\
&= x(x - 2)^2
\end{aligned}
\]
The eigenvalues are λ1 = 0 and λ2 = 2. To find the associated eigenvectors, plug in each eigenvalue. Plugging in λ1 = 0 gives
\[
\begin{pmatrix}1-0&2&1\\2&0-0&-2\\-1&2&3-0\end{pmatrix}\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}a\\-a\\a\end{pmatrix}
\]
for a scalar parameter a ≠ 0 (a is non-0 because eigenvectors must be non-~0). In the same way, plugging in λ2 = 2 gives
\[
\begin{pmatrix}1-2&2&1\\2&0-2&-2\\-1&2&3-2\end{pmatrix}\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix}z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix}b\\0\\b\end{pmatrix}
\]
with b ≠ 0.

In octave the eigenvalues and eigenvectors are obtained with eig. Note that, because the eigenspace of the double eigenvalue 2 is only one-dimensional, the second and third columns of ev below are essentially the same vector, so this matrix is not diagonalizable:

octave:5> b=[1,2,1;2,0,-2;-1,2,3]

b =

1 2 1

2 0 -2

-1 2 3

octave:6> [ev,eval]=eig(b)

ev =

5.7735e-01 -7.0711e-01 7.0711e-01

-5.7735e-01 -2.0000e-08 -2.0000e-08

5.7735e-01 -7.0711e-01 7.0711e-01

eval =

0.00000 0.00000 0.00000

0.00000 2.00000 0.00000

0.00000 0.00000 2.00000


9 Singular value decomposition (SVD) and principal component analysis (PCA)

• The Singular Values of the square matrix A are defined as the square roots of the eigenvalues of ATA.

• The Condition Number is the ratio of the largest to the smallest singular value.

• A matrix is Ill Conditioned if the condition number is too large. How large the condition number can be before the matrix is considered ill conditioned is determined by the machine precision.

• A matrix is Singular if the condition number is infinite. The determinant of a singular matrix is 0.

• The Rank of a matrix is the dimension of the range of the matrix. This corresponds to the number of non-zero singular values of the matrix, i.e. the number of linearly independent rows of the matrix.

9.1 Spectral decomposition of a square matrix

Any real symmetric m×m matrix A has a spectral decomposition of the form
\[
A = U \Lambda U^{\mathsf T} \tag{70}
\]
where U is an orthonormal matrix (a matrix of orthogonal unit vectors: U^T U = I, or ∑_k u_{ki}u_{kj} = δ_{ij}) and Λ is a diagonal matrix. The columns of U are the eigenvectors of the matrix A and the diagonal elements of Λ are the eigenvalues. If A is positive-definite, the eigenvalues will all be positive. Multiplying equation 70 on the right by U , it can be re-written as
\[
AU = U\Lambda U^{\mathsf T}U = U\Lambda
\]
This can be written as a normal eigenvalue equation by defining the ith column of U as u_i and the eigenvalues as λ_i = Λ_{ii}:
\[
A u_i = \lambda_i u_i
\]
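A quick octave sketch of this decomposition for an arbitrary symmetric matrix (when its argument is symmetric, eig returns orthonormal eigenvectors):

% quick sketch: spectral decomposition of a symmetric matrix
A = [2, 1; 1, 2];            % an arbitrary real symmetric example
[U, L] = eig(A);             % columns of U are eigenvectors, diag(L) the eigenvalues
norm(A - U*L*U')             % essentially zero, confirming A = U*Lambda*U'
U'*U                         % the identity: the eigenvectors are orthonormal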

9.2 Singular Value Decomposition

A real n×m matrix B, where n ≥ m, has the decomposition
\[
B = U \Gamma V^{\mathsf T} \tag{71}
\]
where U is an n×m matrix with orthonormal columns (U^T U = I), V is an m×m orthonormal matrix (V^T V = I), and Γ is an m×m diagonal matrix with positive or zero elements, called the singular values. From B we can construct two positive-semidefinite symmetric matrices, BB^T and B^TB, each of which we can decompose
\[
B B^{\mathsf T} = U \Gamma V^{\mathsf T} V \Gamma U^{\mathsf T} = U \Gamma^2 U^{\mathsf T}
\qquad
B^{\mathsf T} B = V \Gamma^2 V^{\mathsf T}
\]


Keep in mind that n ≥ m. We can now show that BBT, which is n×n, and BTB, which is m×m, will share m eigenvalues, and the remaining n − m eigenvalues of BBT will be zero.

Using the decomposition above, we can identify the eigenvectors and eigenvalues for BTB as the columns of V and the squared diagonal elements of Γ, respectively. (The latter shows that the eigenvalues of BTB must be non-negative.) Denoting one such eigenvector by v and the diagonal element by γ, we have
\[
B^{\mathsf T} B v = \gamma^2 v
\]
then we can multiply on both sides with B to get
\[
B B^{\mathsf T} B v = \gamma^2 B v
\]
But this means that we have an eigenvector u = Bv and eigenvalue γ² for BB^T as well, since
\[
(B B^{\mathsf T}) B v = \gamma^2 B v
\]

We have now shown that BBT and BTB share m eigenvalues.

We still need to prove that the remaining n − m eigenvalues of BBT are zero. To do that, let us consider an eigenvector u⊥ of BBT with eigenvalue β⊥, i.e. BBTu⊥ = β⊥u⊥, which is orthogonal to the m eigenvectors ui already determined, i.e. UTu⊥ = 0. Using the decomposition BBT = UΓ²UT, we immediately see that the eigenvalues β⊥ must all be zero:
\[
B B^{\mathsf T} u_{\perp} = U \Gamma^2 U^{\mathsf T} u_{\perp} = 0\, u_{\perp}
\]

The Rank R of BBT is determined by the smallest dimension of B (R ≤ m). This ensures that BBT has at most m eigenvalues larger than zero. Note that the relation for BBT corresponds to the usual spectral decomposition since the "missing" (n − m) eigenvalues are zero. It is then evident that the two square matrices can be interchanged. This is a property we can take advantage of when dealing with data matrices where we have many more features than examples.

Summarizing, in equation 71 the matrices U and V are orthogonal. The columns of U are called left singular vectors (gene coefficients) and the rows of V T are called right singular vectors (expression level vectors). To calculate the matrices U and V , one must calculate the eigenvectors and eigenvalues of BBT and BTB. These multiplications of B by its transpose result in square matrices (the number of columns is equal to the number of rows).

The columns of V are made from the eigenvectors of BTB and the columns of U are made from the eigenvectors of BBT. The eigenvalues obtained from the products BBT and BTB, when square-rooted, make up the diagonal entries of Γ in equation 71. The diagonal of Γ is said to hold the singular values of the original matrix, B.

9.3 Properties of a data matrix: first and second moments

Let x (with components x_j , j = 1, . . . , n) be a stochastic vector with probability distribution P (x). Let {x^α | α = 1, . . . , m} be a sample from P (x). We will choose a convention for the data matrix X where the rows denote the features j = 1, . . . , n and the columns the samples α = 1, . . . , m; in other words, the components are X_{j,α} = x^α_j . Principal component analysis is based on the first two empirical moments of the sample data matrix. The mean vector
\[
\langle x \rangle \equiv \frac{1}{m} \sum_{\alpha=1}^{m} x^{\alpha}
\]
and the empirical covariance matrix,
\[
C \equiv \frac{1}{m} \sum_{\alpha=1}^{m} (x^{\alpha} - \langle x \rangle)(x^{\alpha} - \langle x \rangle)^{\mathsf T}
\]

Using the matrix formulation we can write
\[
C \equiv \frac{1}{m} X X^{\mathsf T}
\]
where we have removed the mean of the data:
\[
X_{j,\alpha} := X_{j,\alpha} - \langle x_j \rangle
\]
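In octave this convention (features in rows, samples in columns) looks as follows; the numbers in X are an arbitrary illustrative example, not real data:

% minimal sketch: empirical mean and covariance, features in rows, samples in columns
X  = [1, 2, 3, 4; 2, 1, 0, 1];   % arbitrary example: n = 2 features, m = 4 samples
m  = columns(X);
xm = mean(X, 2)                  % the mean vector <x>
Xc = X - xm;                     % remove the mean of the data
C  = (1/m) * (Xc * Xc')          % empirical covariance matrix (n x n)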

9.4 Principal component analysis (PCA)

In principal component analysis we find the directions in the data with the most variation, i.e. the eigenvectors corresponding to the largest eigenvalues of the covariance matrix, and project the data onto these directions. The motivation for doing this is that most of the second-order information is in these directions.³ Each eigenvector described in section 9.2 represents a principal component; PC1 (Principal Component 1) is defined as the eigenvector with the highest corresponding eigenvalue. The individual eigenvalues are numerically related to the variance they capture via the PCs: the higher the value, the more variance is captured. The choice of the number of directions is often guided by trial and error, but principled methods also exist. If we denote the matrix of eigenvectors sorted according to eigenvalue by U , we can then write the PCA transformation of the data as Y = U^T X. The eigenvectors are called the principal components. By selecting only the first d rows of Y , we have projected the data from n down to d dimensions. A short octave sketch of this recipe is given below.
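A minimal sketch of PCA via the eigendecomposition of the covariance matrix; the data matrix X below is an arbitrary example with features in rows and samples in columns, as in section 9.3:

% minimal sketch: PCA as projection onto the leading eigenvectors of the covariance matrix
X  = randn(3, 50);                        % arbitrary example: n = 3 features, m = 50 samples
Xc = X - mean(X, 2);                      % center the data
C  = Xc*Xc' / columns(Xc);                % covariance matrix
[U, L] = eig(C);
[lam, idx] = sort(diag(L), 'descend');    % sort eigenvectors by decreasing eigenvalue
U = U(:, idx);
d = 2;
Y = U(:, 1:d)' * Xc;                      % data projected from n = 3 down to d = 2 dimensions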

9.5 PCA by SVD

We can use SVD to perform PCA. We decompose X using SVD, i.e.
\[
X = U \Gamma V^{\mathsf T}
\]
and find that we can write the covariance matrix as
\[
C = \frac{1}{m} X X^{\mathsf T} = \frac{1}{m} U \Gamma^2 U^{\mathsf T}
\]

In this case U is an n×m matrix. Since SVD routines order the singular values in descending order, we know that, if n ≤ m, the columns of U correspond to the sorted eigenvalues of C and, if n > m, the first m columns correspond to the sorted non-zero eigenvalues of C. The transformed data can thus be written as

\[
Y = U^{\mathsf T} X = U^{\mathsf T} U \Gamma V^{\mathsf T} = \Gamma V^{\mathsf T}
\]
since U^T U is the identity (the columns of U are orthonormal). To conclude, we can write the transformed data directly in terms of the SVD decomposition of X, Y = ΓV^T.

³ This also means we might discard important non-second-order information by PCA.
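As a quick octave sketch, with an arbitrary centered data matrix, the two expressions for the transformed data coincide:

% quick sketch: PCA by SVD on a centered data matrix Xc
Xc = [1, 2, 3, 4; 2, 1, 0, 1];
Xc = Xc - mean(Xc, 2);          % center the data
[U, G, V] = svd(Xc);            % U is n x n, G is n x m, V is m x m
Y1 = U' * Xc                    % transformed data
Y2 = G * V'                     % the same thing, since U'*U = I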


9.6 PCA by SVD in Octave

It is common in gene expression analysis or QSAR studies that we have many more features than samples, n ≫ m. The covariance matrix itself is then very unpleasant to work with, because it is very large and, as we have proved above, singular. However, using the relations above, we find that it suffices to decompose the smaller m×m matrix
\[
D \equiv \frac{1}{m} X^{\mathsf T} X
\]
Given a decomposition of D we can find the interesting non-zero principal directions and components for C as U = XV S^{-1}, where S is the diagonal matrix of singular values of X. You can instruct octave to always use the smallest matrix by using the command [u s v] = svd(X,0); see also ‘help svd’ in octave. However, in that case we have to be careful about which matrices to use for the transformation.

Example. From the following matrix a we can obtain the eigenvalues and eigenvectors of the product a·aᵀ by:

octave:1> a=[2,4;1,3;0,0;0,0]

a =

2 4

1 3

0 0

0 0

octave:2> a*a’

ans =

20 14 0 0

14 10 0 0

0 0 0 0

0 0 0 0

octave:3> [vect,val]=eig( a*a’)

vect =

-0.00000 0.00000 0.57605 -0.81742

-0.00000 0.00000 -0.81742 -0.57605

1.00000 0.00000 0.00000 -0.00000

0.00000 1.00000 0.00000 -0.00000

val =

0.00000 0.00000 0.00000 0.00000

0.00000 0.00000 0.00000 0.00000

0.00000 0.00000 0.13393 0.00000

0.00000 0.00000 0.00000 29.86607


which can be used to build the SVD of the matrix a. Now, we can do the same job by simply performing a regular SVD in octave:

octave:4> [u s v] = svd(a)

u =

-0.81742 -0.57605 0.00000 0.00000

-0.57605 0.81742 0.00000 0.00000

0.00000 0.00000 1.00000 0.00000

0.00000 0.00000 0.00000 1.00000

s =

5.46499 0.00000

0.00000 0.36597

0.00000 0.00000

0.00000 0.00000

v =

-0.40455 -0.91451

-0.91451 0.40455

Exercise. Using the gene expression data in the file FA.zip, perform an SVD and find the PCs.

9.7 More samples than variables

In some cases, the number of variables is smaller than the number of examples (n < m). In thesecases, decomposition and dimension reduction might still be desirable for the n×m matrix X.Dimension change on X however also results in dimension change on U , Γ and V , who respectivelyget the sizes n×n, n×m and m×m. The dimension changes the svd routine in octave slow and addsunnecessary rows to the V matrix. The problem can be avoided using [V; S; U] = svd(X’; 0);

U = U’; V = V’;, in the cases where n < m.

9.8 Number of Principal Directions

The number of principal components to use, d, is not always easy to determine. The energy fraction (the proportion of the total variance captured by the leading components) can be used to argue for a given number of principal components. The number of components can also be determined from the characteristics of the singular values: when the singular values level off, the remaining components are usually dominated by noise and therefore not useful. A short sketch of the energy-fraction criterion follows.
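A minimal octave sketch of the energy-fraction criterion; the data and the 90% threshold are arbitrary choices of my own:

% quick sketch: choosing d from the cumulative "energy fraction" of the singular values
X = randn(5, 200);                  % arbitrary example data (5 features, 200 samples)
s = svd(X - mean(X, 2));            % singular values of the centered data, sorted descending
energy = cumsum(s.^2) / sum(s.^2);  % fraction of total variance captured by the first k components
d = find(energy >= 0.90, 1)         % smallest d capturing at least 90% (threshold is arbitrary)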


9.9 Similar Methods for Dimensionality Reduction

There exist multiple methods that can be used for dimensionality reduction. Some of them are given in the list below.

• Singular Value Decomposition (SVD)

• Independent Component Analysis (ICA)

• Non-negative Matrix Factorization (NMF)

• Eigen Decomposition

• Random Projection

• Factor Analysis (FA)

