
• 8/3/2019 01 Linear Algebra

    Linear Algebra

    1 Vector Spaces

    The multiplication of a vector by a constant and the addition of two

    vectors are familiar ideas; their abstraction and generalization lead to

    the concept of vector spaces.

[Figure: scaling a vector (x and 2x) and adding two vectors (x, y, and x + y).]

Definition 1. A vector space is a set V of elements called vectors satisfying the following axioms.

1. For every x, y, z ∈ V, there is an operation called vector addition, such that

(a) x + y = y + x (commutative);

(b) x + (y + z) = (x + y) + z (associative);

(c) there exists in V a unique vector 0 (called the zero vector) such that x + 0 = x for every x ∈ V;

(d) for every x ∈ V there exists a unique vector −x such that x + (−x) = 0.

2. For every x, y ∈ V and every α, β ∈ F, where F is a field, there is an operation called scalar multiplication, such that

(a) α(βx) = (αβ)x (associative);

(b) 1x = x for every x ∈ V, where 1 is the unit element of F under multiplication;

    C.P. Kwong


(c) α(x + y) = αx + αy (distributive with respect to vector addition);

(d) (α + β)x = αx + βx (distributive with respect to scalar addition).

Remark. Without giving a formal definition of a field, we simply note that the set of all real numbers, denoted R, is a field equipped with the usual arithmetic of addition/subtraction and multiplication/division. The unit element of this field is exactly the number "1". The set of all complex numbers is another example of a field. Sometimes we say "V is a vector space over F" to emphasize the relationship between V and its underlying field F.

Example 1. (The n-tuple space, Fⁿ.) Let F be any field and let V be the set of all n-tuples x = (u₁, u₂, …, uₙ) of scalars uᵢ ∈ F. If y = (w₁, w₂, …, wₙ) with wᵢ ∈ F, define the addition of x and y as

x + y = (u₁ + w₁, u₂ + w₂, …, uₙ + wₙ)

and the multiplication of x by a scalar α ∈ F as

αx = (αu₁, αu₂, …, αuₙ).

It can be proved that the defined operations satisfy the axioms of a vector space and hence V is a vector space over F.
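As a sketch, the two operations of Example 1 can be written out for F = R and n = 3; the helper names `add` and `scale` are ours, not from the text:

```python
# Sketch of the n-tuple space R^3 from Example 1: componentwise
# addition and scalar multiplication (helper names are illustrative).

def add(x, y):
    """Vector addition: (u1 + w1, ..., un + wn)."""
    return tuple(u + w for u, w in zip(x, y))

def scale(a, x):
    """Scalar multiplication: (a*u1, ..., a*un)."""
    return tuple(a * u for u in x)

x = (1.0, 2.0, 3.0)
y = (4.0, 5.0, 6.0)

# A few of the vector-space axioms, checked on these sample vectors:
assert add(x, y) == add(y, x)                       # commutativity
assert add(x, (0.0, 0.0, 0.0)) == x                 # zero vector
assert scale(2.0, add(x, y)) == add(scale(2.0, x), scale(2.0, y))  # distributivity
```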

Example 2. (The space of m × n matrices, Fᵐˣⁿ.) Let F be any field and m, n be positive integers. The set of all m × n matrices with elements in F is a vector space under the usual matrix addition and multiplication of a matrix by a scalar.

Example 3. (The space of continuous functions, C[a,b].) Let V be the set of all real-valued, continuous functions of t, t ∈ [a, b].


[Figure: two continuous functions x(t) and y(t) on the interval [a, b].]

Define, for x, y ∈ V and α ∈ R, the following operations:

(x + y)(t) = x(t) + y(t),

(αx)(t) = αx(t),

where addition of two continuous functions and multiplication of a continuous function by a real scalar are defined in the usual point-wise manner. Then V is a vector space over R. Note that if x and y are continuous real-valued functions over [a, b] and α is real, so are x + y and αx.

Definition 2. A vector x ∈ V is said to be a linear combination of the vectors y₁, y₂, …, yₙ ∈ V provided that there exist scalars α₁, α₂, …, αₙ ∈ F such that

x = α₁y₁ + α₂y₂ + ⋯ + αₙyₙ = ∑ᵢ₌₁ⁿ αᵢyᵢ. (1)

Definition 3. Let V be a vector space over F. The distinct vectors x₁, x₂, …, xₙ ∈ V are said to be linearly dependent if there exist scalars α₁, α₂, …, αₙ ∈ F, not all of which are zero, such that

∑ᵢ₌₁ⁿ αᵢxᵢ = 0. (2)

Vectors that are not linearly dependent are linearly independent.

Remark. If x₁, x₂, …, xₙ ∈ V are linearly dependent, then at least one xᵢ in the set {x₁, x₂, …, xₙ} (any one whose coefficient in (2) is nonzero) can be expressed as a linear combination of the remaining vectors in the set.
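In Rⁿ, linear dependence can be tested numerically by comparing the rank of the matrix whose rows are the given vectors against the number of vectors; a minimal sketch with NumPy (the helper name is ours):

```python
import numpy as np

def linearly_independent(vectors):
    """Vectors in R^n are linearly independent iff the matrix having
    them as rows has rank equal to the number of vectors."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

assert linearly_independent([(1, 0), (0, 1)])        # a basis of R^2
assert not linearly_independent([(1, 2), (2, 4)])    # second = 2 * first
```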


Given the following two vectors e₁, e₂ on a plane:

[Figure: two non-collinear vectors e₁ and e₂.]

It seems that any vector x on this plane can be written as x = α₁e₁ + α₂e₂ for some α₁, α₂ ∈ R. For example, in the following diagram, x = 0.7e₁ − 0.5e₂.

[Figure: x decomposed as 0.7e₁ − 0.5e₂.]

However, the following e₁, e₂ cannot perform the same function:

[Figure: two collinear vectors e₁ and e₂.]

Definition 4. A basis in a vector space V is a set of linearly independent vectors such that every vector in V is a linear combination of this set of vectors. The number of vectors constituting a basis is called the dimension of V, denoted dim V. If dim V is finite, V is a finite-dimensional vector space.

Definition 5. A set of vectors {eᵢ} is said to span a vector space V if every vector in V can be written as a linear combination of {eᵢ}. (Note that {eᵢ} may not be linearly independent and hence may not be a basis.)


Given a basis {e₁, e₂, …, eₙ} for an n-dimensional vector space V, a vector x ∈ V can be written as

x = α₁e₁ + α₂e₂ + ⋯ + αₙeₙ, αᵢ ∈ F. (3)

[α₁ α₂ ⋯ αₙ] is called the coordinate vector of x in the basis formed by {e₁, e₂, …, eₙ}.

A vector is "free", "floating", if no basis is specified:

[Figure: a lone vector x.]

The vector is "fixed" whenever a basis is given:

[Figure: the vector x together with basis vectors e₁ and e₂.]

It is obvious that there exist many bases for a given vector space, and different coordinate vectors result from different bases for the same x. What are the relationships between these coordinate vectors? The answer is: two coordinate vectors are related by a coordinate transformation effected by an n × n matrix with elements in F.

Let {e₁, e₂, …, eₙ} be a basis for V and {ē₁, ē₂, …, ēₙ} be another basis for the same V. Thus a vector x ∈ V can be written either as

x = α₁e₁ + α₂e₂ + ⋯ + αₙeₙ, αᵢ ∈ F, (4)

or

x = ᾱ₁ē₁ + ᾱ₂ē₂ + ⋯ + ᾱₙēₙ, ᾱᵢ ∈ F. (5)


Since ē₁, ē₂, …, ēₙ are vectors in V, we can write

ē₁ = a₁₁e₁ + a₂₁e₂ + ⋯ + aₙ₁eₙ,
⋮
ēₙ = a₁ₙe₁ + a₂ₙe₂ + ⋯ + aₙₙeₙ,   aᵢⱼ ∈ F, (6)

i.e.,

[ē₁ ⋯ ēₙ] = [e₁ ⋯ eₙ] [ a₁₁ a₁₂ ⋯ a₁ₙ ]
                      [ a₂₁ a₂₂ ⋯ a₂ₙ ]
                      [  ⋮             ]
                      [ aₙ₁ aₙ₂ ⋯ aₙₙ ]  = [e₁ ⋯ eₙ]A, (7)

where A is an n × n matrix of scalars in F. However, since

x = [e₁ ⋯ eₙ] (α₁, …, αₙ)ᵀ = [ē₁ ⋯ ēₙ] (ᾱ₁, …, ᾱₙ)ᵀ, (8)

therefore

[e₁ ⋯ eₙ] A (ᾱ₁, …, ᾱₙ)ᵀ = [e₁ ⋯ eₙ] (α₁, …, αₙ)ᵀ. (9)

This last equation holds for any set of basis vectors {e₁, e₂, …, eₙ} and hence

(α₁, …, αₙ)ᵀ = A (ᾱ₁, …, ᾱₙ)ᵀ. (10)

Thus the matrix A acts as a coordinate transformation that relates the coordinates in the two bases for the same vector.
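A small numerical sketch of this transformation, with an example pair of bases of R² chosen by us: the columns of A hold the old-basis coordinates of the new basis vectors, and A maps new-basis coordinates to old-basis coordinates.

```python
import numpy as np

# Change of basis in R^2: the columns of A hold the coordinates of the
# new basis vectors expressed in the old (here: standard) basis.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])    # old basis
f1, f2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])   # new basis (our example)

A = np.column_stack([f1, f2])      # column j = coordinates of the j-th new vector

alpha_new = np.array([2.0, 3.0])               # coordinates in the new basis
x = alpha_new[0] * f1 + alpha_new[1] * f2      # the vector itself
alpha_old = A @ alpha_new                      # old-basis coordinates

# In the standard basis the coordinates coincide with the vector itself:
assert np.allclose(alpha_old, x)
```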

Definition 6. A subset W of a vector space V is a subspace of V if for every pair x and y of vectors contained in W, every linear combination αx + βy, α, β ∈ F, is also contained in W.


Example 4. In the following figure, W₁ is a 1-dimensional subspace and W₂ is a 2-dimensional subspace, of R³.

[Figure: a line W₁ and a plane W₂ through the origin in R³.]

It can be shown, using its definition, that a subspace is itself a vector space. Moreover, any subspace must contain the zero vector since for any x ∈ W, x − x = 0 is by definition in W.

Theorem 1. Let V be a vector space over the field F. The intersection of any collection of subspaces of V is a subspace of V.

Proof. Let {Wᵢ} be a collection of subspaces of V and W = ∩ᵢ Wᵢ their intersection. Since each Wᵢ is a subspace containing the zero vector 0, 0 ∈ W. Let x, y ∈ W and α, β ∈ F. By the definition of W, x, y ∈ Wᵢ for all i. Because each Wᵢ is a subspace, (αx + βy) ∈ Wᵢ for all i. Therefore αx + βy is again in W and W is a subspace by definition.

Definition 7. Let U and W be subspaces of a vector space V over F. The sum of U and W, denoted by U + W, is the subset of V consisting of all sums u + w with u ∈ U and w ∈ W.

    Remark. It is easy to show that U + W is a subspace of V.

Definition 8. A vector space V is said to be the direct sum of two subspaces U and W, written as

V = U ⊕ W, (11)


if each v ∈ V has a unique representation

v = u + w, u ∈ U, w ∈ W. (12)

Moreover, W is called the algebraic complement of U in V and vice versa.

Example 5. In the following figure V = R² and U, W₁, and W₂ are subspaces of V.

[Figure: three distinct lines U, W₁, and W₂ through the origin of R², with W₁ perpendicular to U.]

We have

V = U ⊕ W₁ = U ⊕ W₂, (13)

and W₁ is the "orthogonal" complement of U.

Theorem 2. Let V be a vector space over F, and U and W be subspaces of V. If U + W = V and U ∩ W = {0}, then V is the direct sum of U and W.

Proof. Given any v ∈ V. Since U + W = V, there exist u ∈ U and w ∈ W such that v = u + w. Suppose there exist u′ ∈ U and w′ ∈ W such that v = u′ + w′. Then

u + w = u′ + w′.

It follows that

u − u′ = w′ − w.

But u − u′ ∈ U and w′ − w ∈ W, and therefore u − u′ = w′ − w ∈ U ∩ W. Since U ∩ W = {0}, u − u′ = 0 and w′ − w = 0, and hence u = u′ and w = w′. That means the representation v = u + w is unique. The theorem is proved.


Theorem 3. Let V be a finite-dimensional vector space over F. If V is the direct sum of U and W, then

dim V = dim U + dim W. (14)

Proof. Let {u₁, u₂, …, uᵣ} be a basis of U and {w₁, w₂, …, wₛ} be a basis of W. Then every u ∈ U has a unique representation

u = α₁u₁ + α₂u₂ + ⋯ + αᵣuᵣ, αᵢ ∈ F,

and every w ∈ W has a unique representation

w = β₁w₁ + β₂w₂ + ⋯ + βₛwₛ, βᵢ ∈ F.

Since V = U ⊕ W, every v ∈ V has a unique representation

v = α₁u₁ + α₂u₂ + ⋯ + αᵣuᵣ + β₁w₁ + β₂w₂ + ⋯ + βₛwₛ.

Therefore {u₁, u₂, …, uᵣ, w₁, w₂, …, wₛ} is a basis of V with dimension r + s.

    2 Mappings

Let X and Y be sets and A ⊆ X a subset (of X). A mapping T from A into Y associates with each x ∈ A a single y ∈ Y called the image of x under T. We write y = Tx. The set A is called the domain of definition of T, or simply the domain of T, denoted by D(T). Notationally, we write

T: D(T) → Y.

The range of T, denoted by R(T), is the set of all images:

R(T) = {y ∈ Y | y = Tx for some x ∈ D(T)}.

Note that D(T) is not necessarily the whole X and R(T) is not necessarily the whole Y.

If R(T) is the whole Y, then T is said to be onto, or surjective.


There may be more than one element in D(T) that is mapped to a single element in R(T):

[Figure: two points x₁ and x₂ both mapped by T to the same y = Tx₁ = Tx₂.]

If x₁ ≠ x₂ implies Tx₁ ≠ Tx₂ for every x₁, x₂ ∈ D(T), then T is called one-to-one or injective.

Given a y ∈ R(T), the inverse image of y is the set of all x ∈ D(T) such that Tx = y. For an injective mapping T: D(T) → Y, we can define a mapping T⁻¹: R(T) → D(T), called the inverse of T, such that y ∈ R(T) is mapped (by T⁻¹) to that x ∈ D(T) for which Tx = y. However, an inverse mapping cannot be defined if T is not injective. (Why?)

    3 Linear Operators

The following figure shows two functions y = f(x) = ax and y = g(x) = bx² where a and b are real constants:

[Figure: the line f(x) = ax and the parabola g(x) = bx² in the (x, y)-plane.]


Let x = 1. Then f(x) = f(1) = a and g(x) = g(1) = b. Next, let x = 3. Then f(x) = f(3) = 3a and g(x) = g(3) = 9b. Suppose now

x = 1 + 3 = 4;

we have

f(x) = f(4) = 4a = f(1) + f(3).

However,

g(x) = g(4) = 16b ≠ g(1) + g(3).

The function f(x) is "linear" in this sense. "Linear operators" are mappings between vector spaces, which possess a similar linear property.

Definition 9. Let T be a mapping with domain D(T) and range R(T). T is a linear operator if D(T) is a vector space over F and R(T) is a subset of a vector space also over F. Moreover, for any x, y ∈ D(T) and any α ∈ F,

T(x + y) = Tx + Ty (15)

and

T(αx) = αTx. (16)

Example 6. Let A be an m × n matrix with elements aᵢⱼ ∈ F where F is a field. The mapping defined by Tx = Ax, x ∈ Fⁿ, where Ax is the usual matrix multiplication, is a linear operator from Fⁿ into Fᵐ.

Example 7. Let x(t) ∈ C[a,b] be a continuous function from [a, b] into R. Define a mapping T: C[a,b] → Y as follows:

y(t) = Tx = ∫ₐᵗ x(τ) dτ, t ∈ [a, b].

From the theory of integration, y(t) is also a continuous function over t ∈ [a, b]. Moreover, for any x, y ∈ C[a,b] and α, β ∈ F,

∫ₐᵗ [αx(τ) + βy(τ)] dτ = α ∫ₐᵗ x(τ) dτ + β ∫ₐᵗ y(τ) dτ, t ∈ [a, b].

Therefore T is a linear operator from C[a,b] into itself, i.e., T: C[a,b] → C[a,b].
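The linearity of the integration operator in Example 7 can be illustrated numerically; here we discretize [0, 1] and approximate the running integral with the trapezoid rule (the discretization is our own device, not part of the text):

```python
import numpy as np

# Numerical sketch of Example 7: T maps x(t) to its running integral.
# We sample [0, 1] on a grid and use the trapezoid rule.
t = np.linspace(0.0, 1.0, 1001)

def T(x):
    """Running integral of the sampled function x at the grid points t."""
    return np.concatenate([[0.0], np.cumsum((x[1:] + x[:-1]) / 2 * np.diff(t))])

x1, x2 = np.sin(t), t**2
a, b = 2.0, -3.0

# Linearity: T(a*x1 + b*x2) equals a*T(x1) + b*T(x2) up to rounding.
assert np.allclose(T(a * x1 + b * x2), a * T(x1) + b * T(x2))
```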


Definition 10. The null space of a linear operator T, denoted by N(T), is the set of all x ∈ D(T) such that Tx = 0, where 0 is the zero vector of R(T).

    Theorem 4. Let T be a linear operator. Then R(T ) and N(T ) are vector

    spaces.

Proof. Let y₁, y₂ be any two vectors in R(T) and α, β be any two scalars in F. Then there must exist two vectors x₁ and x₂ in D(T) such that Tx₁ = y₁ and Tx₂ = y₂. Since T is linear,

T(αx₁ + βx₂) = αTx₁ + βTx₂ = αy₁ + βy₂.

Therefore αy₁ + βy₂ is in R(T). This shows that R(T) is a vector space.

For any x₁, x₂ ∈ N(T), Tx₁ = Tx₂ = 0 by definition. The linear combination of x₁ and x₂, being αx₁ + βx₂, is also in N(T) since T(αx₁ + βx₂) = αTx₁ + βTx₂ = 0. This shows that N(T) is a vector space.

Example 8. Given

[ 2 −2 ] [ x ]   [ 0 ]
[ 1 −1 ] [ y ] = [ 0 ],

which leads to the simultaneous equations

2x − 2y = 0;
x − y = 0.

The solution gives the null space represented by the line x = y:

[Figure: the line x = y in the (x, y)-plane; T maps every point of it to the origin.]
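The null space in Example 8 can also be recovered numerically; one standard route is the SVD, whose right-singular vectors for (near-)zero singular values span N(T):

```python
import numpy as np

# Null space of the matrix in Example 8, computed from the SVD.
A = np.array([[2.0, -2.0],
              [1.0, -1.0]])

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]             # rows spanning N(T)

v = null_basis[0]
assert np.allclose(A @ v, 0.0)     # v is mapped to the zero vector
assert np.isclose(v[0], v[1])      # the null space is the line x = y
```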


Theorem 5. Let T be a linear operator. If D(T) is finite-dimensional, then

dim D(T) = dim R(T) + dim N(T). (17)

Proof. Suppose N(T) is k-dimensional. Then there are k vectors v₁, v₂, …, vₖ that form a basis for N(T). Suppose dim D(T) = n. Then there are linearly independent vectors vₖ₊₁, vₖ₊₂, …, vₙ in D(T) such that {v₁, v₂, …, vₖ, vₖ₊₁, vₖ₊₂, …, vₙ} is a basis for D(T). We shall prove that {Tvₖ₊₁, Tvₖ₊₂, …, Tvₙ} is a basis for R(T).

The vectors Tv₁, Tv₂, …, Tvₙ certainly span R(T), i.e., any vector in R(T) is a linear combination of Tvᵢ, i = 1, 2, …, n. However, since vᵢ, 1 ≤ i ≤ k, are in N(T), Tvᵢ = 0 for 1 ≤ i ≤ k. It follows that R(T) is indeed spanned by the smaller set of vectors Tvₖ₊₁, Tvₖ₊₂, …, Tvₙ. It remains to show that they are linearly independent.

Suppose there are scalars αᵢ ∈ F, not all zero, such that

∑ᵢ₌ₖ₊₁ⁿ αᵢ(Tvᵢ) = 0,

i.e., Tvᵢ, i = k + 1, k + 2, …, n, are linearly dependent. Since T is linear, we have

T(∑ᵢ₌ₖ₊₁ⁿ αᵢvᵢ) = 0

and hence the vector x = ∑ᵢ₌ₖ₊₁ⁿ αᵢvᵢ is in N(T). Since {v₁, v₂, …, vₖ} is a basis for N(T), there must exist scalars β₁, β₂, …, βₖ such that

x = ∑ᵢ₌₁ᵏ βᵢvᵢ.

It follows that

∑ᵢ₌₁ᵏ βᵢvᵢ − ∑ᵢ₌ₖ₊₁ⁿ αᵢvᵢ = 0.

Since v₁, v₂, …, vₙ are linearly independent, we must have

β₁ = β₂ = ⋯ = βₖ = αₖ₊₁ = αₖ₊₂ = ⋯ = αₙ = 0.

This contradiction shows that Tvₖ₊₁, Tvₖ₊₂, …, Tvₙ are linearly independent and hence form a basis for R(T). Since dim N(T) = k and dim D(T) = n, the theorem follows.
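Relation (17) is easy to verify numerically for an operator given by a matrix: the domain dimension is the number of columns, dim R(T) is the rank, and an orthonormal basis of N(T) falls out of the SVD. A sketch with a sample matrix of our choosing:

```python
import numpy as np

# Numerical check of (17) for the operator T x = A x given by a sample
# 2 x 3 matrix: dim D(T) is the number of columns, dim R(T) is the rank,
# and the trailing rows of Vt form an orthonormal basis of N(T).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

_, s, Vt = np.linalg.svd(A)            # full SVD: Vt is 3 x 3
rank = int(np.sum(s > 1e-12))          # dim R(T)
null_basis = Vt[rank:]                 # rows form a basis of N(T)

assert np.allclose(A @ null_basis.T, 0.0)          # each basis vector maps to 0
assert A.shape[1] == rank + null_basis.shape[0]    # (17): dim D = dim R + dim N
```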


The inverse of an operator is an important concept in both theory and application. Suppose we are given y and T; we ask: what is x such that Tx = y?

[Figure: T maps x to y, and T⁻¹ maps y back to x.]

Definition 11. The inverse of a linear operator T: D(T) → R(T), if it exists, is the mapping T⁻¹: R(T) → D(T) such that for every y ∈ R(T), T⁻¹y = x, where x ∈ D(T) and Tx = y.

The following is an existence theorem for the inverse.

Theorem 6. Given a linear operator T. T⁻¹: R(T) → D(T) exists if and only if

Tx = 0 implies x = 0. (18)

Moreover, T⁻¹ is linear.

Proof. It is easy to see that T⁻¹ exists if T is one-to-one, i.e., for any x₁, x₂ ∈ D(T),

Tx₁ = Tx₂ ⇒ x₁ = x₂.

We first prove the "if" part, i.e.,

(Tx = 0 ⇒ x = 0) ⇒ T⁻¹ exists.

Suppose there are x₁ and x₂ such that Tx₁ = Tx₂. Since T is linear,

T(x₁ − x₂) = Tx₁ − Tx₂ = 0.

But (x₁ − x₂) ∈ D(T) and T(x₁ − x₂) = 0 implies x₁ − x₂ = 0 (i.e., the assumption), or x₁ = x₂. Therefore Tx₁ = Tx₂ implies x₁ = x₂ and T⁻¹ exists.


Next, we prove the "only if" part, i.e.,

T⁻¹ exists ⇒ (Tx = 0 ⇒ x = 0).

Let x₁ be the zero vector in D(T). Since T is linear, T0 = T(x + (−x)) = Tx − Tx = 0 where x ∈ D(T). It follows that Tx₁ = 0. Suppose there is another vector x₂ ≠ 0 such that Tx₂ = 0. Then we can write

Tx₁ = Tx₂ = 0.

However, if T⁻¹ exists, then, for any x₁, x₂ ∈ D(T), Tx₁ = Tx₂ implies x₁ = x₂. Consequently, x₁ = x₂ = 0, i.e., the zero vector is the unique vector x such that Tx = 0.

Finally, let x₁, x₂ ∈ D(T) and write y₁ = Tx₁ and y₂ = Tx₂. Then y₁, y₂ ∈ R(T). If T⁻¹ exists, we have x₁ = T⁻¹y₁ and x₂ = T⁻¹y₂. Since T is linear, for any α, β ∈ F,

αy₁ + βy₂ = αTx₁ + βTx₂ = T(αx₁ + βx₂).

Thus

T⁻¹(αy₁ + βy₂) = T⁻¹T(αx₁ + βx₂) = αx₁ + βx₂,

or

T⁻¹(αy₁ + βy₂) = αT⁻¹y₁ + βT⁻¹y₂.

This proves that T⁻¹ is linear.

Corollary 1. If D(T) is finite-dimensional and T⁻¹ exists, dim R(T) = dim D(T).

Proof. We have proved that

dim D(T) = dim R(T) + dim N(T)

and that T⁻¹ exists implies

Tx = 0 ⇒ x = 0.

Therefore N(T) contains only the zero vector and hence dim N(T) = 0. The corollary follows.


We show in the following that there is a matrix associated with a linear operator, which depends on the choice of bases for D(T) and R(T).

Let T: D(T) → R(T) be a linear operator. Suppose {e₁, e₂, …, eₙ} is a basis for D(T) and {f₁, f₂, …, fₘ} is a basis of R(T). Then a vector x ∈ D(T) can be expressed as

x = ∑ᵢ₌₁ⁿ αᵢeᵢ, αᵢ ∈ F. (19)

Note that [α₁, α₂, …, αₙ] is the coordinate vector of x relative to the basis {e₁, e₂, …, eₙ}. Since T is linear, we have

Tx = ∑ᵢ₌₁ⁿ αᵢTeᵢ, αᵢ ∈ F. (20)

Let y = Tx. Clearly y ∈ R(T) and we can write

y = ∑ᵢ₌₁ᵐ βᵢfᵢ, βᵢ ∈ F, (21)

where [β₁, β₂, …, βₘ] is the coordinate vector of y relative to the basis {f₁, f₂, …, fₘ}. A matrix A arises naturally as representing the operation of T on the vectors e₁, e₂, …, eₙ:

Te₁ = a₁₁f₁ + a₂₁f₂ + ⋯ + aₘ₁fₘ,
⋮
Teₙ = a₁ₙf₁ + a₂ₙf₂ + ⋯ + aₘₙfₘ,   aᵢⱼ ∈ F, (22)

or

[Te₁ Te₂ ⋯ Teₙ] = [f₁ f₂ ⋯ fₘ] [ a₁₁ a₁₂ ⋯ a₁ₙ ]
                               [ a₂₁ a₂₂ ⋯ a₂ₙ ]
                               [  ⋮             ]
                               [ aₘ₁ aₘ₂ ⋯ aₘₙ ]  = [f₁ f₂ ⋯ fₘ] A. (23)

Since y = Tx, we have

[f₁ f₂ ⋯ fₘ] (β₁, β₂, …, βₘ)ᵀ = [Te₁ Te₂ ⋯ Teₙ] (α₁, α₂, …, αₙ)ᵀ. (24)


Therefore

[f₁ f₂ ⋯ fₘ] (β₁, β₂, …, βₘ)ᵀ = [f₁ f₂ ⋯ fₘ] A (α₁, α₂, …, αₙ)ᵀ, (25)

which gives

(β₁, β₂, …, βₘ)ᵀ = A (α₁, α₂, …, αₙ)ᵀ. (26)

We see that multiplying the coordinate vector of x ∈ D(T) by A gives the coordinate vector of y = Tx in R(T). A is called the matrix representation of T. Note however that A depends on the choice of the bases {eᵢ} and {fᵢ}.

Example 9. A linear operator T effects the following mapping of two vectors in R²:

[Figure: x₁ = (2, 1) is mapped to Tx₁ = (−1, −4), and x₂ = (−2, 1) is mapped to Tx₂ = (3, 6).]

We have, in the basis formed by (1, 0) and (0, 1),

x₁ = 2(1, 0) + 1(0, 1).


Similarly,

Tx₁ = −1(1, 0) − 4(0, 1)

in the same basis. We then have, for the mapping Tx₁,

[ a₁₁ a₁₂ ] [ 2 ]   [ −1 ]
[ a₂₁ a₂₂ ] [ 1 ] = [ −4 ],

and for the mapping Tx₂,

[ a₁₁ a₁₂ ] [ −2 ]   [ 3 ]
[ a₂₁ a₂₂ ] [  1 ] = [ 6 ].

Hence

2a₁₁ + a₁₂ = −1,
2a₂₁ + a₂₂ = −4,
−2a₁₁ + a₁₂ = 3,
−2a₂₁ + a₂₂ = 6.

Solving gives

A = [ a₁₁ a₁₂ ]   [ −1   1 ]
    [ a₂₁ a₂₂ ] = [ −2.5 1 ].

Suppose a new basis is chosen and the matrix which relates the coordinate vectors in the old and the new bases is

P = [  2 −1 ]
    [ −1  1 ].

Thus the new coordinate vectors for (2, 1)ᵀ and (−1, −4)ᵀ are

P (2, 1)ᵀ = (3, −1)ᵀ and P (−1, −4)ᵀ = (2, −3)ᵀ.

We ask what is the matrix B such that

B (3, −1)ᵀ = (2, −3)ᵀ,


i.e., what is the matrix representation of the linear operator T in the new basis?

Since P (2, 1)ᵀ = (3, −1)ᵀ, and P must be nonsingular (why?) and hence P⁻¹ exists, we have

P⁻¹P (2, 1)ᵀ = P⁻¹ (3, −1)ᵀ ⇒ (2, 1)ᵀ = P⁻¹ (3, −1)ᵀ.

It follows that

A (2, 1)ᵀ = A P⁻¹ (3, −1)ᵀ. (27)

Similarly,

P⁻¹ (2, −3)ᵀ = (−1, −4)ᵀ. (28)

But

A (2, 1)ᵀ = (−1, −4)ᵀ

and then (27) can be rewritten as

A P⁻¹ (3, −1)ᵀ = (−1, −4)ᵀ. (29)

Substituting (28) into (29) gives

A P⁻¹ (3, −1)ᵀ = P⁻¹ (2, −3)ᵀ,

or

P A P⁻¹ (3, −1)ᵀ = (2, −3)ᵀ.

Therefore, we obtain

B = P A P⁻¹ = [  2 −1 ] [ −1   1 ] [ 1 1 ]   [  1.5  2.5 ]
              [ −1  1 ] [ −2.5 1 ] [ 1 2 ] = [ −1.5 −1.5 ].


It is easy to check that

[  1.5  2.5 ] [  3 ]   [  2 ]
[ −1.5 −1.5 ] [ −1 ] = [ −3 ].
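The whole computation of Example 9 can be replayed with NumPy, using the matrices A and P of that example (signs as we read them; the extraction dropped minus signs):

```python
import numpy as np

# Replaying Example 9: A is the matrix of T in the old basis, P relates
# old to new coordinates, and B = P A P^{-1} represents T in the new basis.
A = np.array([[-1.0, 1.0], [-2.5, 1.0]])
P = np.array([[2.0, -1.0], [-1.0, 1.0]])

B = P @ A @ np.linalg.inv(P)
assert np.allclose(B, [[1.5, 2.5], [-1.5, -1.5]])

# B must send the new coordinates of x1 to the new coordinates of T x1:
assert np.allclose(P @ [2.0, 1.0], [3.0, -1.0])
assert np.allclose(B @ [3.0, -1.0], [2.0, -3.0])
```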

Example 10. Let T: D(T) → R(T) be a linear operator where D(T) and R(T) are in the same vector space X over F. An eigenvalue of T is a scalar λ ∈ F such that there is a nonzero vector g ∈ D(T) with Tg = λg; g is called an eigenvector of T associated with λ.

Suppose dim D(T) = n is finite and suppose we can find n linearly independent eigenvectors of T, g₁, g₂, …, gₙ. Then {gᵢ} is a basis for D(T) and every x ∈ D(T) can be written as

x = ∑ᵢ₌₁ⁿ αᵢgᵢ, αᵢ ∈ F.

Then

Tx = ∑ᵢ₌₁ⁿ αᵢTgᵢ = ∑ᵢ₌₁ⁿ αᵢλᵢgᵢ = ∑ᵢ₌₁ⁿ βᵢgᵢ,

where βᵢ = λᵢαᵢ ∈ F. That means, relative to the same basis {gᵢ}, the coordinate vector of Tx, [β₁ β₂ ⋯ βₙ], is related to the coordinate vector of x, [α₁ α₂ ⋯ αₙ], by

(β₁, β₂, …, βₙ)ᵀ = A (α₁, α₂, …, αₙ)ᵀ,

where A has a simple diagonal form in this basis of eigenvectors:

A = [ λ₁      0 ]
    [    ⋱     ]
    [ 0      λₙ ].


    4 Symmetric Matrices and Quadratic Forms

The transpose of a matrix A, denoted by Aᵀ, is obtained by taking as its ith column the ith row of A.

Example 11.

1. A₁ = [  1 2 3 ]
        [ −1 0 4 ],   A₁ᵀ = [ 1 −1 ]
                            [ 2  0 ]
                            [ 3  4 ].

2. A₂ = [ 1 −1 ]
        [ 4  2 ],   A₂ᵀ = [  1 4 ]
                          [ −1 2 ].

Let A₁ and A₂ be given by the above two examples. We construct

B₁ = A₁A₁ᵀ = [ 14 11 ]
             [ 11 17 ],

B₂ = A₁ᵀA₁ = [  2 2 −1 ]
             [  2 4  6 ]
             [ −1 6 25 ],

B₃ = A₂A₂ᵀ = [ 2  2 ]
             [ 2 20 ],

B₄ = A₂ᵀA₂ = [ 17 7 ]
             [  7 5 ].

We observe two interesting properties of all the matrices B₁ to B₄. First, they are all square. Second,

B₁ᵀ = B₁, B₂ᵀ = B₂, B₃ᵀ = B₃, B₄ᵀ = B₄.

In other words, every one of these matrices is equal to its own transpose. We call such matrices symmetric.

    The following properties of a transpose are easy to prove:


1. (Aᵀ)ᵀ = A;

2. (kA)ᵀ = kAᵀ, where k is a constant;

3. (A + B)ᵀ = Aᵀ + Bᵀ;

4. (AB)ᵀ = BᵀAᵀ;

5. (A⁻¹)ᵀ = (Aᵀ)⁻¹.

We prove in the following the last property.

Since AA⁻¹ = I and A⁻¹A = I, where I is the identity matrix, we have

(AA⁻¹)ᵀ = (A⁻¹)ᵀAᵀ = Iᵀ = I

and

(A⁻¹A)ᵀ = Aᵀ(A⁻¹)ᵀ = Iᵀ = I.

That means (A⁻¹)ᵀ is the inverse of Aᵀ.

Now it is easy to see why AAᵀ and AᵀA are necessarily symmetric, because

(AAᵀ)ᵀ = (Aᵀ)ᵀAᵀ = AAᵀ,
(AᵀA)ᵀ = Aᵀ(Aᵀ)ᵀ = AᵀA.

A symmetric matrix is necessarily square, since the requirement Aᵀ = A implies that A and Aᵀ must have the same dimensions, and Aᵀ and A can have the same dimensions only if A is square. It follows that we can talk about A⁻¹ if A is symmetric. The inverse may not exist. For example,

A = [ 1 2 ]
    [ 2 4 ]

is symmetric but has no inverse (since det A = 0). However, if the inverse of a symmetric matrix exists, it must also be symmetric. This is because (A⁻¹)ᵀ = (Aᵀ)⁻¹, and if A is symmetric, Aᵀ = A, so (A⁻¹)ᵀ = A⁻¹.


Let x ∈ Rⁿˣ¹ be a column vector and A ∈ Rⁿˣⁿ be a symmetric matrix. Then xᵀAx is a scalar q given by

q = xᵀAx = [x₁ x₂ ⋯ xₙ] [ a₁₁ a₁₂ ⋯ a₁ₙ ] [ x₁ ]
                        [ a₂₁ a₂₂ ⋯ a₂ₙ ] [ x₂ ]
                        [  ⋮             ] [  ⋮ ]
                        [ aₙ₁ aₙ₂ ⋯ aₙₙ ] [ xₙ ]

  = a₁₁x₁² + a₁₂x₁x₂ + a₂₁x₂x₁ + ⋯ + aₙₙxₙ² = ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ aᵢⱼxᵢxⱼ. (30)

We call q = xᵀAx a quadratic form.

Example 12.

q = [x₁ x₂] [  1 −2 ] [ x₁ ]
            [ −2  2 ] [ x₂ ]  = x₁² − 4x₁x₂ + 2x₂².

Notice that, for x ≠ 0, q can be positive (e.g., when x = (1, −1)ᵀ) or negative (e.g., when x = (1, 1)ᵀ).

A symmetric matrix A is positive definite if

x ≠ 0 ⇒ xᵀAx > 0, (31)

and positive semidefinite if

x ≠ 0 ⇒ xᵀAx ≥ 0. (32)

A very broad class of matrices are automatically positive semidefinite.

Theorem 7. Let B be an m × n matrix and let A be given by

A = BᵀB.

Then A is positive semidefinite. If furthermore the n columns of B are linearly independent, then A is positive definite.


Proof. Let x ∈ Rⁿˣ¹ be nonzero and y = Bx. Thus y is an m × 1 vector. We have shown before that BᵀB is symmetric. Furthermore,

xᵀAx = xᵀBᵀBx = yᵀy = ∑ᵢ₌₁ᵐ yᵢ² ≥ 0.

Hence A is positive semidefinite. Since y is the linear combination of the n columns of B with x₁, x₂, …, xₙ as scalar multipliers, if the n columns of B are linearly independent, then y = Bx = 0 if and only if x = 0. Now x ≠ 0 implies y ≠ 0. Therefore xᵀAx = yᵀy > 0, and A is positive definite.
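Theorem 7 can be illustrated numerically with two sample matrices, one whose columns are necessarily dependent (A₁ of Example 11, with 3 columns in R²) and one of our choosing with independent columns:

```python
import numpy as np

# Illustrating Theorem 7: A = B^T B is positive semidefinite; if the
# columns of B are linearly independent, A is positive definite.
B1 = np.array([[1.0, 2.0, 3.0],
               [-1.0, 0.0, 4.0]])      # 3 columns in R^2: necessarily dependent
B2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])            # 2 independent columns in R^3

eig1 = np.linalg.eigvalsh(B1.T @ B1)   # eigenvalues of a symmetric matrix
eig2 = np.linalg.eigvalsh(B2.T @ B2)

assert np.all(eig1 >= -1e-12)          # semidefinite: eigenvalues >= 0
assert np.min(eig1) < 1e-12            # ...with a zero eigenvalue here
assert np.min(eig2) > 0                # definite: eigenvalues strictly positive
```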

We learned that g ≠ 0 is an eigenvector of A associated with the eigenvalue λ if Ag = λg.

Lemma 1. Let λ be an eigenvalue of A and g its associated eigenvector. Then

λ = gᵀAg / gᵀg. (33)

Proof. We have Ag = λg, g ≠ 0, and gᵀAg = λgᵀg. This gives

λ = gᵀAg / gᵀg.

    Theorem 8. The eigenvalues of a positive definite matrix are positive and

    the eigenvalues of a positive semidefinite matrix are nonnegative.

Proof. Since λ = gᵀAg / gᵀg and gᵀg > 0, if gᵀAg > 0 then λ > 0. Similarly, λ ≥ 0 if gᵀAg ≥ 0. Note that gᵀg > 0 is true because we can prove that the eigenvectors of a symmetric matrix are all real.

    5 Solution of Algebraic Equations

The solution of the system of algebraic equations

a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ = y₁,
a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ = y₂,
⋮
aₘ₁x₁ + aₘ₂x₂ + ⋯ + aₘₙxₙ = yₘ (34)


is identical to the solution of the matrix equation

Ax = y, (35)

where

A = [ a₁₁ a₁₂ ⋯ a₁ₙ ]
    [ a₂₁ a₂₂ ⋯ a₂ₙ ]
    [  ⋮             ]
    [ aₘ₁ aₘ₂ ⋯ aₘₙ ]

is a given m × n matrix, y = (y₁ y₂ ⋯ yₘ)ᵀ is a given vector, and x = (x₁ x₂ ⋯ xₙ)ᵀ is an unknown to be found.

    The solution of (35) is closely related to the "rank" of the matrix A,

    which is defined as follows.

Definition 12. For an m × n matrix A, its m rows generate a subspace, and the dimension of this subspace is called the row rank of A. Similarly, the dimension of the subspace generated by the columns of A is called the column rank of A.

It can be shown that the row rank and the column rank of a matrix are identical. Thus we can simply say A has the rank r, or write rank(A) = r. Clearly r ≤ min(m, n). When r = min(m, n), A is said to have full rank.

Example 13. For the matrix

A = [ 1 1 1 1 ]
    [ 1 2 3 4 ]
    [ 2 4 6 8 ],

rank(A) = 2. This is because the number of independent rows is two (the third row is twice the second).
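The rank of the matrix in Example 13 can be confirmed numerically, along with the equality of row rank and column rank:

```python
import numpy as np

# The matrix of Example 13: its third row is twice the second, so only
# two rows are independent and the rank is 2.
A = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])

assert np.linalg.matrix_rank(A) == 2
# Row rank equals column rank: the transpose has the same rank.
assert np.linalg.matrix_rank(A.T) == 2
```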

Returning to the solution of (35): if m = n and A is of full rank, det A ≠ 0. It follows that A⁻¹ exists and A⁻¹A = AA⁻¹ = I, where I is the identity matrix. In this case, x is simply given by

x = A⁻¹y (36)


since multiplying both sides of (35) by A⁻¹ gives

A⁻¹Ax = A⁻¹y, (37)

of which the left-hand side is nothing but Ix = x.

When A is not of full rank, or m ≠ n, the solution of (35) is not so simple. First, consider its existence.

    Theorem 9. The matrix equation (35) has a solution if and only if

    rank([A|y]) = rank(A)

    where [A|y] is the augmented matrix formed by A and y.

Proof. Suppose there exists an x such that Ax = y. Then y is a linear combination of the columns of A and hence lies in Im A (the image of A). It follows that Im [A|y] = Im A. Since the dimension of Im A (which is a vector space) equals the rank of A, we have rank([A|y]) = rank(A).

Conversely, we note that Im A ⊆ Im [A|y]. Hence if rank([A|y]) = rank(A), Im A and Im [A|y] have the same dimension and are therefore equal, i.e., they are the same vector space. Hence y ∈ Im A, i.e., y is a linear combination of the columns of A. That means y = Ax for some x ∈ Rⁿ.

Corollary 2. If rank(A) = m, then (35) always has a solution.

Proof. Since the columns of A and [A|y] are m-dimensional vectors, rank(A) ≤ rank([A|y]) ≤ m. If rank(A) = m, then m = rank(A) ≤ rank([A|y]) ≤ m implies rank(A) = rank([A|y]). Hence (35) has a solution.

The following theorem answers the question of uniqueness of a solution.

Theorem 10. Let x₀ be a solution of (35). Then the set of all solutions is

x₀ + N(A),

where N(A) is the null space of A.

(As an illustration of the augmented matrix notation: if

A = [ 1 2 ]     and y = [ 5 ]     then [A|y] = [ 1 2 5 ]
    [ 3 4 ]             [ 6 ],                 [ 3 4 6 ].)


Proof. If u is a solution of (35), then Au = Ax₀, or A(u − x₀) = 0, where 0 is the zero vector. Hence (u − x₀) ∈ N(A) and u = x₀ + (u − x₀) ∈ x₀ + N(A). Conversely, if u ∈ x₀ + N(A), then u = x₀ + z for some z satisfying Az = 0. Hence Au = Ax₀ + Az = Ax₀ = y.

Thus, given a solution of (35), adding to this solution any vector in the null space of A results in another solution. This proves the following result.

    Corollary 3. A solution of (35) is unique if and only if N(A) contains

    only the zero vector.

The homogeneous equation Ax = 0 always has the trivial solution x = 0. From the uniqueness Theorem 10, the set of all solutions is the subspace 0 + N(A) = N(A). Thus Ax = 0 has a nontrivial solution if and only if N(A) contains a nonzero vector; the latter is true if n > m (prove it!). It follows that Ax = 0 always has a nontrivial solution if n > m.
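The solution set x₀ + N(A) of Theorem 10 can be exhibited numerically for an underdetermined system (our example, with n = 2 unknowns and m = 1 equation): a particular solution from least squares, a null-space vector from the SVD, and their sum still solves the system.

```python
import numpy as np

# The solution set x0 + N(A) for the underdetermined system [1 2] x = 3.
A = np.array([[1.0, 2.0]])
y = np.array([3.0])

x0, *_ = np.linalg.lstsq(A, y, rcond=None)   # one particular solution
_, s, Vt = np.linalg.svd(A)                  # full SVD: Vt is 2 x 2
rank = int(np.sum(s > 1e-12))
z = Vt[rank]                                  # a nonzero vector in N(A)

assert np.allclose(A @ x0, y)                # x0 solves the system
assert np.allclose(A @ z, 0.0)               # z is a nontrivial null vector
assert np.allclose(A @ (x0 + 5.0 * z), y)    # every x0 + N(A) element solves it
```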