review of basic linear algebra

ESI 6314 Deterministic Methods in Operations Research

Review of Basic Linear Algebra

Solving linear programs will require repeatedly solving systems of linear equations. We consider a system with $m$ linear equations and $n$ variables, or unknowns. We let $x_j$ denote the $j$th variable, with $a_{ij}$ denoting the coefficient of $x_j$ in the $i$th equation and $b_i$ denoting the right-hand side value of the $i$th equation, i.e., we consider the following system:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

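It will be convenient to use vector and matrix notation, where

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
\quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
\quad \text{and} \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]

We say that $A$ is an $m \times n$ matrix, that $x$ is an $n$-dimensional vector, and that $b$ is an $m$-dimensional vector. An $n$-dimensional vector is a point in $n$-space and corresponds to a matrix with either one row or one column. We have written $x$ and $b$ above as column vectors, i.e., matrices containing a single column. We could also write $x$, for example, as a row vector. When we define $x$ as a column vector and then write it as a row vector, we call this row vector the transpose of $x$, i.e., $x^T = [x_1 \; x_2 \; \cdots \; x_n]$. If we had originally defined $x$ as a row vector, then its transpose would correspond to a column vector. The transpose of the $m \times n$ matrix $A$ above is an $n \times m$ matrix $A^T$ written as

\[
A^T = \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{bmatrix}.
\]

Note that $(A^T)^T = A$.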
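The transpose can be sketched in a few lines of plain Python. This is only an illustration: the matrix values and the helper name `transpose` are assumptions made here, with a matrix stored as a list of rows.

```python
# A small 2 x 3 matrix stored as a list of rows (illustrative values).
A = [[1, 2, 3],
     [4, 5, 6]]

def transpose(M):
    """Return the transpose of M: row i of the result is column i of M."""
    return [list(col) for col in zip(*M)]

A_T = transpose(A)           # a 3 x 2 matrix
print(A_T)                   # [[1, 4], [2, 5], [3, 6]]
print(transpose(A_T) == A)   # True, i.e., (A^T)^T = A
```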

Vector and matrix addition. We can add matrices or vectors of the same dimensions. For example, consider the $n$-dimensional vectors $u$ and $v$:

\[
u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix},
\quad
v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.
\]

Then the addition of $u$ and $v$ simply involves the component-wise addition of the vector elements:

\[
u + v = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}.
\]

In two dimensions, we can visualize this addition as shown in Figure 1. Similarly, we can add two $m \times n$ matrices $A^1$ and $A^2$, with the result being an $m \times n$ matrix such that the $(i, j)$th element equals $a^1_{ij} + a^2_{ij}$.

Figure 1: Addition of two vectors u and v.

Scalar multiple of a vector or matrix. If $k$ is a scalar (a number), then the product of $k$ and the $n$-vector $u$ is given by

\[
ku = \begin{bmatrix} ku_1 \\ ku_2 \\ \vdots \\ ku_n \end{bmatrix}.
\]

Figure 2 shows a positive scalar multiple of the vector $u$; the result is a vector in the same direction as $u$, but scaled by the multiple 2 in this case. The figure also shows a negative scalar multiple of the vector $v$; the result is a vector in the opposite direction of $v$, with its length scaled by 1/2 in this case (i.e., the multiple is $-1/2$). Given a non-zero vector, it should be clear that a scalar multiple of that vector can represent any point on the infinite line that overlaps the vector arrow.

Figure 2: Scalar multiples of the vectors u and v.

Similarly, the product of $k$ and the matrix $A$ is written as

\[
kA = \begin{bmatrix}
ka_{11} & ka_{12} & \cdots & ka_{1n} \\
ka_{21} & ka_{22} & \cdots & ka_{2n} \\
\vdots & \vdots & & \vdots \\
ka_{m1} & ka_{m2} & \cdots & ka_{mn}
\end{bmatrix}.
\]
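Component-wise addition and scalar multiplication can be sketched as follows. The vectors and the helper names `add` and `scale` are assumptions chosen for illustration; they are not the vectors drawn in the figures.

```python
def add(u, v):
    """Component-wise sum of two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [ui + vi for ui, vi in zip(u, v)]

def scale(k, u):
    """Multiply every component of the vector u by the scalar k."""
    return [k * ui for ui in u]

u = [1, 2]   # illustrative vectors
v = [3, -1]

print(add(u, v))        # [4, 1]
print(scale(2, u))      # [2, 4]: same direction as u, twice as long
print(scale(-0.5, v))   # [-1.5, 0.5]: opposite direction to v, half as long
```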

Scalar or dot product of vectors. The scalar product of two $n$-vectors $u$ and $v$, written as $u^T \cdot v$ (assuming $u$ and $v$ are both column vectors), is simply the sum of the componentwise products of the vectors, i.e., $u_1v_1 + u_2v_2 + \cdots + u_nv_n$. Note that if $u$ and $v$ are both $n$-dimensional column vectors, then $u \cdot v$ is undefined. The inner dimensions must be the same, and the result has dimensions equal to the outer dimensions. That is, $u^T \cdot v$ involves the product of a $(1 \times n)$ vector and an $(n \times 1)$ vector, and this is allowed because both inner dimensions equal $n$, with the result being $(1 \times 1)$ (which is a scalar). On the other hand, $u \cdot v$ tries to take the product of an $(n \times 1)$ vector and an $(n \times 1)$ vector, which is not defined because the inner dimensions are 1 and $n$, respectively.

The magnitude of an $n$-dimensional vector $u$, denoted by $\|u\|$, is the Euclidean norm, i.e., $\|u\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$. It turns out that $u^T \cdot v = \|u\|\|v\| \cos\theta$, where $\theta$ is the measure of the angle between $u$ and $v$. Thus, the scalar product of vectors that are perpendicular to one another equals zero.
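A minimal sketch of the scalar product, the Euclidean norm, and the angle recovered from $u^T \cdot v = \|u\|\|v\|\cos\theta$. The two vectors are assumptions chosen here so that they are perpendicular; the function names are likewise illustrative.

```python
import math

def dot(u, v):
    """Scalar product u1*v1 + ... + un*vn of two n-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean norm (magnitude) of u."""
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    c = dot(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, c)))

u = [3, 4]    # illustrative vectors
v = [-4, 3]

print(dot(u, v))    # 0, so u and v are perpendicular
print(norm(u))      # 5.0
print(math.degrees(angle(u, v)))
```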

We can view the vectors $u$ and $v$ as arrows (as in Figure 1) or as the points at the end of these respective arrows. Figure 3 shows that any point on the line segment joining these points can be expressed as $\lambda u + (1 - \lambda)v$ for some $0 \le \lambda \le 1$. For a given $0 \le \lambda \le 1$ we call $\lambda u + (1 - \lambda)v$ a convex combination of the vectors $u$ and $v$. Then the line segment connecting $u$ and $v$ represents all convex combinations of $u$ and $v$.

Figure 3: Convex combinations of two vectors u and v (0 ≤ λ ≤ 1).
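Convex combinations can be traced out numerically. The sketch below assumes two illustrative points in the plane and a hypothetical helper `convex_combination`; stepping $\lambda$ from 0 to 1 moves the result from $v$ to $u$ along the segment.

```python
def convex_combination(u, v, lam):
    """Return lam*u + (1 - lam)*v, which requires 0 <= lam <= 1."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * ui + (1 - lam) * vi for ui, vi in zip(u, v)]

u = [4, 0]   # illustrative points
v = [0, 2]

for lam in (0, 0.25, 0.5, 1):
    print(lam, convex_combination(u, v, lam))
```

With `lam = 0.5` the result is the midpoint of the segment joining $u$ and $v$.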

Matrix multiplication. We can multiply an $m \times n$ matrix $A$ by some matrix $D$, provided that $D$ has dimensions $n \times p$ for some positive integer $p$. We write this multiplication as $AD$, and the result is a matrix of dimensions $m \times p$. (Note that $DA \neq AD$ in general, and $DA$ is only defined in this example if $p = m$.) Element $(i, j)$ of $AD$ is equal to the scalar product of the $i$th row of $A$ and the $j$th column of $D$.

Example. Given the matrices $A$ and $D$ below, compute $AD$.

\[
A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \end{bmatrix},
\quad
D = \begin{bmatrix} 1 & 1 \\ 2 & 3 \\ 1 & 2 \end{bmatrix}.
\]

Because $A$ is $2 \times 3$ and $D$ is $3 \times 2$, this product is defined and the result will be a $2 \times 2$ matrix. Element $(1, 1)$ of this matrix is obtained by taking the scalar product of row 1 of $A$ and column 1 of $D$, i.e.,

\[
\begin{bmatrix} 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = 5.
\]

The resulting matrix is

\[
AD = \begin{bmatrix} 5 & 8 \\ 7 & 11 \end{bmatrix}.
\]

Matrix multiplication is associative and distributive, i.e., assuming compatible dimensions for multiplication of matrices $A$, $B$, $C$, and $D$, $A(BC) = (AB)C$, $A(B + C) = AB + AC$, and $(B + C)D = BD + CD$. If the matrices $A$ and $D$ are compatible for multiplication, then $(AD)^T = D^TA^T$.
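The example product and the transpose rule can be checked with a short sketch. The helpers `matmul` and `transpose` are written here for illustration (matrices as lists of rows); only the matrices $A$ and $D$ come from the example above.

```python
def matmul(A, D):
    """Product of an m x n matrix A and an n x p matrix D (lists of rows)."""
    n = len(D)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must agree")
    p = len(D[0])
    # Element (i, j) is the scalar product of row i of A and column j of D.
    return [[sum(row[k] * D[k][j] for k in range(n)) for j in range(p)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 1, 2],
     [2, 1, 3]]
D = [[1, 1],
     [2, 3],
     [1, 2]]

AD = matmul(A, D)
print(AD)                                                    # [[5, 8], [7, 11]]
print(transpose(AD) == matmul(transpose(D), transpose(A)))   # True
print(matmul(D, A))   # a 3 x 3 matrix, so DA differs from AD here
```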

Systems of Equations

We now return to the system of equations we started with, now writing this system in matrix form, i.e.,

\[
Ax = b,
\]

where $A$ is an $m \times n$ matrix, $x$ is an $n$-dimensional column vector, and $b$ is an $m$-dimensional column vector (note that $Ax$ takes the product of an $m \times n$ matrix and an $n \times 1$ vector, and the result is thus an $m \times 1$ vector). We do not know a priori whether the above system of equations admits a feasible solution. However, exactly one of three situations must occur:

1. The system contains no feasible solution.

2. The system contains exactly one solution.

3. The system contains an infinite number of solutions.

Given a system of equations, we would like to be able to characterize which of the above conditions applies. Doing so requires understanding the concepts of linearly independent vectors and the rank of a matrix.

It is perhaps useful to think of our system of equations in a slightly different way. We can rewrite our system of equations as follows:

\[
\sum_{i=1}^{n}
\begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{bmatrix} x_i
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]

On the left-hand side, the vector $a^i = [a_{1i} \; a_{2i} \; \cdots \; a_{mi}]^T$ corresponds to the $i$th column of the $A$ matrix. When viewed this way, it should be clear that in solving this system of equations, we are trying to express the $m$-dimensional vector $b$ as a linear combination of the $n$ columns of the $A$ matrix, with multipliers $x_1, x_2, \ldots, x_n$. Thus, we would like to know when it is possible to express an $m$-dimensional vector as a linear combination of a set of $n$ vectors in $m$ dimensions.
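The two ways of reading $Ax$ (one equation at a time, or as a linear combination of the columns of $A$) can be compared directly. The matrix and multipliers below are assumptions chosen for illustration, as are the function names.

```python
def row_view(A, x):
    """Compute Ax one equation (row) at a time."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def column_view(A, x):
    """Compute Ax as x_1*a^1 + ... + x_n*a^n, a combination of the columns."""
    m = len(A)
    result = [0] * m
    for i, x_i in enumerate(x):
        column = [A[r][i] for r in range(m)]      # the column a^i
        result = [acc + x_i * c for acc, c in zip(result, column)]
    return result

A = [[1, 1, 2],     # illustrative 2 x 3 matrix
     [2, 1, 3]]
x = [1, 2, 3]       # illustrative multipliers

print(row_view(A, x))      # [9, 13]
print(column_view(A, x))   # [9, 13]
```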

To understand this, we need the concept of linear independence. A set of $n$ vectors $a^1, a^2, \ldots, a^n$ in $m$-space is linearly independent if the only solution to the system $\sum_{i=1}^{n} a^i k_i = 0$ requires setting $k_i = 0$ for $i = 1, \ldots, n$. The vectors $u$ and $v$ in each of the figures we have used so far are linearly independent because the only way to obtain the origin as a linear combination of these vectors requires multiplying each vector by zero. If a set of $k_i$'s that are not all zero exists such that the associated linear combination of the vectors equals zero, then the vectors are not linearly independent.
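As a small worked illustration of this definition (the three vectors below are chosen here for the purpose and are not the ones drawn in the figures), consider the following vectors in 2-space:

\[
a^1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
a^2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
a^3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

The pair $a^1, a^2$ is linearly independent, because

\[
k_1 a^1 + k_2 a^2 = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad \text{forces} \quad k_1 = k_2 = 0.
\]

The full set $a^1, a^2, a^3$ is not linearly independent, because the multipliers $k_1 = 1$, $k_2 = 1$, $k_3 = -1$ are not all zero and yet

\[
1 \cdot a^1 + 1 \cdot a^2 + (-1) \cdot a^3 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]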

A set of vectors such that a linear combination of these vectors can represent any point in $m$-space is said to span $m$-space. We require at least $m$ vectors in order to span $m$-space, and some subset of $m$ of these vectors must be linearly independent in order for the vectors to span $m$-space. A set of $m$ vectors that span $m$-space is called a basis for $m$-space, and this set of $m$ vectors must be linearly independent by definition (moreover, removing any one of these vectors from the set will result in a set of vectors that no longer span $m$-space). Given a basis for $m$-space, the representation of any point in $m$-space in terms of this basis is unique. That is, if we have $m$ linearly independent columns of $A$ (e.g., columns 1 through $m$), then the solution to $\sum_{i=1}^{m} a^i x_i = b$ is unique. Note that a set of more than $m$ vectors may span $m$-space, provided that a subset of $m$ of these vectors is linearly independent. As a result, we immediately know that if a subset of the $n$ column vectors $a^1, a^2, \ldots, a^n$ spans $m$-space, then our system of equations has at least one solution, as the $m$-dimensional vector $b$ can be expressed as a linear combination of these vectors.

Observe that if $n < m$, then we cannot have a subset of $m$ columns that span $m$-space. However, this does not immediately imply that our system of equations has no solution. It is entirely possible that one or more of the equations in our system is a linear multiple of another equation, implying that we have redundant, or unnecessary, equations and should be working in a lower-dimensional subspace of $m$-space.

In order to characterize which of the three situations applies (one solution, no solution, infinite number of solutions) for a given system of equations $Ax = b$, it is useful to understand the concept of the rank of a matrix. The rank of a matrix $A$ corresponds to the maximum number of linearly independent columns (or rows) of the matrix. Given an $m \times n$ matrix $A$, this immediately implies that $\operatorname{Rank} A \le \min\{m, n\}$. The rank of a matrix tells us the dimension of the subspace spanned by the set of columns of the matrix. In order to identify the rank of a matrix, we can use a set of elementary row operations (ERO's). The following are ERO's:

1. Interchanging rows $i$ and $j$.

2. Multiplying a row by a nonzero scalar.

3. Multiplying a row by a scalar and adding it to another row.

The application of any set of ERO's to a system of equations does not affect the set of solutions for the system.

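If an $m \times n$ matrix $A$ has rank $k \le \min\{m, n\}$, then it is possible to use a set of ERO's to obtain a $k \times k$ identity matrix $I_k$ within the matrix, where

\[
I_k = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

(Note that if $A$ is $m \times n$, then $I_mA = AI_n = A$; generally we will write $I$ when the dimension is understood or implied.) Thus, in order to identify the rank of a matrix $A$, we perform a set of ERO's to obtain the largest possible identity matrix. To illustrate this, consider the $3 \times 4$ matrix below, followed by a series of ERO's applied to the original matrix:

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 6 \\ 0 & 1 & 2 & 3 \end{bmatrix}.
\]

Subtracting row 1 from row 2 gives $A^1$, subtracting row 2 from row 3 gives $A^2$, subtracting twice row 2 from row 1 gives $A^3$, and subtracting twice row 3 from row 2 gives $A^4$:

\[
A^1 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 2 \\ 0 & 1 & 2 & 3 \end{bmatrix},
\quad
A^2 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]

\[
A^3 = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\quad
A^4 = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]

Because we have found a $3 \times 3$ identity matrix (in columns 1, 2, and 4) and $\operatorname{Rank} A \le \min\{3, 4\} = 3$, we know that this matrix has rank 3. Clearly the first, second, and fourth columns are linearly independent and thus span and provide a basis for 3-space. Next, consider a slightly different $A$ matrix: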

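\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 6 \\ 0 & 1 & 1 & 2 \end{bmatrix}.
\]

Subtracting row 1 from row 2 gives $A^1$, subtracting row 2 from row 3 gives $A^2$, and subtracting twice row 2 from row 1 gives $A^3$:

\[
A^1 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 2 \\ 0 & 1 & 1 & 2 \end{bmatrix},
\quad
A^2 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\quad
A^3 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Because the largest identity matrix we can find is $2 \times 2$, this matrix has rank 2.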
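The rank computation in the two examples above can be sketched in Python using only the three ERO's. This is a minimal illustration rather than a production routine: the function name `rank` is an assumption, and exact `Fraction` arithmetic is used so that no round-off tolerance is needed.

```python
from fractions import Fraction

def rank(M):
    """Rank of M, found by reducing M with elementary row operations."""
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0                                    # number of pivots found so far
    for c in range(n):
        # Look for a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]       # ERO 1: interchange
        rows[r] = [x / rows[r][c] for x in rows[r]]       # ERO 2: scale
        for i in range(m):                                # ERO 3: eliminate
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

A_first = [[1, 2, 3, 4], [1, 3, 5, 6], [0, 1, 2, 3]]
A_second = [[1, 2, 3, 4], [1, 3, 4, 6], [0, 1, 1, 2]]

print(rank(A_first))    # 3
print(rank(A_second))   # 2
```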

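Determining a solution to $Ax = b$ (or showing that none exists) can be done through a series of ERO's applied to the extended matrix $[A|b]$. Consider the following system of equations:

\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 + 4x_4 &= 10 \\
x_1 + 3x_2 + 5x_3 + 6x_4 &= 24 \\
x_2 + 2x_3 + 3x_4 &= 20
\end{aligned}
\]

The extended matrix $[A|b]$ can be represented as

\[
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
1 & 3 & 5 & 6 & 24 \\
0 & 1 & 2 & 3 & 20
\end{array}\right].
\]

We next perform a series of ERO's. Subtracting row 1 from row 2, and then subtracting row 2 from row 3, gives

\[
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
0 & 1 & 2 & 2 & 14 \\
0 & 1 & 2 & 3 & 20
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
0 & 1 & 2 & 2 & 14 \\
0 & 0 & 0 & 1 & 6
\end{array}\right].
\]

Subtracting twice row 2 from row 1, and then subtracting twice row 3 from row 2, gives

\[
\left[\begin{array}{cccc|c}
1 & 0 & -1 & 0 & -18 \\
0 & 1 & 2 & 2 & 14 \\
0 & 0 & 0 & 1 & 6
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 0 & -1 & 0 & -18 \\
0 & 1 & 2 & 0 & 2 \\
0 & 0 & 0 & 1 & 6
\end{array}\right].
\]

In this system, we have $\operatorname{Rank} [A|b] = \operatorname{Rank} A = 3 < n = 4$. When this is the case, i.e., $\operatorname{Rank} [A|b] = \operatorname{Rank} A < n$, we have an infinite number of solutions. The final matrix reads $x_1 - x_3 = -18$, $x_2 + 2x_3 = 2$, and $x_4 = 6$, so $x_3$ may take any value. One such solution is $x_1 = -18$, $x_2 = 2$, $x_3 = 0$, $x_4 = 6$. Another solution is $x_1 = -17$, $x_2 = 0$, $x_3 = 1$, $x_4 = 6$.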
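The reduction of the extended matrix above can be reproduced with a short sketch. The function `gauss_jordan` is a hypothetical helper written for illustration; it uses exact `Fraction` arithmetic, and the particular solution is read off by setting the variable without a pivot ($x_3$) to zero.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an extended matrix [A|b] with ERO's.

    Returns the reduced rows and the list of pivot columns (0-based).
    """
    rows = [[Fraction(x) for x in row] for row in aug]
    m, n = len(rows), len(rows[0]) - 1       # n = number of variables
    pivots = []
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue                          # no pivot: a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return rows, pivots

aug = [[1, 2, 3, 4, 10],
       [1, 3, 5, 6, 24],
       [0, 1, 2, 3, 20]]

reduced, pivots = gauss_jordan(aug)
for row in reduced:
    print([int(v) for v in row])

# Set the free variable to zero and read the pivot variables from the last column.
x = [Fraction(0)] * 4
for row, c in zip(reduced, pivots):
    x[c] = row[-1]
print([int(v) for v in x])   # [-18, 2, 0, 6]
```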

If $\operatorname{Rank} [A|b] = \operatorname{Rank} A = n$, note that this can only occur if $n \le m$, i.e., if the number of equations in the system is at least as great as the number of variables. If strict inequality holds, then a solution can only exist if $m - n$ equations are redundant. When these are removed, we are left with an $n \times n$ system of equations such that $\operatorname{Rank} A' = n$ (where $A'$ is the remaining matrix after removing the redundant equations), in which case the columns of $A'$ serve as a basis for $n$-space, and the solution must be unique.

Finally, suppose $\operatorname{Rank} [A|b] > \operatorname{Rank} A = k$. In this case, the columns of $A$ span a $k$-dimensional subspace, but the vector $b$ lies outside this subspace. As a result we cannot express $b$ as a linear combination of the columns of $A$, and no solution exists. The following example illustrates how we can identify this:

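\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 + 4x_4 &= 10 \\
x_1 + 3x_2 + 4x_3 + 6x_4 &= 24 \\
x_2 + x_3 + 2x_4 &= 20
\end{aligned}
\]

The extended matrix $[A|b]$ can be represented as

\[
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
1 & 3 & 4 & 6 & 24 \\
0 & 1 & 1 & 2 & 20
\end{array}\right].
\]

Subtracting row 1 from row 2, then subtracting row 2 from row 3, and then subtracting twice row 2 from row 1 gives

\[
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
0 & 1 & 1 & 2 & 14 \\
0 & 1 & 1 & 2 & 20
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
0 & 1 & 1 & 2 & 14 \\
0 & 0 & 0 & 0 & 6
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 0 & 1 & 0 & -18 \\
0 & 1 & 1 & 2 & 14 \\
0 & 0 & 0 & 0 & 6
\end{array}\right].
\]

At this point, we can see that $\operatorname{Rank} A = 2$. If we can obtain a $[0 \; 0 \; 1]^T$ in the column associated with $b$ using a series of ERO's, this will imply that $\operatorname{Rank} [A|b] = 3 > \operatorname{Rank} A$. We continue our ERO's, dividing row 3 by 6, then subtracting 14 times row 3 from row 2, and then adding 18 times row 3 to row 1:

\[
\left[\begin{array}{cccc|c}
1 & 0 & 1 & 0 & -18 \\
0 & 1 & 1 & 2 & 14 \\
0 & 0 & 0 & 0 & 1
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 0 & 1 & 0 & -18 \\
0 & 1 & 1 & 2 & 0 \\
0 & 0 & 0 & 0 & 1
\end{array}\right],
\quad
\left[\begin{array}{cccc|c}
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 2 & 0 \\
0 & 0 & 0 & 0 & 1
\end{array}\right].
\]

In fact, whenever we see all zeroes in the same row of the columns associated with the $A$ matrix and a non-zero in the same row of the column associated with the $b$ vector, we can conclude that we have an infeasible system. To summarize, we have the following result: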

1. If $\operatorname{Rank} [A|b] = \operatorname{Rank} A < n$, then the system has an infinite number of solutions.

2. If $\operatorname{Rank} [A|b] = \operatorname{Rank} A = n$, then the system has a unique solution.

3. If $\operatorname{Rank} [A|b] > \operatorname{Rank} A$, then the system has no solution.

The use of the extended matrix $[A|b]$ and ERO's to obtain a solution to a system of equations is known as Gauss-Jordan Elimination.
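The three-way result above can be turned into a small classifier that compares the ranks of $A$ and $[A|b]$. This is a sketch under stated assumptions: `rank` and `classify` are hypothetical helpers, arithmetic is exact via `Fraction`, and the third test matrix is formed here from columns 1, 2, and 4 of the first example so that the number of variables equals the rank.

```python
from fractions import Fraction

def rank(M):
    """Rank of M via elementary row operations (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

def classify(A, b):
    """Classify Ax = b by comparing Rank [A|b], Rank A, and n."""
    n = len(A[0])
    rank_A = rank(A)
    rank_Ab = rank([row + [bi] for row, bi in zip(A, b)])
    if rank_Ab > rank_A:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

b = [10, 24, 20]
print(classify([[1, 2, 3, 4], [1, 3, 5, 6], [0, 1, 2, 3]], b))
# infinitely many solutions
print(classify([[1, 2, 3, 4], [1, 3, 4, 6], [0, 1, 1, 2]], b))
# no solution
print(classify([[1, 2, 4], [1, 3, 6], [0, 1, 3]], b))
# unique solution
```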

Bases. For convenience, we will henceforth assume that the system we are working with has $m$ equations and $n \ge m$ variables, with $\operatorname{Rank} A = m$. We say that the $A$ matrix in such cases has full row rank. Recall that any choice of $m$ linearly independent column vectors provides a basis for $m$-space (and spans $m$-space). Let us suppose we have $m$ such linearly independent columns, which form a basis for $m$-space, and let $B$ denote the square submatrix associated with these columns. Then we know that if we throw out all other variables (by equivalently setting their values to zero) and solve the system

\[
Bx = b,
\]

where $x$ now contains only the $m$ retained variables, we will obtain a unique solution (because $\operatorname{Rank} [B|b] = \operatorname{Rank} B = m$, and we effectively now have $m$ equal to the number of variables). The variables corresponding to the $m$ linearly independent columns selected are called basic variables, while those that were set to zero are called non-basic variables. This unique solution is called a basic solution, and we can obtain this solution by performing a set of ERO's that leads to an identity matrix in place of $B$ in the extended matrix $[B|b]$. Alternatively, we can identify a related matrix, called the inverse of the matrix $B$. An $m \times m$ matrix with full row rank (rank $m$) has an associated inverse matrix, denoted as $B^{-1}$, such that $B^{-1}B = I$. Because of this, such a matrix is called non-singular. (Not every square $m \times m$ matrix has an inverse; when such a matrix does not have full row rank, it is a singular matrix and does not have an inverse.) The inverse of a matrix is convenient to have, because we can premultiply our system of equations by this inverse to obtain

\[
B^{-1}Bx = B^{-1}b,
\]

which is equivalent to

\[
x = B^{-1}b.
\]

That is, if we knew $B^{-1}$, we could simply premultiply $b$ by this matrix to obtain the unique basic solution. One way to identify $B^{-1}$ is as follows. We create an appended matrix $[B|I]$ and perform a set of ERO's until we obtain an identity matrix in place of the matrix $B$. What we obtain when we do this, as a final result, is $[I|B^{-1}]$. We illustrate this using the following $B$ matrix:

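\[
B = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 3 & 6 \\ 0 & 1 & 3 \end{bmatrix}.
\]

The appended matrix is:

\[
[B|I] = \left[\begin{array}{ccc|ccc}
1 & 2 & 4 & 1 & 0 & 0 \\
1 & 3 & 6 & 0 & 1 & 0 \\
0 & 1 & 3 & 0 & 0 & 1
\end{array}\right].
\]

We next perform a series of ERO's on the matrix $[B|I]$. Subtracting row 1 from row 2, and then subtracting row 2 from row 3, gives

\[
\left[\begin{array}{ccc|ccc}
1 & 2 & 4 & 1 & 0 & 0 \\
0 & 1 & 2 & -1 & 1 & 0 \\
0 & 1 & 3 & 0 & 0 & 1
\end{array}\right],
\quad
\left[\begin{array}{ccc|ccc}
1 & 2 & 4 & 1 & 0 & 0 \\
0 & 1 & 2 & -1 & 1 & 0 \\
0 & 0 & 1 & 1 & -1 & 1
\end{array}\right].
\]

Subtracting twice row 2 from row 1, and then subtracting twice row 3 from row 2, gives

\[
\left[\begin{array}{ccc|ccc}
1 & 0 & 0 & 3 & -2 & 0 \\
0 & 1 & 2 & -1 & 1 & 0 \\
0 & 0 & 1 & 1 & -1 & 1
\end{array}\right],
\quad
\left[\begin{array}{ccc|ccc}
1 & 0 & 0 & 3 & -2 & 0 \\
0 & 1 & 0 & -3 & 3 & -2 \\
0 & 0 & 1 & 1 & -1 & 1
\end{array}\right].
\]

We therefore have

\[
B^{-1} = \begin{bmatrix} 3 & -2 & 0 \\ -3 & 3 & -2 \\ 1 & -1 & 1 \end{bmatrix}.
\]

The reader can verify that $B^{-1}B = I$. Recall the system of equations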

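\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 + 4x_4 &= 10 \\
x_1 + 3x_2 + 5x_3 + 6x_4 &= 24 \\
x_2 + 2x_3 + 3x_4 &= 20
\end{aligned}
\]

Suppose we set $x_3 = 0$. Then what remains is the system

\[
\begin{aligned}
x_1 + 2x_2 + 4x_4 &= 10 \\
x_1 + 3x_2 + 6x_4 &= 24 \\
x_2 + 3x_4 &= 20
\end{aligned}
\]

The $A$ matrix for this system corresponds to the $B$ matrix we just inverted. Thus, the unique solution can be obtained by premultiplying the right-hand side vector $b$ by $B^{-1}$. When we do this we obtain:

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_4 \end{bmatrix}
= B^{-1}b
= \begin{bmatrix} 3 & -2 & 0 \\ -3 & 3 & -2 \\ 1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 10 \\ 24 \\ 20 \end{bmatrix}
= \begin{bmatrix} -18 \\ 2 \\ 6 \end{bmatrix}.
\]

This agrees with the solution $x_1 = -18$, $x_2 = 2$, $x_3 = 0$, $x_4 = 6$ found earlier by Gauss-Jordan Elimination.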
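Both the inversion of $B$ through the appended matrix $[B|I]$ and the basic solution $x = B^{-1}b$ can be checked with a short sketch. The helpers `inverse`, `matmul`, and `matvec` are assumptions written for illustration, using exact `Fraction` arithmetic; only $B$ and $b$ come from the example above.

```python
from fractions import Fraction

def inverse(B):
    """Invert a square matrix by reducing [B|I] to [I|B^-1] with ERO's."""
    m = len(B)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
           for i, row in enumerate(B)]
    for c in range(m):
        pivot = next((i for i in range(c, m) if aug[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular and has no inverse")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(m):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * p for a, p in zip(aug[i], aug[c])]
    return [row[m:] for row in aug]      # the right-hand block is B^-1

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

B = [[1, 2, 4],
     [1, 3, 6],
     [0, 1, 3]]
b = [10, 24, 20]

B_inv = inverse(B)
print([[int(x) for x in row] for row in B_inv])
# [[3, -2, 0], [-3, 3, -2], [1, -1, 1]]
print(matmul(B_inv, B) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
print([int(x) for x in matvec(B_inv, b)])                      # [-18, 2, 6]
```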
