
arXiv:0805.0245v1 [math.GM] 2 May 2008
    Logarithms and Square Roots of Real Matrices

    Jean Gallier

Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
[email protected]

    May 2, 2008

Abstract. In these notes, we consider the problem of finding the logarithm or the square root of a real matrix. It is known that for every real $n \times n$ matrix, $A$, if no real eigenvalue of $A$ is negative or zero, then $A$ has a real logarithm, that is, there is a real matrix, $X$, such that $e^X = A$. Furthermore, if the eigenvalues, $\xi$, of $X$ satisfy the property $-\pi < \Im(\xi) < \pi$, then $X$ is unique. It is also known that under the same condition every real $n \times n$ matrix, $A$, has a real square root, that is, there is a real matrix, $X$, such that $X^2 = A$. Moreover, if the eigenvalues, $\rho e^{i\theta}$, of $X$ satisfy the condition $-\pi/2 < \theta < \pi/2$, then $X$ is unique. These theorems are the theoretical basis for various numerical methods for exponentiating a matrix or for computing its logarithm using a method known as scaling and squaring (resp. inverse scaling and squaring). Such methods play an important role in the log-Euclidean framework due to Arsigny, Fillard, Pennec and Ayache and its applications to medical imaging. Actually, there is a necessary and sufficient condition for a real matrix to have a real logarithm (or a real square root) but it is fairly subtle, as it involves the parity of the number of Jordan blocks associated with negative eigenvalues. As far as I know, with the exception of Higham's recent book [17], proofs of these results are scattered in the literature and it is not easy to locate them. Moreover, Higham's excellent book assumes a certain level of background in linear algebra that readers interested in the topics of this paper may not possess, so we feel that a more elementary presentation might be a valuable supplement to Higham [17]. In these notes, I present a unified exposition of these results and give more direct proofs of some of them using the Real Jordan Form.


    1 Jordan Decomposition and Jordan Normal Form

The proofs of the theorems stated in the abstract make heavy use of the Jordan normal form of a matrix and its cousin, the Jordan decomposition of a matrix into its semisimple part and its nilpotent part. The purpose of this section is to review these concepts rather thoroughly to make sure that the reader has the background necessary to understand the proofs in Section 2 and Section 3. We pay particular attention to the Real Jordan Form (Horn and Johnson [19], Chapter 3, Section 4, Theorem 3.4.5; Hirsch and Smale [18], Chapter 6) which, although familiar to experts in linear algebra, is typically missing from standard algebra books. We give a complete proof of the Real Jordan Form, as such a proof does not seem to be easily found (even Horn and Johnson [19] only give a sketch of the proof, but it is covered in Hirsch and Smale [18], Chapter 6).

Let $V$ be a finite dimensional real vector space. Recall that we can form the complexification, $V_{\mathbb{C}}$, of $V$. The space $V_{\mathbb{C}}$ is the complex vector space, $V \times V$, with the addition operation given by

$$(u_1, v_1) + (u_2, v_2) = (u_1 + u_2, v_1 + v_2),$$

and the scalar multiplication given by

$$(\lambda + i\mu) \cdot (u, v) = (\lambda u - \mu v, \mu u + \lambda v) \qquad (\lambda, \mu \in \mathbb{R}).$$

Obviously, $(0, v) = i \cdot (v, 0)$, so every vector, $(u, v) \in V_{\mathbb{C}}$, can be written uniquely as

$$(u, v) = (u, 0) + i \cdot (v, 0).$$

The map from $V$ to $V_{\mathbb{C}}$ given by $u \mapsto (u, 0)$ is obviously an injection and, for notational convenience, we write $(u, 0)$ as $u$, we suppress the symbol $\cdot$ for scalar multiplication, and we write

$$(u, v) = u + iv, \quad \text{with } u, v \in V.$$

Observe that if $(e_1, \ldots, e_n)$ is a basis of $V$, then it is also a basis of $V_{\mathbb{C}}$.

Every linear map, $f \colon V \to V$, yields a linear map, $f_{\mathbb{C}} \colon V_{\mathbb{C}} \to V_{\mathbb{C}}$, with

$$f_{\mathbb{C}}(u + iv) = f(u) + i f(v), \quad \text{for all } u, v \in V.$$

Definition 1.1 A linear map, $f \colon V \to V$, is semisimple iff $f_{\mathbb{C}}$ can be diagonalized. In terms of matrices, a real matrix, $A$, is semisimple iff there are some matrices $D$ and $P$ with entries in $\mathbb{C}$, with $P$ invertible and $D$ a diagonal matrix, so that $A = PDP^{-1}$. We say that $f$ is nilpotent iff $f^r = 0$ for some positive integer, $r$, and a matrix, $A$, is nilpotent iff $A^r = 0$ for some positive integer, $r$. We say that $f$ is unipotent iff $f - \mathrm{id}$ is nilpotent and a matrix $A$ is unipotent iff $A - I$ is nilpotent.


If $A$ is unipotent, then $A = I + N$ where $N$ is nilpotent. If $r$ is the smallest integer so that $N^r = 0$ (the index of nilpotency of $N$), then it is easy to check that

$$I - N + N^2 + \cdots + (-1)^{r-1} N^{r-1}$$

is the inverse of $A = I + N$.

For example, rotation matrices are semisimple, although in general they can't be diagonalized over $\mathbb{R}$, since their eigenvalues are complex numbers of the form $e^{i\theta}$. Every upper-triangular matrix where all the diagonal entries are zero is nilpotent.
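As a quick sanity check, here is a minimal NumPy sketch of this inversion formula, on a hypothetical nilpotent matrix with index of nilpotency $r = 3$:

```python
import numpy as np

# A hypothetical nilpotent matrix with N^3 = 0 (index of nilpotency r = 3).
N = np.array([[0., 1., 2.],
              [0., 0., 3.],
              [0., 0., 0.]])
I = np.eye(3)

# The finite alternating series I - N + N^2 (all terms from N^r on vanish).
inv = I - N + N @ N

# Check that it really inverts the unipotent matrix A = I + N.
print(np.allclose(inv @ (I + N), I))  # True
```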

Definition 1.2 If $f \colon V \to V$ is a linear map with $V$ a finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$, a Jordan decomposition of $f$ is a pair of linear maps, $f_S, f_N \colon V \to V$, with $f_S$ semisimple and $f_N$ nilpotent, such that

$$f = f_S + f_N \quad \text{and} \quad f_S \circ f_N = f_N \circ f_S.$$

The theorem below is a very useful technical tool for dealing with the exponential map. It can be proved from the so-called primary decomposition theorem or from the Jordan form (see Hoffman and Kunze [21], Chapter 6, Section 4 or Bourbaki [7], Chapter VII, §5).

Theorem 1.3 If $V$ is a finite dimensional vector space over $\mathbb{C}$, then every linear map, $f \colon V \to V$, has a unique Jordan decomposition, $f = f_S + f_N$. Furthermore, $f_S$ and $f_N$ can be expressed as polynomials in $f$ with no constant term.

Remark: In fact, Theorem 1.3 holds for any finite dimensional vector space over a perfect field, $K$ (this means that either $K$ has characteristic zero or that $K^p = K$, where $K^p = \{a^p \mid a \in K\}$ and where $p \geq 2$ is the characteristic of the field $K$). The proof of this stronger version of Theorem 1.3 is more subtle and involves some elementary Galois theory (see Hoffman and Kunze [21], Chapter 7, Section 4 or, for maximum generality, Bourbaki [7], Chapter VII, §5).

We will need Theorem 1.3 in the case where $V$ is a real vector space. In fact, we need a slightly refined version of Theorem 1.3 for $K = \mathbb{R}$ known as the Real Jordan form. First, let us review Jordan matrices and real Jordan matrices.

Definition 1.4 A (complex) Jordan block is an $r \times r$ matrix, $J_r(\lambda)$, of the form

$$J_r(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix},$$

where $\lambda \in \mathbb{C}$, with $J_1(\lambda) = (\lambda)$ if $r = 1$. A real Jordan block is either


(1) a Jordan block as above with $\lambda \in \mathbb{R}$, or

(2) a real $2r \times 2r$ matrix, $J_{2r}(\alpha, \beta)$, of the form

$$J_{2r}(\alpha, \beta) = \begin{pmatrix} L(\alpha, \beta) & I & 0 & \cdots & 0 \\ 0 & L(\alpha, \beta) & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & I \\ 0 & 0 & 0 & \cdots & L(\alpha, \beta) \end{pmatrix},$$

where $L(\alpha, \beta)$ is a $2 \times 2$ matrix of the form

$$L(\alpha, \beta) = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix},$$

with $\alpha, \beta \in \mathbb{R}$, $\beta \neq 0$, with $I$ the $2 \times 2$ identity matrix and with $J_2(\alpha, \beta) = L(\alpha, \beta)$ when $r = 1$.

A (complex) Jordan matrix, $J$, is an $n \times n$ block diagonal matrix of the form

$$J = \begin{pmatrix} J_{r_1}(\lambda_1) & & 0 \\ & \ddots & \\ 0 & & J_{r_m}(\lambda_m) \end{pmatrix},$$

where each $J_{r_k}(\lambda_k)$ is a (complex) Jordan block associated with some $\lambda_k \in \mathbb{C}$ and with $r_1 + \cdots + r_m = n$. A real Jordan matrix, $J$, is an $n \times n$ block diagonal matrix of the form

$$J = \begin{pmatrix} J_{s_1}(\mu_1) & & 0 \\ & \ddots & \\ 0 & & J_{s_m}(\mu_m) \end{pmatrix},$$

where each $J_{s_k}(\mu_k)$ is a real Jordan block either associated with some $\mu_k = \lambda_k \in \mathbb{R}$ as in (1) or associated with some $\mu_k = (\alpha_k, \beta_k) \in \mathbb{R}^2$, with $\beta_k \neq 0$, as in (2), in which case $s_k = 2r_k$.

To simplify notation, we often write $J(\lambda)$ for $J_r(\lambda)$ (or $J(\mu)$ for $J_s(\mu)$). Here is an example of a Jordan matrix with four blocks:

$$J = \begin{pmatrix}
\lambda & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \mu & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \mu
\end{pmatrix}.$$


In order to prove properties of the exponential of Jordan blocks, we need to understand the deeper reasons for the existence of the Jordan form. For this, we review the notion of a minimal polynomial.

Recall that a polynomial, $p(X)$, of degree $n \geq 1$ is a monic polynomial iff the monomial of highest degree in $p(X)$ is of the form $X^n$ (that is, the coefficient of $X^n$ is equal to 1). As usual, let $\mathbb{C}[X]$ be the ring of polynomials

$$p(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n,$$

with complex coefficients, $a_i \in \mathbb{C}$, and let $\mathbb{R}[X]$ be the ring of polynomials with real coefficients, $a_i \in \mathbb{R}$. If $V$ is a finite dimensional complex vector space and $f \colon V \to V$ is a given linear map, every polynomial

$$p(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$$

yields the linear map denoted $p(f)$, where

$$p(f)(v) = a_0 f^n(v) + a_1 f^{n-1}(v) + \cdots + a_{n-1} f(v) + a_n v, \quad \text{for every } v \in V,$$

and where $f^k = f \circ \cdots \circ f$ is the composition of $f$ with itself $k$ times. We also write

$$p(f) = a_0 f^n + a_1 f^{n-1} + \cdots + a_{n-1} f + a_n \mathrm{id}.$$

Do not confuse $p(X)$ and $p(f)$. The expression $p(X)$ denotes a polynomial in the indeterminate $X$, whereas $p(f)$ denotes a linear map from $V$ to $V$.

For example, if $p(X)$ is the polynomial

$$p(X) = X^3 - 2X^2 + 3X - 1,$$

and if $A$ is any $n \times n$ matrix, then $p(A)$ is the $n \times n$ matrix

$$p(A) = A^3 - 2A^2 + 3A - I$$

obtained by formally substituting the matrix $A$ for the variable $X$.

Thus, we can define a scalar multiplication, $\cdot \colon \mathbb{C}[X] \times V \to V$, by

$$p(X) \cdot v = p(f)(v), \quad v \in V.$$

We immediately check that

$$p(X) \cdot (u + v) = p(X) \cdot u + p(X) \cdot v$$
$$(p(X) + q(X)) \cdot u = p(X) \cdot u + q(X) \cdot u$$
$$(p(X) q(X)) \cdot u = p(X) \cdot (q(X) \cdot u)$$
$$1 \cdot u = u,$$


for all $u, v \in V$ and all $p(X), q(X) \in \mathbb{C}[X]$, where $1$ denotes the polynomial of degree 0 with constant term 1.

It follows that the scalar multiplication, $\cdot \colon \mathbb{C}[X] \times V \to V$, makes $V$ into a $\mathbb{C}[X]$-module that we will denote by $V_f$. Furthermore, as $\mathbb{C}$ is a subring of $\mathbb{C}[X]$ and as $V$ is finite-dimensional, $V$ is finitely generated over $\mathbb{C}$ and so $V_f$ is finitely generated as a module over $\mathbb{C}[X]$.

Now, because $V$ is finite dimensional, we claim that there is some polynomial, $q(X)$, that annihilates $V_f$, that is, so that

$$q(f)(v) = 0, \quad \text{for all } v \in V.$$

To prove this fact, observe that if $V$ has dimension $n$, then the set of linear maps from $V$ to $V$ has dimension $n^2$. Therefore any $n^2 + 1$ linear maps must be linearly dependent, so

$$\mathrm{id}, f, f^2, \ldots, f^{n^2}$$

are linearly dependent linear maps and there is a nonzero polynomial, $q(X)$, of degree at most $n^2$ so that $q(f)(v) = 0$ for all $v \in V$. (In fact, by the Cayley-Hamilton Theorem, the characteristic polynomial, $q_f(X) = \det(X\,\mathrm{id} - f)$, of $f$ annihilates $V$, so there is some annihilating polynomial of degree at most $n$.) By abuse of language, if $q(X)$ annihilates $V_f$, we also say that $q(X)$ annihilates $V$.
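This argument is easy to test numerically; the following minimal sketch (assuming NumPy, whose np.poly returns the coefficients of the characteristic polynomial) evaluates $q_f(A)$ by Horner's scheme on a hypothetical matrix and confirms that it vanishes:

```python
import numpy as np

# Illustrate the Cayley-Hamilton theorem: the characteristic polynomial
# of A annihilates A. The matrix A is a hypothetical example.
A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [1., 0., 3.]])

coeffs = np.poly(A)  # coefficients of det(X*I - A), highest degree first
n = A.shape[0]

# Evaluate q(A) = A^3 + c1*A^2 + c2*A + c3*I by Horner's scheme.
qA = np.zeros_like(A)
for c in coeffs:
    qA = qA @ A + c * np.eye(n)

print(np.allclose(qA, 0))  # True
```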

Now, the set of annihilating polynomials of $V$ forms a principal ideal in $\mathbb{C}[X]$, which means that there is a unique monic polynomial of minimal degree, $p_f$, annihilating $V$ and every other polynomial annihilating $V$ is a multiple of $p_f$. We call this minimal monic polynomial annihilating $V$ the minimal polynomial of $f$.

The fact that $V$ is annihilated by some polynomial in $\mathbb{C}[X]$ makes $V_f$ a torsion $\mathbb{C}[X]$-module. Furthermore, the ring $\mathbb{C}[X]$ is a principal ideal domain, abbreviated PID (this means that every ideal is generated by a single polynomial, which can be chosen to be monic and of smallest degree). The ring $\mathbb{R}[X]$ is also a PID. In fact, the ring $k[X]$ is a PID for any field, $k$. But then, we can apply some powerful results about the structure of finitely generated torsion modules over a PID to $V_f$ and obtain various decompositions of $V$ into subspaces which yield useful normal forms for $f$, in particular, the Jordan form.

Let us give one more definition before stating our next important theorem: Say that $V$ is a cyclic module iff $V$ is generated by a single element as a $\mathbb{C}[X]$-module, which means that there is some $u \in V$ so that $u, f(u), f^2(u), \ldots, f^k(u), \ldots$ generate $V$.

Theorem 1.5 Let $V$ be a finite-dimensional complex vector space of dimension $n$. For every linear map, $f \colon V \to V$, there is a direct sum decomposition,

$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_m,$$


where each $V_i$ is a cyclic $\mathbb{C}[X]$-module such that the minimal polynomial of the restriction of $f$ to $V_i$ is of the form $(X - \lambda_i)^{r_i}$. Furthermore, the number, $m$, of subspaces $V_i$ and the minimal polynomials of the $V_i$ are uniquely determined by $f$ and, for each such polynomial, $(X - \lambda)^r$, the number, $m_i$, of $V_i$'s that have $(X - \lambda)^r$ as minimal polynomial (that is, with $\lambda = \lambda_i$ and $r = r_i$) is uniquely determined by $f$.

A proof of Theorem 1.5 can be found in M. Artin [5], Chapter 12, Section 7, Lang [23], Chapter XIV, Section 2, Dummit and Foote [13], Chapter 12, Section 1 and Section 3, or D. Serre [26], Chapter 6, Section 3. A very good exposition is also given in Gantmacher [14], Chapter VII; in particular, see Theorem 8 and Theorem 12. However, in Gantmacher, elementary divisors are defined in a rather cumbersome manner in terms of ratios of determinants of certain minors. This makes, at times, the proof unnecessarily hard to follow.

The minimal polynomials, $(X - \lambda_i)^{r_i}$, associated with the $V_i$'s are called the elementary divisors of $f$. They need not be distinct. To be more precise, if the set of distinct elementary divisors of $f$ is

$$\{(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_t)^{r_t}\},$$

then $(X - \lambda_1)^{r_1}$ appears $m_1 \geq 1$ times, $(X - \lambda_2)^{r_2}$ appears $m_2 \geq 1$ times, ..., $(X - \lambda_t)^{r_t}$ appears $m_t \geq 1$ times, with

$$m_1 + m_2 + \cdots + m_t = m.$$

The number, $m_i$, is called the multiplicity of $(X - \lambda_i)^{r_i}$. Furthermore, if $(X - \lambda_i)^{r_i}$ and $(X - \lambda_j)^{r_j}$ are two distinct elementary divisors, it is possible that $r_i \neq r_j$ yet $\lambda_i = \lambda_j$.

Observe that $f - \lambda_i \mathrm{id}$ is nilpotent on $V_i$ with index of nilpotency $r_i$ (which means that $(f - \lambda_i \mathrm{id})^{r_i} = 0$ on $V_i$ but $(f - \lambda_i \mathrm{id})^{r_i - 1} \neq 0$ on $V_i$). Also, note that the monomials, $(X - \lambda_i)$, are the irreducible factors of the minimal polynomial of $f$.

Next, let us take a closer look at the subspaces, $V_i$. It turns out that we can find a good basis of $V_i$ so that in this basis, the restriction of $f$ to $V_i$ is a Jordan block.

Proposition 1.6 Let $V$ be a finite-dimensional vector space and let $f \colon V \to V$ be a linear map. If $V$ is a cyclic $\mathbb{C}[X]$-module and if $(X - \lambda)^n$ is the minimal polynomial of $f$, then there is a basis of $V$ of the form

$$((f - \lambda \mathrm{id})^{n-1}(u), (f - \lambda \mathrm{id})^{n-2}(u), \ldots, (f - \lambda \mathrm{id})(u), u),$$

for some $u \in V$. With respect to this basis, the matrix of $f$ is the Jordan block

$$J_n(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}.$$

Consequently, $\lambda$ is an eigenvalue of $f$.


Proof. Since $V$ is a cyclic $\mathbb{C}[X]$-module, there is some $u \in V$ so that $V$ is generated by $u, f(u), f^2(u), \ldots$, which means that every vector in $V$ is of the form $p(f)(u)$, for some polynomial, $p(X)$. We claim that $u, f(u), \ldots, f^{n-2}(u), f^{n-1}(u)$ generate $V$, which implies that

$$u, (f - \lambda \mathrm{id})(u), \ldots, (f - \lambda \mathrm{id})^{n-2}(u), (f - \lambda \mathrm{id})^{n-1}(u)$$

generate $V$.

This is because if $p(X)$ is any polynomial of degree at least $n$, then we can divide $p(X)$ by $(X - \lambda)^n$, obtaining

$$p = (X - \lambda)^n q + r,$$

where $0 \leq \deg(r) < n$ and, as $(X - \lambda)^n$ annihilates $V$, we get

$$p(f)(u) = r(f)(u),$$

which means that every vector of the form $p(f)(u)$ with $p(X)$ of degree $\geq n$ is actually a linear combination of $u, f(u), \ldots, f^{n-2}(u), f^{n-1}(u)$. We can expand each $(f - \lambda \mathrm{id})^k$ using the binomial formula (because $f$ commutes with itself and with $\mathrm{id}$), so $(f - \lambda \mathrm{id})^k(u)$ is a linear combination of $u, f(u), \ldots, f^k(u)$ and

$$u, (f - \lambda \mathrm{id})(u), \ldots, (f - \lambda \mathrm{id})^{n-2}(u), (f - \lambda \mathrm{id})^{n-1}(u)$$

generate $V$. Furthermore, we claim that the above vectors are linearly independent. Indeed, if we had a nontrivial linear combination

$$a_0 (f - \lambda \mathrm{id})^{n-1}(u) + a_1 (f - \lambda \mathrm{id})^{n-2}(u) + \cdots + a_{n-2} (f - \lambda \mathrm{id})(u) + a_{n-1} u = 0,$$

then the polynomial

$$a_0 (X - \lambda)^{n-1} + a_1 (X - \lambda)^{n-2} + \cdots + a_{n-2} (X - \lambda) + a_{n-1}$$

of degree at most $n - 1$ would annihilate $V$, contradicting the fact that $(X - \lambda)^n$ is the minimal polynomial of $f$ (and thus, of smallest degree). Consequently,

$$((f - \lambda \mathrm{id})^{n-1}(u), (f - \lambda \mathrm{id})^{n-2}(u), \ldots, (f - \lambda \mathrm{id})(u), u)$$

is a basis of $V$ and since $u, f(u), \ldots, f^{n-2}(u), f^{n-1}(u)$ span $V$,

$$(u, f(u), \ldots, f^{n-2}(u), f^{n-1}(u))$$

is also a basis of $V$. Let us see how $f$ acts on the basis

$$((f - \lambda \mathrm{id})^{n-1}(u), (f - \lambda \mathrm{id})^{n-2}(u), \ldots, (f - \lambda \mathrm{id})(u), u).$$

If we write $f = f - \lambda \mathrm{id} + \lambda \mathrm{id}$, as $(f - \lambda \mathrm{id})^n$ annihilates $V$, we get

$$f((f - \lambda \mathrm{id})^{n-1}(u)) = (f - \lambda \mathrm{id})^n(u) + \lambda (f - \lambda \mathrm{id})^{n-1}(u) = \lambda (f - \lambda \mathrm{id})^{n-1}(u)$$


and

$$f((f - \lambda \mathrm{id})^k(u)) = (f - \lambda \mathrm{id})^{k+1}(u) + \lambda (f - \lambda \mathrm{id})^k(u), \quad 0 \leq k \leq n - 2.$$

But this means precisely that the matrix of $f$ in this basis is the Jordan block $J_n(\lambda)$.

Using Theorem 1.5 and Proposition 1.6, we get the Jordan form for complex matrices.

Theorem 1.7 (Jordan Form) For every complex $n \times n$ matrix, $A$, there is some invertible matrix, $P$, and some Jordan matrix, $J$, so that

$$A = PJP^{-1}.$$

If $\{\lambda_1, \ldots, \lambda_s\}$ is the set of eigenvalues of $A$, then the diagonal elements of the Jordan blocks of $J$ are among the $\lambda_i$ and every $\lambda_i$ corresponds to one or more Jordan blocks of $J$. Furthermore, the number, $m$, of Jordan blocks, the distinct Jordan blocks, $J_{r_i}(\lambda_i)$, and the number of times, $m_i$, that each Jordan block, $J_{r_i}(\lambda_i)$, occurs are uniquely determined by $A$.

The number $m_i$ is called the multiplicity of the block $J_{r_i}(\lambda_i)$. Observe that the column vector associated with the first entry of every Jordan block is an eigenvector of $A$. Thus, the number, $m$, of Jordan blocks is the number of linearly independent eigenvectors of $A$.

Besides the references that we cited for the proof of Theorem 1.5, other proofs of Theorem 1.7 can be found in the literature. Often, these proofs do not cover the uniqueness statement. For example, a nice proof is given in Godement [15], Chapter 35. Another interesting proof is given in Strang [27], Appendix B. A more computational proof is given in Horn and Johnson [19], Chapter 3, Sections 1-4.

Observe that Theorem 1.7 implies that the characteristic polynomial, $q_f(X)$, of $f$ is the product of the elementary divisors of $f$ (counted with their multiplicity). But then, $q_f(X)$ must annihilate $V$. Therefore, we obtain a quick proof of the Cayley-Hamilton Theorem (of course, we had to work hard to get Theorem 1.7!). Also, the minimal polynomial of $f$ is the least common multiple (lcm) of the elementary divisors of $f$.

    The following technical result will be needed for finding the logarithm of a real matrix:

Proposition 1.8 If $J$ is a $2n \times 2n$ complex Jordan matrix consisting of two conjugate blocks $J_n(\lambda + i\mu)$ and $J_n(\lambda - i\mu)$ of dimension $n$ ($\mu \neq 0$), then there is a permutation matrix, $P$, and a matrix, $E$, so that

$$J = PEP^{-1},$$

where $E$ is a block matrix of the form

$$E = \begin{pmatrix} D & I & 0 & \cdots & 0 \\ 0 & D & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & I \\ 0 & 0 & 0 & \cdots & D \end{pmatrix},$$


and with $D$ the diagonal $2 \times 2$ matrix

$$D = \begin{pmatrix} \lambda + i\mu & 0 \\ 0 & \lambda - i\mu \end{pmatrix}.$$

Furthermore, there is a complex invertible matrix, $Q$, and a real Jordan matrix, $C$, so that

$$J = QCQ^{-1},$$

where $C$ is of the form

$$C = \begin{pmatrix} L & I & 0 & \cdots & 0 \\ 0 & L & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & I \\ 0 & 0 & 0 & \cdots & L \end{pmatrix},$$

with

$$L = \begin{pmatrix} \lambda & -\mu \\ \mu & \lambda \end{pmatrix}.$$

Proof. First, consider an example, namely,

$$J = \begin{pmatrix} \lambda + i\mu & 1 & 0 & 0 \\ 0 & \lambda + i\mu & 0 & 0 \\ 0 & 0 & \lambda - i\mu & 1 \\ 0 & 0 & 0 & \lambda - i\mu \end{pmatrix}.$$

If we permute rows 2 and 3, we get

$$\begin{pmatrix} \lambda + i\mu & 1 & 0 & 0 \\ 0 & 0 & \lambda - i\mu & 1 \\ 0 & \lambda + i\mu & 0 & 0 \\ 0 & 0 & 0 & \lambda - i\mu \end{pmatrix}$$

and if we permute columns 2 and 3, we get our matrix,

$$E = \begin{pmatrix} \lambda + i\mu & 0 & 1 & 0 \\ 0 & \lambda - i\mu & 0 & 1 \\ 0 & 0 & \lambda + i\mu & 0 \\ 0 & 0 & 0 & \lambda - i\mu \end{pmatrix}.$$

We leave it as an exercise to generalize this method to two $n \times n$ conjugate Jordan blocks to prove that we can find a permutation matrix, $P$, so that $E = P^{-1} J P$ and thus, $J = PEP^{-1}$.

Next, as $\mu \neq 0$, the matrix $L$ can be diagonalized and one easily checks that

$$D = \begin{pmatrix} \lambda + i\mu & 0 \\ 0 & \lambda - i\mu \end{pmatrix} = \begin{pmatrix} -i & 1 \\ i & 1 \end{pmatrix} \begin{pmatrix} \lambda & -\mu \\ \mu & \lambda \end{pmatrix} \begin{pmatrix} -i & 1 \\ i & 1 \end{pmatrix}^{-1}.$$


Therefore, using the block diagonal matrix $S = \mathrm{diag}(S_2, \ldots, S_2)$ consisting of $n$ blocks

$$S_2 = \begin{pmatrix} -i & 1 \\ i & 1 \end{pmatrix},$$

we see that

$$E = SCS^{-1}$$

and thus,

$$J = PSCS^{-1}P^{-1},$$

which yields our second result with $Q = PS$.
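Here is a minimal numerical check of the $2 \times 2$ step of this proof (assuming NumPy; the values of $\lambda$ and $\mu$ are arbitrary):

```python
import numpy as np

# Numerical check of the 2x2 step in the proof of Proposition 1.8:
# S2 conjugates the real block L(lam, mu) into D = diag(lam+i*mu, lam-i*mu).
lam, mu = 1.0, 2.0
L = np.array([[lam, -mu],
              [mu,  lam]], dtype=complex)
D = np.diag([lam + 1j * mu, lam - 1j * mu])
S2 = np.array([[-1j, 1.],
               [1j,  1.]])

print(np.allclose(S2 @ L @ np.linalg.inv(S2), D))  # True
```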

Proposition 1.8 shows that every complex matrix built from pairs of conjugate Jordan blocks is similar to a real Jordan matrix. Unfortunately, if $A$ is a real matrix, this does not guarantee that we can find a real invertible matrix, $P$, so that $A = PJP^{-1}$, with $J$ a real Jordan matrix. This result, known as the Real Jordan Form, is actually true but requires some work to be established.

To the best of our knowledge, a complete proof is not easily found. Horn and Johnson state such a result as Theorem 3.4.5 in Chapter 3, Section 4, in [19]. However, they leave the details of the proof that a real $P$ can be found as an exercise. We found that a proof can be obtained from Theorem 1.5. Since we believe that some of the techniques involved in this proof are of independent interest, we present this proof in full detail. It should be noted that we were inspired by some arguments found in Gantmacher [14], Chapter IX, Section 13.

Theorem 1.9 (Real Jordan Form) For every real $n \times n$ matrix, $A$, there is some invertible (real) matrix, $P$, and some real Jordan matrix, $J$, so that

$$A = PJP^{-1}.$$

For every Jordan block, $J_r(\lambda)$, of type (1), $\lambda$ is some real eigenvalue of $A$ and for every Jordan block, $J_{2r}(\alpha, \beta)$, of type (2), $\alpha + i\beta$ is a complex eigenvalue of $A$ (with $\beta \neq 0$). Every eigenvalue of $A$ corresponds to one or more Jordan blocks of $J$. Furthermore, the number, $m$, of Jordan blocks, the distinct Jordan blocks, $J_{s_i}(\mu_i)$, and the number of times, $m_i$, that each Jordan block, $J_{s_i}(\mu_i)$, occurs are uniquely determined by $A$.

Proof. Let $f \colon V \to V$ be the linear map defined by $A$ and let $f_{\mathbb{C}}$ be the complexification of $f$. Then, Theorem 1.5 yields a direct sum decomposition of $V_{\mathbb{C}}$ of the form

$$V_{\mathbb{C}} = V_1 \oplus \cdots \oplus V_m, \qquad (*)$$

where each $V_i$ is a cyclic $\mathbb{C}[X]$-module (associated with $f_{\mathbb{C}}$) whose minimal polynomial is of the form $(X - \lambda_i)^{r_i}$, where $\lambda_i$ is some (possibly complex) eigenvalue of $f$. If $W$ is any subspace of $V_{\mathbb{C}}$, we define the conjugate, $\overline{W}$, of $W$ by

$$\overline{W} = \{u - iv \in V_{\mathbb{C}} \mid u + iv \in W\}.$$


It is clear that $\overline{W}$ is a subspace of $V_{\mathbb{C}}$ of the same dimension as $W$ and obviously, $\overline{V_{\mathbb{C}}} = V_{\mathbb{C}}$. Our first goal is to prove the following claim:

Claim 1. For each factor, $V_j$, the following properties hold:

(1) If $u + iv, f_{\mathbb{C}}(u + iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u + iv)$ span $V_j$, then $u - iv, f_{\mathbb{C}}(u - iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u - iv)$ span $\overline{V_j}$ and so, $\overline{V_j}$ is cyclic with respect to $f_{\mathbb{C}}$.

(2) If $(X - \lambda_j)^{r_j}$ is the minimal polynomial of $V_j$, then $(X - \overline{\lambda_j})^{r_j}$ is the minimal polynomial of $\overline{V_j}$.

Proof of Claim 1. As $f_{\mathbb{C}}(u + iv) = f(u) + if(v)$, we have $f_{\mathbb{C}}(u - iv) = f(u) - if(v)$. It follows that $f_{\mathbb{C}}^k(u + iv) = f^k(u) + if^k(v)$ and $f_{\mathbb{C}}^k(u - iv) = f^k(u) - if^k(v)$, which implies that if $V_j$ is generated by $u + iv, f_{\mathbb{C}}(u + iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u + iv)$, then $\overline{V_j}$ is generated by $u - iv, f_{\mathbb{C}}(u - iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u - iv)$. Therefore, $\overline{V_j}$ is cyclic for $f_{\mathbb{C}}$.

We also prove the following simple fact: If

$$(f_{\mathbb{C}} - (\lambda_j + i\mu_j)\mathrm{id})(u + iv) = x + iy,$$

then

$$(f_{\mathbb{C}} - (\lambda_j - i\mu_j)\mathrm{id})(u - iv) = x - iy.$$

Indeed, we have

$$x + iy = (f_{\mathbb{C}} - (\lambda_j + i\mu_j)\mathrm{id})(u + iv) = f_{\mathbb{C}}(u + iv) - (\lambda_j + i\mu_j)(u + iv) = f(u) + if(v) - (\lambda_j + i\mu_j)(u + iv)$$

and by taking conjugates, we get

$$x - iy = f(u) - if(v) - (\lambda_j - i\mu_j)(u - iv) = f_{\mathbb{C}}(u - iv) - (\lambda_j - i\mu_j)(u - iv) = (f_{\mathbb{C}} - (\lambda_j - i\mu_j)\mathrm{id})(u - iv),$$

as claimed.

From the above, $(f_{\mathbb{C}} - \lambda_j \mathrm{id})^{r_j}(x + iy) = 0$ iff $(f_{\mathbb{C}} - \overline{\lambda_j}\mathrm{id})^{r_j}(x - iy) = 0$. Thus, $(X - \overline{\lambda_j})^{r_j}$ annihilates $\overline{V_j}$ and as $\dim \overline{V_j} = \dim V_j$ and $\overline{V_j}$ is cyclic, we conclude that $(X - \overline{\lambda_j})^{r_j}$ is the minimal polynomial of $\overline{V_j}$.

Next we prove

Claim 2. For every factor, $V_j$, in the direct sum decomposition $(*)$, we have:

(A) If $(X - \lambda_j)^{r_j}$ is the minimal polynomial of $V_j$, with $\lambda_j \in \mathbb{R}$, then either

(1) $\overline{V_j} = V_j$ and if $u + iv$ generates $V_j$, then $u - iv$ also generates $V_j$, or


(2) $V_j \cap \overline{V_j} = (0)$ and

(a) the cyclic space $\overline{V_j}$ also occurs in the direct sum decomposition $(*)$;

(b) the minimal polynomial of $\overline{V_j}$ is $(X - \lambda_j)^{r_j}$;

(c) the spaces $V_j$ and $\overline{V_j}$ contain only complex vectors (this means that if $x + iy \in V_j$ with $x + iy \neq 0$, then $x \neq 0$ and $y \neq 0$, and similarly for $\overline{V_j}$).

(B) If $(X - (\lambda_j + i\mu_j))^{r_j}$ is the minimal polynomial of $V_j$ with $\mu_j \neq 0$, then

(d) $V_j \cap \overline{V_j} = (0)$;

(e) the cyclic space $\overline{V_j}$ also occurs in the direct sum decomposition $(*)$;

(f) the minimal polynomial of $\overline{V_j}$ is $(X - (\lambda_j - i\mu_j))^{r_j}$;

(g) the spaces $V_j$ and $\overline{V_j}$ contain only complex vectors.

Proof of Claim 2. By taking the conjugate of the direct sum decomposition $(*)$, we get

$$V_{\mathbb{C}} = \overline{V_1} \oplus \cdots \oplus \overline{V_m}.$$

By Claim 1, each $\overline{V_j}$ is a cyclic subspace with respect to $f_{\mathbb{C}}$ of the same dimension as $V_j$ and the minimal polynomial of $\overline{V_j}$ is $(X - \overline{\lambda_j})^{r_j}$ if the minimal polynomial of $V_j$ is $(X - \lambda_j)^{r_j}$. It follows from the uniqueness assertion of Theorem 1.5 that the list of conjugate minimal polynomials

$$(X - \overline{\lambda_1})^{r_1}, \ldots, (X - \overline{\lambda_m})^{r_m}$$

is a permutation of the list of minimal polynomials

$$(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m}$$

and so, every $\overline{V_j}$ is equal to some factor $V_k$ (possibly equal to $V_j$ if $\lambda_j$ is real) in the direct sum decomposition $(*)$, where $V_k$ and $\overline{V_j}$ have the same minimal polynomial, $(X - \overline{\lambda_j})^{r_j}$.

Next, assume that $(X - \lambda_j)^{r_j}$ is the minimal polynomial of $V_j$, with $\lambda_j \in \mathbb{R}$. Consider any generator, $u + iv$, for $V_j$. If $u - iv \in V_j$, then by Claim 1, $\overline{V_j} \subseteq V_j$ and so $\overline{V_j} = V_j$, as $\dim \overline{V_j} = \dim V_j$. We know that $u + iv, f_{\mathbb{C}}(u + iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u + iv)$ generate $V_j$ and that $u - iv, f_{\mathbb{C}}(u - iv), \ldots, f_{\mathbb{C}}^{r_j - 1}(u - iv)$ generate $\overline{V_j} = V_j$, which implies (1).

If $u - iv \notin V_j$, then we proved earlier that $\overline{V_j}$ occurs in the direct sum $(*)$ as some $V_k$ and that its minimal polynomial is also $(X - \lambda_j)^{r_j}$. Since $u - iv \notin V_j$ and $V_j$ and $\overline{V_j}$ belong to a direct sum decomposition, $V_j \cap \overline{V_j} = (0)$ and 2(a) and 2(b) hold. If $u \in V_j$ or $iv \in V_j$ for some real $u \in V$ or some real $v \in V$ with $u, v \neq 0$, then, since $\overline{V_j}$ is the conjugate of $V_j$, we would also have $u \in \overline{V_j}$ or $iv \in \overline{V_j}$, contradicting $V_j \cap \overline{V_j} = (0)$. Thus, 2(c) holds.

Now, consider the case where the eigenvalue is $\lambda_j + i\mu_j$, with $\mu_j \neq 0$. Then, we know that $\overline{V_j} = V_k$ for some $V_k$ whose minimal polynomial is $(X - (\lambda_j - i\mu_j))^{r_j}$ in the direct sum $(*)$. As $\mu_j \neq 0$, the cyclic spaces $V_j$ and $\overline{V_j}$ correspond to distinct minimal polynomials $(X - (\lambda_j + i\mu_j))^{r_j}$ and $(X - (\lambda_j - i\mu_j))^{r_j}$, so $V_j \cap \overline{V_j} = (0)$. It follows that $V_j$ and $\overline{V_j}$ consist of complex vectors, as we already observed. Therefore, (d), (e), (f), (g) are proved, which finishes the proof of Claim 2.

    We now show how to produce some linearly independent vectors in V so that the matrixof f over these vectors is a real Jordan block.

(B) First, consider the case where the minimal polynomial of $V_j$ is $(X - (\lambda_j + i\mu_j))^{r_j}$ with $\mu_j \neq 0$.

By Claim 1(1), if $u + iv$ generates $V_j$, then $u - iv$ generates $\overline{V_j}$ and by Proposition 1.6, the subspace $V_j$ has a basis $(u_1 + iv_1, \ldots, u_{r_j} + iv_{r_j})$ and the subspace $\overline{V_j}$ has a basis $(u_1 - iv_1, \ldots, u_{r_j} - iv_{r_j})$, with

$$u_k + iv_k = (f_{\mathbb{C}} - (\lambda_j + i\mu_j)\mathrm{id})^{r_j - k}(u + iv), \quad 1 \leq k \leq r_j,$$

and $u_k, v_k \neq 0$ for $k = 1, \ldots, r_j$. Recall that

$$f_{\mathbb{C}}(u_1 + iv_1) = (\lambda_j + i\mu_j)(u_1 + iv_1)$$
$$f_{\mathbb{C}}(u_k + iv_k) = u_{k-1} + iv_{k-1} + (\lambda_j + i\mu_j)(u_k + iv_k), \quad 2 \leq k \leq r_j.$$

Thus, we get

$$f(u_1) + if(v_1) = \lambda_j u_1 - \mu_j v_1 + i(\mu_j u_1 + \lambda_j v_1)$$
$$f(u_k) + if(v_k) = u_{k-1} + \lambda_j u_k - \mu_j v_k + i(v_{k-1} + \mu_j u_k + \lambda_j v_k), \quad 2 \leq k \leq r_j,$$

which yields

$$f(u_1) = \lambda_j u_1 - \mu_j v_1$$
$$f(v_1) = \mu_j u_1 + \lambda_j v_1$$
$$f(u_k) = u_{k-1} + \lambda_j u_k - \mu_j v_k$$
$$f(v_k) = v_{k-1} + \mu_j u_k + \lambda_j v_k, \quad 2 \leq k \leq r_j.$$

Now, $(u_1 + iv_1, \ldots, u_{r_j} + iv_{r_j})$ form a basis of complex vectors of $V_j$, $(u_1 - iv_1, \ldots, u_{r_j} - iv_{r_j})$ form a basis of complex vectors of $\overline{V_j}$ and $V_j \cap \overline{V_j} = (0)$, so the vectors

$$u_1, v_1, u_2, v_2, \ldots, u_{r_j}, v_{r_j}$$

are linearly independent. Using these vectors, the matrix giving $f(u_1)$ and $f(v_1)$ over $(u_1, v_1)$ is

$$\begin{pmatrix} \lambda_j & -\mu_j \\ \mu_j & \lambda_j \end{pmatrix}$$

and the matrix giving $f(u_k)$ and $f(v_k)$ over $(u_{k-1}, v_{k-1}, u_k, v_k)$ for $k \geq 2$ is

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \lambda_j & -\mu_j \\ \mu_j & \lambda_j \end{pmatrix}.$$


This means that over the basis $(u_1, v_1, u_2, v_2, \ldots, u_{r_j}, v_{r_j})$, the restriction of $f$ is a real Jordan block of type (2).

(A) Let us now consider the case where the minimal polynomial of $V_j$ is $(X - \lambda_j)^{r_j}$, with $\lambda_j \in \mathbb{R}$. If $\overline{V_j} = V_j$ and if $u + iv$ generates $V_j$, then $V_j$ has the two bases $(u_1 + iv_1, \ldots, u_{r_j} + iv_{r_j})$ and $(u_1 - iv_1, \ldots, u_{r_j} - iv_{r_j})$, with

$$u_k + iv_k = (f_{\mathbb{C}} - \lambda_j \mathrm{id})^{r_j - k}(u + iv), \quad 1 \leq k \leq r_j.$$

But then, either $(u_1, \ldots, u_{r_j})$ are linearly independent or $(v_1, \ldots, v_{r_j})$ are linearly independent. As $\mu_j = 0$, the computation in (B) yields

$$f(u_1) = \lambda_j u_1$$
$$f(v_1) = \lambda_j v_1$$
$$f(u_k) = u_{k-1} + \lambda_j u_k$$
$$f(v_k) = v_{k-1} + \lambda_j v_k, \quad 2 \leq k \leq r_j.$$

If $(u_1, \ldots, u_{r_j})$ is linearly independent, we see that

$$f(u_1) = \lambda_j u_1$$

and the matrix giving $f(u_k)$ over $(u_{k-1}, u_k)$ for $k \geq 2$ is

$$\begin{pmatrix} 1 \\ \lambda_j \end{pmatrix}.$$

This means that over the basis $(u_1, u_2, \ldots, u_{r_j})$, the restriction of $f$ is a real Jordan block of type (1). If $(v_1, \ldots, v_{r_j})$ is linearly independent, then we obtain the same Jordan block.

If $V_j \cap \overline{V_j} = (0)$ and if $u + iv$ generates $V_j$, then, as in (B), $V_j$ has the basis $(u_1 + iv_1, \ldots, u_{r_j} + iv_{r_j})$ and $\overline{V_j}$ has the basis $(u_1 - iv_1, \ldots, u_{r_j} - iv_{r_j})$, with

$$u_k + iv_k = (f_{\mathbb{C}} - \lambda_j \mathrm{id})^{r_j - k}(u + iv), \quad 1 \leq k \leq r_j.$$

Moreover, in this case, all vectors are complex. As a consequence, $(u_1, \ldots, u_{r_j})$ and $(v_1, \ldots, v_{r_j})$ are linearly independent. Since the computation made in the previous case still holds ($\mu_j = 0$), we see that the restriction of $f$ is a real Jordan block over the basis $(u_1, \ldots, u_{r_j})$.

Finally, by taking the union of all the real bases either associated with a conjugate pair $(V_j, \overline{V_j})$ or with a subspace $V_j$ corresponding to a real eigenvalue $\lambda_j$, we obtain a basis of $V$ over which the matrix of $f$ is a real Jordan matrix.

Let $A$ be a real matrix and let $(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m}$ be its list of elementary divisors or, equivalently, let $J_{r_1}(\lambda_1), \ldots, J_{r_m}(\lambda_m)$ be its list of Jordan blocks. If, for every $r_i$ and every real eigenvalue $\lambda_i < 0$, the number, $m_i$, of Jordan blocks identical to $J_{r_i}(\lambda_i)$ is even, then there is a way to rearrange these blocks using the technique of Proposition 1.8 to obtain a version of the real Jordan form that makes it easy to find logarithms (and square roots) of real matrices.


Theorem 1.10 (Real Jordan Form, Special Version) Let $A$ be a real $n \times n$ matrix and let $(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m}$ be its list of elementary divisors or, equivalently, let $J_{r_1}(\lambda_1), \ldots, J_{r_m}(\lambda_m)$ be its list of Jordan blocks. If, for every $r_i$ and every real eigenvalue $\lambda_i < 0$, the number, $m_i$, of Jordan blocks identical to $J_{r_i}(\lambda_i)$ is even, then there is a real invertible matrix, $P$, and a real Jordan matrix, $J'$, such that $A = PJ'P^{-1}$ and

(1) Every block, $J_{r_i}(\lambda_i)$, of $J$ for which $\lambda_i \in \mathbb{R}$ and $\lambda_i \geq 0$ is a Jordan block of type (1) of $J'$ (as in Definition 1.4), or

(2) For every block, $J_{r_i}(\lambda_i)$, of $J$ for which either $\lambda_i \in \mathbb{R}$ and $\lambda_i < 0$ or $\lambda_i = \alpha_i + i\beta_i$ with $\beta_i \neq 0$ ($\alpha_i, \beta_i \in \mathbb{R}$), the corresponding real Jordan block of $J'$ is defined as follows:

(a) If $\beta_i \neq 0$, then $J'$ contains the real Jordan block $J_{2r_i}(\alpha_i, \beta_i)$ of type (2) (as in Definition 1.4), or

(b) If $\lambda_i < 0$, then $J'$ contains the real Jordan block $J_{2r_i}(\lambda_i, 0)$ whose diagonal blocks are of the form

$$L(\lambda_i, 0) = \begin{pmatrix} \lambda_i & 0 \\ 0 & \lambda_i \end{pmatrix}.$$

Proof. By hypothesis, for every real eigenvalue, $\lambda_i < 0$, and for every $r_i$, the Jordan block, $J_{r_i}(\lambda_i)$, occurs an even number of times, say $2t_i$, so by using a permutation, we may assume that we have $t_i$ pairs of identical blocks $(J_{r_i}(\lambda_i), J_{r_i}(\lambda_i))$. But then, for each pair of blocks of this form, we can apply part (1) of Proposition 1.8 (since $\lambda_i$ is its own conjugate), which yields our result.
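A tiny illustration of why this pairing matters (a minimal sketch assuming SciPy's expm): the matrix $-I_2$ realizes the pair of blocks $(J_1(-1), J_1(-1))$, and the rotation by $\pi$ is a real logarithm of it, foreshadowing Section 2:

```python
import numpy as np
from scipy.linalg import expm

# -I_2 packs the eigenvalue -1 twice, i.e. the pair (J_1(-1), J_1(-1)),
# and the rotation by pi is a real logarithm of it.
X = np.array([[0., -np.pi],
              [np.pi, 0.]])
print(np.allclose(expm(X), -np.eye(2)))  # True
```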

Remark: The above result generalizes the fact that, when we have a rotation matrix, $R$, the eigenvalues $-1$ occurring in the real block diagonal form of $R$ can be paired up.

The following theorem shows that the structure of the Jordan form of a matrix is preserved under exponentiation. This is an important result that will be needed to establish the necessity of the criterion for a real matrix to have a real logarithm.

Theorem 1.11 For any (real or complex) $n \times n$ matrix, $A$, if $A = PJP^{-1}$ where $J$ is a Jordan matrix of the form

$$J = \begin{pmatrix} J_{r_1}(\lambda_1) & & 0 \\ & \ddots & \\ 0 & & J_{r_m}(\lambda_m) \end{pmatrix},$$

then there is some invertible matrix, $Q$, so that the Jordan form of $e^A$ is given by

$$e^A = Q\, e(J)\, Q^{-1},$$


where $e(J)$ is the Jordan matrix

$$e(J) = \begin{pmatrix} J_{r_1}(e^{\lambda_1}) & & 0 \\ & \ddots & \\ 0 & & J_{r_m}(e^{\lambda_m}) \end{pmatrix},$$

that is, each $J_{r_k}(e^{\lambda_k})$ is obtained from $J_{r_k}(\lambda_k)$ by replacing all the diagonal entries $\lambda_k$ by $e^{\lambda_k}$. Equivalently, if the list of elementary divisors of $A$ is

$$(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m},$$

then the list of elementary divisors of $e^A$ is

$$(X - e^{\lambda_1})^{r_1}, \ldots, (X - e^{\lambda_m})^{r_m}.$$

Proof. Theorem 1.11 is a consequence of a general theorem about functions of matrices proved in Gantmacher [14], see Chapter VI, Section 8, Theorem 9. Because a much more general result is proved, the proof in Gantmacher [14] is rather involved. However, it is possible to give a simpler proof exploiting special properties of the exponential map.

Let $f$ be the linear map defined by the matrix $A$. The strategy of our proof is to go back to the direct sum decomposition given by Theorem 1.5,

$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_m,$$

where each $V_i$ is a cyclic $\mathbb{C}[X]$-module such that the minimal polynomial of the restriction of $f$ to $V_i$ is of the form $(X - \lambda_i)^{r_i}$. We will prove that

(1) The vectors

$$u, e^f(u), (e^f)^2(u), \ldots, (e^f)^{r_i - 1}(u)$$

form a basis of $V_i$ (here, $(e^f)^k = e^f \circ \cdots \circ e^f$, the composition of $e^f$ with itself $k$ times).

(2) The polynomial $(X - e^{\lambda_i})^{r_i}$ is the minimal polynomial of the restriction of $e^f$ to $V_i$.

First, we prove that $V_i$ is invariant under $e^f$. Let $N = f - \lambda_i \mathrm{id}$. To say that $(X - \lambda_i)^{r_i}$ is the minimal polynomial of the restriction of $f$ to $V_i$ is equivalent to saying that $N$ is nilpotent on $V_i$ with index of nilpotency, $r = r_i$. Now, $N$ and $\lambda_i \mathrm{id}$ commute, so as $f = N + \lambda_i \mathrm{id}$, we have

$$e^f = e^{N + \lambda_i \mathrm{id}} = e^N e^{\lambda_i \mathrm{id}} = e^{\lambda_i} e^N.$$

Furthermore, as $N$ is nilpotent, we have

$$e^N = \mathrm{id} + N + \frac{N^2}{2!} + \cdots + \frac{N^{r-1}}{(r-1)!},$$

so

$$e^f = e^{\lambda_i} \left( \mathrm{id} + N + \frac{N^2}{2!} + \cdots + \frac{N^{r-1}}{(r-1)!} \right).$$


Now, $V_i$ is invariant under $f$, so $V_i$ is invariant under $N = f - \lambda_i \mathrm{id}$ and this implies that $V_i$ is invariant under $e^f$. Thus, we can view $V_i$ as a $\mathbb{C}[X]$-module with respect to $e^f$.

From the formula for $e^f$ we get

$$e^f - e^{\lambda_i} \mathrm{id} = e^{\lambda_i} \left( \mathrm{id} + N + \frac{N^2}{2!} + \cdots + \frac{N^{r-1}}{(r-1)!} \right) - e^{\lambda_i} \mathrm{id} = e^{\lambda_i} \left( N + \frac{N^2}{2!} + \cdots + \frac{N^{r-1}}{(r-1)!} \right).$$

If we let

$$\widetilde{N} = N + \frac{N^2}{2!} + \cdots + \frac{N^{r-1}}{(r-1)!},$$

we claim that

$$\widetilde{N}^{r-1} = N^{r-1} \quad \text{and} \quad \widetilde{N}^r = 0.$$

The case $r = 1$ is trivial, so we may assume $r \geq 2$. Since $\widetilde{N} = NR$ for some $R$ such that $NR = RN$ and $N^r = 0$, the second property is clear. The first property follows by observing that $\widetilde{N} = N + N^2 T$, where $N$ and $T$ commute, so, using the binomial formula,

$$\widetilde{N}^{r-1} = \sum_{k=0}^{r-1} \binom{r-1}{k} N^k (N^2 T)^{r-1-k} = \sum_{k=0}^{r-1} \binom{r-1}{k} N^{2r-k-2} T^{r-1-k} = N^{r-1},$$

since $2r - k - 2 \geq r$ for $0 \leq k \leq r - 2$ and $N^r = 0$.

Recall from Proposition 1.6 that

$$((f - \lambda_i \mathrm{id})^{r_i - 1}(u), \ldots, (f - \lambda_i \mathrm{id})(u), u)$$

is a basis of $V_i$, which implies that $N^{r-1}(u) = (f - \lambda_i \mathrm{id})^{r_i - 1}(u) \neq 0$. Since $\widetilde{N}^{r-1} = N^{r-1}$, we have $\widetilde{N}^{r-1}(u) \neq 0$ and as $\widetilde{N}^r = 0$, we have $\widetilde{N}^r(u) = 0$. It is well-known that these two facts imply that

$$u, \widetilde{N}(u), \ldots, \widetilde{N}^{r-1}(u)$$

are linearly independent. Indeed, if we had a linear dependence relation

$$a_0 u + a_1 \widetilde{N}(u) + \cdots + a_{r-1} \widetilde{N}^{r-1}(u) = 0,$$

by applying $\widetilde{N}^{r-1}$, as $\widetilde{N}^r(u) = 0$, we get $a_0 \widetilde{N}^{r-1}(u) = 0$, so $a_0 = 0$ as $\widetilde{N}^{r-1}(u) \neq 0$; by applying $\widetilde{N}^{r-2}$ we get $a_1 \widetilde{N}^{r-1}(u) = 0$, so $a_1 = 0$; using induction, by applying $\widetilde{N}^{r-k-2}$ to

$$a_{k+1} \widetilde{N}^{k+1}(u) + \cdots + a_{r-1} \widetilde{N}^{r-1}(u) = 0,$$

we get $a_{k+1} = 0$ for $k = 0, \ldots, r - 2$. Since $V_i$ has dimension $r$ ($= r_i$), we deduce that

$$(u, \widetilde{N}(u), \ldots, \widetilde{N}^{r-1}(u))$$


is a basis of $V_i$. But $e^f = e^{\lambda_i}(\mathrm{id} + \widetilde{N})$, so for $k = 0, \ldots, r - 1$, each $\widetilde{N}^k(u)$ is a linear combination of the vectors $u, e^f(u), \ldots, (e^f)^{r-1}(u)$, which implies that

$$(u, e^f(u), (e^f)^2(u), \ldots, (e^f)^{r-1}(u))$$

is a basis of $V_i$. This implies that any annihilating polynomial of $V_i$ has degree no less than $r$ and since $(X - e^{\lambda_i})^r$ annihilates $V_i$ (because $e^f - e^{\lambda_i}\mathrm{id} = e^{\lambda_i}\widetilde{N}$ and $\widetilde{N}^r = 0$), it is the minimal polynomial of $V_i$. In summary, we proved that each $V_i$ is a cyclic $\mathbb{C}[X]$-module (with respect to $e^f$) and that in the direct sum decomposition

$$V = V_1 \oplus \cdots \oplus V_m,$$

the polynomial $(X - e^{\lambda_i})^{r_i}$ is the minimal polynomial of $V_i$, which is Theorem 1.5 for $e^f$. Then, Theorem 1.11 follows immediately from Proposition 1.6.
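Theorem 1.11 can be illustrated numerically on a single Jordan block (a minimal sketch assuming SciPy's expm): the ranks of the powers of $e^J - e^{\lambda} I$ confirm that $e^J$ consists of a single $3 \times 3$ Jordan block for the eigenvalue $e^{\lambda}$:

```python
import numpy as np
from scipy.linalg import expm

# Illustrate Theorem 1.11 on a single Jordan block J_3(lam): e^J has the
# single eigenvalue e^lam with one Jordan block of the same size.
lam = 0.5
J = lam * np.eye(3) + np.diag([1., 1.], k=1)  # J_3(lam)

E = expm(J)
mu = np.exp(lam)

# rank(e^J - mu*I) = 2 and rank((e^J - mu*I)^2) = 1: exactly one 3x3 block.
M = E - mu * np.eye(3)
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(M @ M))  # 2 1
```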

2 Logarithms of Real Matrices; Criteria for Existence and Uniqueness

If $A$ is any (complex) $n \times n$ matrix, we say that a matrix, $X$, is a logarithm of $A$ iff $e^X = A$. Our goal is to find conditions for the existence and uniqueness of real logarithms of real matrices. The two main theorems of this section are Theorem 2.4 and Theorem 2.11. These theorems are used in papers presenting methods for computing the logarithm of a matrix, including Cheng, Higham, Kenney and Laub [10] and Kenney and Laub [22].

Reference [10] cites Kenney and Laub [22] for a proof of Theorem 2.11 but, in fact, that paper does not give a proof. Kenney and Laub [22] do state Theorem 2.11 as Lemma A2 of Appendix A, but they simply say that the proof is similar to that of Lemma A1. As to the proof of Lemma A1, Kenney and Laub state without detail that it makes use of the Cauchy integral formula for operators, a method used by DePrima and Johnson [12] to prove a similar theorem for complex matrices (Section 4, Lemma 1), where uniqueness is also proved. Kenney and Laub point out that the third hypothesis in that lemma is redundant. Theorem 2.11 also appears in Higham's book [17] as Theorem 1.31. Its proof relies on Theorem 1.28 and Theorem 1.18 (both in Higham's book) but Theorem 1.28 is not proved and only part of Theorem 1.18 is proved in the text (closer examination reveals that Theorem 1.36 (in Higham's book) is needed to prove Theorem 1.28). Although Higham's Theorem 1.28 implies the injectivity statement of Theorem 2.9, we feel that the proof of Theorem 2.9 is of independent interest. Furthermore, Theorem 2.9 is a stronger result (it shows that exp is a diffeomorphism).

Given this state of affairs where no explicit proof of Theorem 2.11 seems easily available, we provide a complete proof of Theorem 2.11 using our special form of the Real Jordan Form.

First, let us consider the case where $A$ is a complex matrix. Now, we know that if $A = e^X$, then $\det(A) = e^{\mathrm{tr}(X)} \neq 0$, so $A$ must be invertible. It turns out that this condition is also sufficient.


Recall that for every invertible matrix, $P$, and every matrix, $A$,

$$e^{PAP^{-1}} = P e^A P^{-1},$$

and that for every block diagonal matrix,

$$A = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix},$$

we have

$$e^A = \begin{pmatrix} e^{A_1} & & 0 \\ & \ddots & \\ 0 & & e^{A_m} \end{pmatrix}.$$

Consequently, the problem of finding the logarithm of a matrix reduces to the problem of finding the logarithm of a Jordan block $J_r(\lambda)$ with $\lambda \neq 0$. However, every such Jordan block, $J_r(\lambda)$, can be written as

$$J_r(\lambda) = \lambda I + H = \lambda I (I + \lambda^{-1} H),$$

where $H$ is the nilpotent matrix of index of nilpotency, $r$, given by

$$H = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

Furthermore, it is obvious that $N = \lambda^{-1} H$ is also nilpotent of index of nilpotency, $r$, and we have

$$J_r(\lambda) = \lambda I (I + N).$$

Logarithms of the diagonal matrix, $\lambda I$, are easily found. If we write $\lambda = \rho e^{i\theta}$ where $\rho > 0$, then $\log \lambda = \log \rho + i(\theta + 2\pi h)$, for any $h \in \mathbb{Z}$, and we can pick a logarithm of $\lambda I$ to be

$$S = \begin{pmatrix} \log \rho + i\theta & 0 & \cdots & 0 \\ 0 & \log \rho + i\theta & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \log \rho + i\theta \end{pmatrix}.$$

Observe that if we can find a logarithm, $M$, of $I + N$, as $S$ commutes with any matrix and as $e^S = \lambda I$ and $e^M = I + N$, we have

$$e^{S+M} = e^S e^M = \lambda I (I + N) = J_r(\lambda),$$


which means that $S + M$ is a logarithm of $J_r(\lambda)$. Therefore, the problem reduces to finding the logarithm of a unipotent matrix, $I + N$. However, this problem always has a solution. To see this, remember that for $|u| < 1$, the power series

$$\log(1 + u) = u - \frac{u^2}{2} + \frac{u^3}{3} + \cdots + (-1)^{n+1} \frac{u^n}{n} + \cdots$$

is normally convergent. It turns out that the above fact can be generalized to matrices in the following way:

Proposition 2.1 For every $n \times n$ matrix, $A$, such that $\|A\| < 1$, the series

$$\log(I + A) = A - \frac{A^2}{2} + \frac{A^3}{3} + \cdots + (-1)^{n+1} \frac{A^n}{n} + \cdots$$

is normally convergent for any norm $\|\cdot\|$ on $\mathbb{C}^{n^2}$. Furthermore, if $\|A\| < 1$, we have

$$e^{\log(I + A)} = I + A.$$
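Here is a minimal numerical sketch of Proposition 2.1 (assuming NumPy and SciPy's expm; the matrix is a random hypothetical example scaled so that $\|A\| < 1$):

```python
import numpy as np
from scipy.linalg import expm

# For ||A|| < 1, the alternating series for log(I + A) converges and
# exponentiates back to I + A.
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((4, 4))   # hypothetical matrix with ||A|| < 1

X = np.zeros_like(A)
term = np.eye(4)
for k in range(1, 60):                  # partial sum of the series
    term = term @ A                     # term = A^k
    X += ((-1) ** (k + 1) / k) * term

print(np.allclose(expm(X), np.eye(4) + A))  # True
```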

Remark: The formal power series, $e^A - I$ and $\log(I + A)$, are mutual inverses; $e^A - I$ converges normally everywhere and $\log(I + A)$ converges normally for $\|A\| < 1$, so there is some $r$, with $0 < r < 1$, so that

$$\log(e^A) = A, \quad \text{if } \|A\| < r.$$

For any given $r \geq 1$, the exponential and the logarithm (of matrices) turn out to give a homeomorphism between the set of nilpotent matrices, $N$, and the set of unipotent matrices, $I + N$, for which $N^r = 0$. Let $\mathrm{Nil}(r)$ denote the set of (real or complex) nilpotent matrices of any dimension $n \geq 1$ such that $N^r = 0$ and $\mathrm{Uni}(r)$ denote the set of unipotent matrices, $U = I + N$, where $N \in \mathrm{Nil}(r)$. If $U = I + N \in \mathrm{Uni}(r)$, note that $\log(I + N)$ is well-defined since the power series for $\log(I + N)$ only has $r - 1$ nonzero terms,

$$\log(I + N) = N - \frac{N^2}{2} + \frac{N^3}{3} + \cdots + (-1)^r \frac{N^{r-1}}{r - 1}.$$

Proposition 2.2 The exponential map, $\exp \colon \mathrm{Nil}(r) \to \mathrm{Uni}(r)$, is a homeomorphism whose inverse is the logarithm.

Proof. A complete proof can be found in Mneimné and Testard [25], Chapter 3, Theorem 3.3.3. The idea is to prove that

$$\log(e^N) = N, \text{ for all } N \in \mathrm{Nil}(r) \quad \text{and} \quad e^{\log(U)} = U, \text{ for all } U \in \mathrm{Uni}(r).$$

To prove the first identity, it is enough to show that for any fixed $N \in \mathrm{Nil}(r)$, we have

$$\log(e^{tN}) = tN, \quad \text{for all } t \in \mathbb{R}.$$


To do this, observe that the functions $t \mapsto tN$ and $t \mapsto \log(e^{tN})$ are both equal to $0$ for $t = 0$. Thus, it is enough to show that their derivatives are equal, which is left as an exercise.

Next, for any $N \in \mathrm{Nil}(r)$, the map

$$t \mapsto e^{\log(I + tN)} - (I + tN), \quad t \in \mathbb{R},$$

is a polynomial, since $N^r = 0$. Furthermore, for $t$ sufficiently small, $\|tN\| < 1$ and, in view of Proposition 2.1, we have $e^{\log(I + tN)} = I + tN$, so the above polynomial vanishes in a neighborhood of $0$, which implies that it is identically zero. Therefore, $e^{\log(I + N)} = I + N$, as required. The continuity of $\exp$ and $\log$ is obvious.

Proposition 2.2 shows that every unipotent matrix, $I + N$, has the unique logarithm

$$\log(I + N) = N - \frac{N^2}{2} + \frac{N^3}{3} + \cdots + (-1)^r \frac{N^{r-1}}{r - 1},$$

where $r$ is the index of nilpotency of $N$. Therefore, if we let $M = \log(I + N)$, we have finally found a logarithm, $S + M$, for our original matrix, $A$. As a result of all this, we have proved the following theorem:

Theorem 2.3 Every $n \times n$ invertible complex matrix, $A$, has a logarithm, $X$. To find such a logarithm, we can proceed as follows:

(1) Compute a Jordan form, $A = PJP^{-1}$, for $A$ and let $m$ be the number of Jordan blocks in $J$.

(2) For every Jordan block, $J_{r_k}(\lambda_k)$, of $J$, write $J_{r_k}(\lambda_k) = \lambda_k I (I + N_k)$, where $N_k$ is nilpotent.

(3) If $\lambda_k = \rho_k e^{i\theta_k}$, with $\rho_k > 0$, let

$$S_k = \begin{pmatrix} \log \rho_k + i\theta_k & 0 & \cdots & 0 \\ 0 & \log \rho_k + i\theta_k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \log \rho_k + i\theta_k \end{pmatrix}.$$

We have $\lambda_k I = e^{S_k}$.

(4) For every $N_k$, let

$$M_k = N_k - \frac{N_k^2}{2} + \frac{N_k^3}{3} + \cdots + (-1)^{r_k} \frac{N_k^{r_k - 1}}{r_k - 1}.$$

We have $I + N_k = e^{M_k}$.


(5) If $Y_k = S_k + M_k$ and $Y$ is the block diagonal matrix $\mathrm{diag}(Y_1, \ldots, Y_m)$, then

$$X = PYP^{-1}$$

is a logarithm of $A$.
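The recipe of Theorem 2.3 can be traced on a small example. The sketch below uses SymPy's exact jordan_form (numerically stable codes avoid the Jordan form because it is discontinuous in the entries of $A$; this is purely illustrative), on a hypothetical matrix whose Jordan blocks happen to be $1 \times 1$, so steps (2)-(4) collapse to taking logarithms of the diagonal entries:

```python
import sympy as sp

# Illustrative only: exact Jordan form via SymPy; A is a hypothetical example
# (rotation by pi/2) whose Jordan blocks are 1x1, so M_k = 0 in step (4).
A = sp.Matrix([[0, -1],
               [1,  0]])

P, J = A.jordan_form()            # step (1): A = P J P^{-1} (here J is diagonal)
Y = sp.zeros(*J.shape)
for i in range(J.rows):           # step (3): log of each (diagonal) block
    Y[i, i] = sp.log(J[i, i])

X = P * Y * P.inv()               # step (5): X = P Y P^{-1}
print(sp.simplify(X.exp()))       # recovers A, so X is a logarithm of A
```

In practice, library routines compute matrix logarithms by inverse scaling and squaring rather than via the Jordan form, precisely because the Jordan form is numerically unstable.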

Let us now assume that $A$ is a real matrix and let us try to find a real logarithm. There is no problem in finding real logarithms of the nilpotent parts but we run into trouble whenever an eigenvalue is complex or real negative. Fortunately, we can circumvent these problems by using the real Jordan form, provided that the condition of Theorem 1.10 holds.

The theorem below gives a necessary and sufficient condition for a real matrix to have a real logarithm. The first occurrence of this theorem that we have found in the literature is a paper by Culver [11] published in 1966. The proofs in this paper rely heavily on results from Gantmacher [14]. Theorem 2.4 is also stated in Horn and Johnson [20] as Theorem 6.4.15 (Chapter 6), but the proof is left as an exercise. We offer a proof using Theorem 1.10 which is more explicit than Culver's proof.

Theorem 2.4 Let $A$ be a real $n \times n$ matrix and let $(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m}$ be its list of elementary divisors or, equivalently, let $J_{r_1}(\lambda_1), \ldots, J_{r_m}(\lambda_m)$ be its list of Jordan blocks. Then, $A$ has a real logarithm iff $A$ is invertible and, for every $r_i$ and every real eigenvalue $\lambda_i < 0$, the number, $m_i$, of Jordan blocks identical to $J_{r_i}(\lambda_i)$ is even.

Proof. First, assume that $A$ satisfies the conditions of Theorem 2.4. Since the matrix $A$ satisfies the condition of Theorem 1.10, there is a real invertible matrix, $P$, and a real Jordan matrix, $J'$, so that

$$A = PJ'P^{-1},$$

where $J'$ satisfies conditions (1) and (2) of Theorem 1.10. As $A$ is invertible, every block of $J'$ of the form $J_{r_k}(\lambda_k)$ corresponds to a real eigenvalue with $\lambda_k > 0$ and we can write $J_{r_k}(\lambda_k) = \lambda_k I (I + N_k)$, where $N_k$ is nilpotent. As in Theorem 2.3 (4), we can find a real logarithm, $M_k$, of $I + N_k$ and, as $\lambda_k > 0$, the diagonal matrix $\lambda_k I$ has the real logarithm

$$S_k = \begin{pmatrix} \log \lambda_k & 0 & \cdots & 0 \\ 0 & \log \lambda_k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \log \lambda_k \end{pmatrix}.$$

Set $Y_k = S_k + M_k$.

The other real Jordan blocks of $J'$ are of the form $J_{2r_k}(\alpha_k, \beta_k)$, with $\alpha_k, \beta_k \in \mathbb{R}$, not both zero. Consequently, we can write

$$J_{2r_k}(\alpha_k, \beta_k) = D_k + H_k = D_k(I + D_k^{-1} H_k),$$


where

$$D_k = \begin{pmatrix} L(\alpha_k, \beta_k) & & 0 \\ & \ddots & \\ 0 & & L(\alpha_k, \beta_k) \end{pmatrix}$$

with

$$L(\alpha_k, \beta_k) = \begin{pmatrix} \alpha_k & -\beta_k \\ \beta_k & \alpha_k \end{pmatrix},$$

and $H_k$ is a real nilpotent matrix. If we let $N_k = D_k^{-1} H_k$, then $N_k$ is also nilpotent, $J_{2r_k}(\alpha_k, \beta_k) = D_k(I + N_k)$, and we can find a logarithm, $M_k$, of $I + N_k$ as in Theorem 2.3 (4). We can write $\alpha_k + i\beta_k = \rho_k e^{i\theta_k}$, with $\rho_k > 0$ and $\theta_k \in [-\pi, \pi)$, and then

$$L(\alpha_k, \beta_k) = \begin{pmatrix} \alpha_k & -\beta_k \\ \beta_k & \alpha_k \end{pmatrix} = \rho_k \begin{pmatrix} \cos \theta_k & -\sin \theta_k \\ \sin \theta_k & \cos \theta_k \end{pmatrix}.$$

If we set

$$S(\rho_k, \theta_k) = \begin{pmatrix} \log \rho_k & -\theta_k \\ \theta_k & \log \rho_k \end{pmatrix},$$

a real matrix, we claim that

$$L(\alpha_k, \beta_k) = e^{S(\rho_k, \theta_k)}.$$

Indeed, $S(\rho_k, \theta_k) = \log \rho_k\, I + \theta_k E_2$, with

$$E_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

and it is well known that

$$e^{\theta_k E_2} = \begin{pmatrix} \cos \theta_k & -\sin \theta_k \\ \sin \theta_k & \cos \theta_k \end{pmatrix},$$

so, as $\log \rho_k\, I$ and $\theta_k E_2$ commute, we get

$$e^{S(\rho_k, \theta_k)} = e^{\log \rho_k I + \theta_k E_2} = e^{\log \rho_k I} e^{\theta_k E_2} = \rho_k \begin{pmatrix} \cos \theta_k & -\sin \theta_k \\ \sin \theta_k & \cos \theta_k \end{pmatrix} = L(\alpha_k, \beta_k).$$

If we form the real block diagonal matrix,

$$S_k = \begin{pmatrix} S(\rho_k, \theta_k) & & 0 \\ & \ddots & \\ 0 & & S(\rho_k, \theta_k) \end{pmatrix},$$

we have $D_k = e^{S_k}$. Since $S_k$ and $M_k$ commute and

$$e^{S_k + M_k} = e^{S_k} e^{M_k} = D_k(I + N_k) = J_{2r_k}(\alpha_k, \beta_k),$$

the matrix $Y_k = S_k + M_k$ is a logarithm of $J_{2r_k}(\alpha_k, \beta_k)$. Finally, if $Y$ is the block diagonal matrix $\mathrm{diag}(Y_1, \ldots, Y_m)$, then $X = PYP^{-1}$ is a logarithm of $A$.


Let us now prove that if $A$ has a real logarithm, $X$, then $A$ satisfies the condition of Theorem 2.4. As we said before, $A$ must be invertible. Since $X$ is a real matrix, we know from the proof of Theorem 1.9 that the Jordan blocks of $X$ associated with complex eigenvalues occur in conjugate pairs, so they are of the form

$$J_{r_k}(\lambda_k), \quad \lambda_k \in \mathbb{R},$$
$$J_{r_k}(\lambda_k) \text{ and } J_{r_k}(\overline{\lambda_k}), \quad \lambda_k = \alpha_k + i\beta_k, \ \beta_k \neq 0.$$

By Theorem 1.11, the Jordan blocks of $A = e^X$ are obtained by replacing each $\lambda_k$ by $e^{\lambda_k}$, that is, they are of the form

$$J_{r_k}(e^{\lambda_k}), \quad \lambda_k \in \mathbb{R},$$
$$J_{r_k}(e^{\lambda_k}) \text{ and } J_{r_k}(e^{\overline{\lambda_k}}), \quad \lambda_k = \alpha_k + i\beta_k, \ \beta_k \neq 0.$$

If $\lambda_k \in \mathbb{R}$, then $e^{\lambda_k} > 0$, so the negative eigenvalues of $A$ must be of the form $e^{\lambda_k}$ or $e^{\overline{\lambda_k}}$, with $\lambda_k$ complex. This implies that $\lambda_k = \alpha_k + (2h + 1)i\pi$, for some $h \in \mathbb{Z}$, but then $\overline{\lambda_k} = \alpha_k - (2h + 1)i\pi$ and so

$$e^{\lambda_k} = e^{\overline{\lambda_k}}.$$

Consequently, negative eigenvalues of $A$ are associated with Jordan blocks that occur in pairs, as claimed.
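The $2 \times 2$ computation at the heart of this proof is easy to verify numerically (a sketch assuming SciPy's expm; the values of $\alpha$ and $\beta$ are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# With alpha + i*beta = rho * e^{i*theta}, the real matrix S(rho, theta)
# is a real logarithm of the block L(alpha, beta).
alpha, beta = 1.0, 2.0
rho, theta = np.hypot(alpha, beta), np.arctan2(beta, alpha)

L = np.array([[alpha, -beta],
              [beta,  alpha]])
S = np.array([[np.log(rho), -theta],
              [theta,  np.log(rho)]])

print(np.allclose(expm(S), L))  # True
```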

Remark: It can be shown (see Culver [11]) that all the logarithms of a Jordan block, $J_{r_k}(\lambda_k)$, corresponding to a real eigenvalue $\lambda_k > 0$ are obtained by adding the matrices

$$i2\pi h_k I, \quad h_k \in \mathbb{Z},$$

to the solution given by the proof of Theorem 2.4 and that all the logarithms of a Jordan block, $J_{2r_k}(\alpha_k, \beta_k)$, are obtained by adding the matrices

$$i2\pi h_k I + 2\pi l_k E, \quad h_k, l_k \in \mathbb{Z},$$

to the solution given by the proof of Theorem 2.4, where

$$E = \begin{pmatrix} E_2 & & 0 \\ & \ddots & \\ 0 & & E_2 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

One should be careful not to relax the condition of Theorem 2.4 to the more liberal condition stating that for every Jordan block, $J_{r_k}(\lambda_k)$, for which $\lambda_k < 0$, the dimension $r_k$ is even (i.e., $\lambda_k$ occurs an even number of times). For example, the matrix

$$A = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}$$

satisfies the more liberal condition but it does not possess any real logarithm, as the reader will verify. On the other hand, we have the following corollary:


Corollary 2.5 For every real invertible matrix, $A$, if $A$ has no negative eigenvalues, then $A$ has a real logarithm.

More results about the number of real logarithms of real matrices can be found in Culver [11]. In particular, Culver gives a necessary and sufficient condition for a real matrix, $A$, to have a unique real logarithm. This condition is quite strong. In particular, it requires that all the eigenvalues of $A$ be real and positive.

    A different approach is to restrict the domain of real logarithms to obtain a sufficientcondition for the uniqueness of a logarithm. We now discuss this approach. First, we statethe following property that will be useful later:

Proposition 2.6 For every (real or complex) invertible matrix, $A$, there is a semisimple matrix, $S$, and a unipotent matrix, $U$, so that

$$A = SU \quad \text{and} \quad SU = US.$$

Furthermore, $S$ and $U$ as above are unique.

Proof. Proposition 2.6 follows immediately from Theorem 1.3; the details are left as an exercise.

    The form, SU, of an invertible matrix is often called the multiplicative Jordan decompo-sition.

Definition 2.7 Let $\mathcal{S}(n)$ denote the set of all real matrices whose eigenvalues, $\lambda + i\mu$, lie in the horizontal strip determined by the condition $-\pi < \mu < \pi$.

It is easy to see that $\mathcal{S}(n)$ is star-shaped (which means that if it contains $A$, then it contains $\lambda A$ for all $\lambda \in [0, 1]$) and open (because the roots of a polynomial are continuous functions of the coefficients of the polynomial). As $\mathcal{S}(n)$ is star-shaped, it is path-connected. Furthermore, if $A \in \mathcal{S}(n)$, then $PAP^{-1} \in \mathcal{S}(n)$ for every invertible matrix, $P$. The remarkable property of $\mathcal{S}(n)$ is that the restriction of the exponential to $\mathcal{S}(n)$ is a diffeomorphism onto its image. To prove this fact we will need the following proposition:

Proposition 2.8 For any two real or complex matrices, $S_1$ and $S_2$, if the eigenvalues, $\lambda + i\mu$, of $S_1$ and $S_2$ satisfy the condition $-\pi < \mu \leq \pi$, if $S_1$ and $S_2$ are semisimple and if $e^{S_1} = e^{S_2}$, then $S_1 = S_2$.

Proof. Since $S_1$ and $S_2$ are semisimple, they can be diagonalized over $\mathbb{C}$, so let $(u_1, \ldots, u_n)$ be a basis of eigenvectors of $S_1$ associated with the (possibly complex) eigenvalues $\mu_1, \ldots, \mu_n$ and let $(v_1, \ldots, v_n)$ be a basis of eigenvectors of $S_2$ associated with the (possibly complex) eigenvalues $\nu_1, \ldots, \nu_n$. We prove that if $e^{S_1} = e^{S_2} = A$, then $S_1(v_i) = S_2(v_i)$ for all $v_i$, which shows that $S_1 = S_2$.


Pick any eigenvector, $v_i$, of $S_2$ and write $v = v_i$ and $\nu = \nu_i$. We have

$$v = \alpha_1 u_1 + \cdots + \alpha_n u_n,$$

for some unique $\alpha_k$'s. We compute $A(v)$ in two different ways. We know that $e^{\nu_1}, \ldots, e^{\nu_n}$ are the eigenvalues of $e^{S_2}$ for the eigenvectors $v_1, \ldots, v_n$, so

$$A(v) = e^{S_2}(v) = e^{\nu} v = \alpha_1 e^{\nu} u_1 + \cdots + \alpha_n e^{\nu} u_n.$$

Similarly, we know that $e^{\mu_1}, \ldots, e^{\mu_n}$ are the eigenvalues of $e^{S_1}$ for the eigenvectors $u_1, \ldots, u_n$, so

$$A(v) = A(\alpha_1 u_1 + \cdots + \alpha_n u_n) = \alpha_1 A(u_1) + \cdots + \alpha_n A(u_n) = \alpha_1 e^{S_1}(u_1) + \cdots + \alpha_n e^{S_1}(u_n) = \alpha_1 e^{\mu_1} u_1 + \cdots + \alpha_n e^{\mu_n} u_n.$$

Therefore, we deduce that

$$\alpha_k e^{\nu} = \alpha_k e^{\mu_k}, \quad 1 \leq k \leq n.$$

Consequently, if $\alpha_k \neq 0$, then

$$e^{\nu} = e^{\mu_k},$$

which implies $\nu - \mu_k = i2\pi h$, for some $h \in \mathbb{Z}$. However, due to the hypothesis on the eigenvalues of $S_1$ and $S_2$, $\nu$ and $\mu_k$ must belong to the horizontal strip determined by the condition $-\pi < \Im(z) \leq \pi$, so we must have $h = 0$ and then $\nu = \mu_k$. If we let $I = \{k \mid \mu_k = \nu\}$, then $v = \sum_{k \in I} \alpha_k u_k$ and we have

$$S_1(v) = S_1\left(\sum_{k \in I} \alpha_k u_k\right) = \sum_{k \in I} \alpha_k S_1(u_k) = \sum_{k \in I} \alpha_k \mu_k u_k = \nu \sum_{k \in I} \alpha_k u_k = \nu v.$$

Therefore, $S_1(v) = \nu v$. As $v$ is an eigenvector of $S_2$ for the eigenvalue $\nu$, we also have $S_2(v) = \nu v$. Therefore,

$$S_1(v_i) = S_2(v_i), \quad i = 1, \ldots, n,$$

which proves that $S_1 = S_2$.


Obviously, Proposition 2.8 holds for real semisimple matrices, $S_1, S_2$, in $\mathcal{S}(n)$, since the condition for being in $\mathcal{S}(n)$ is $-\pi < \Im(\lambda) < \pi$ for every eigenvalue, $\lambda$, of $S_1$ or $S_2$.

We can now state our next theorem, an important result. This theorem is a consequence of a more general fact proved in Bourbaki [8] (Chapter III, Section 6.9, Proposition 17, see also Theorem 6).

Theorem 2.9 The restriction of the exponential map to $\mathcal{S}(n)$ is a diffeomorphism of $\mathcal{S}(n)$ onto its image, $\exp(\mathcal{S}(n))$. If $A \in \exp(\mathcal{S}(n))$, then $PAP^{-1} \in \exp(\mathcal{S}(n))$, for every (real) invertible matrix, $P$. Furthermore, $\exp(\mathcal{S}(n))$ is an open subset of $\mathrm{GL}(n, \mathbb{R})$ containing $I$ and $\exp(\mathcal{S}(n))$ contains the open ball, $B(I, 1) = \{A \in \mathrm{GL}(n, \mathbb{R}) \mid \|A - I\| < 1\}$, for every norm $\|\cdot\|$ on $n \times n$ matrices satisfying the condition $\|AB\| \leq \|A\| \|B\|$.

Proof. A complete proof is given in Mneimné and Testard [25], Chapter 3, Theorem 3.8.4. Part of the proof consists in showing that $\exp$ is a local diffeomorphism and, for this, to prove that $d\exp(X)$ is invertible. This requires finding an explicit formula for the derivative of the exponential and we prefer to omit this computation, which is quite technical. Proving that $B(I, 1) \subseteq \exp(\mathcal{S}(n))$ is easier but requires a little bit of complex analysis. Once these facts are established, it remains to prove that $\exp$ is injective on $\mathcal{S}(n)$, which we will prove.

The trick is to use both the Jordan decomposition and the multiplicative Jordan decomposition! Assume that $X_1, X_2 \in \mathcal{S}(n)$ and that $e^{X_1} = e^{X_2}$. Using Theorem 1.3, we can write $X_1 = S_1 + N_1$ and $X_2 = S_2 + N_2$, where $S_1, S_2$ are semisimple, $N_1, N_2$ are nilpotent, $S_1 N_1 = N_1 S_1$, and $S_2 N_2 = N_2 S_2$. From $e^{X_1} = e^{X_2}$, we get

$$e^{S_1} e^{N_1} = e^{S_1 + N_1} = e^{S_2 + N_2} = e^{S_2} e^{N_2}.$$

Now, $S_1$ and $S_2$ are semisimple, so $e^{S_1}$ and $e^{S_2}$ are semisimple, and $N_1$ and $N_2$ are nilpotent, so $e^{N_1}$ and $e^{N_2}$ are unipotent. Moreover, as $S_1 N_1 = N_1 S_1$ and $S_2 N_2 = N_2 S_2$, we have $e^{S_1} e^{N_1} = e^{N_1} e^{S_1}$ and $e^{S_2} e^{N_2} = e^{N_2} e^{S_2}$. By the uniqueness property of Proposition 2.6, we conclude that

$$e^{S_1} = e^{S_2} \quad \text{and} \quad e^{N_1} = e^{N_2}.$$

Now, as $N_1$ and $N_2$ are nilpotent, there is some $r$ so that $N_1^r = N_2^r = 0$ and then it is clear that $e^{N_1} = I + \widetilde{N_1}$ and $e^{N_2} = I + \widetilde{N_2}$ with $\widetilde{N_1}^r = 0$ and $\widetilde{N_2}^r = 0$. Therefore, we can apply Proposition 2.2 to conclude that

$$N_1 = N_2.$$

As $S_1, S_2 \in \mathcal{S}(n)$ are semisimple and $e^{S_1} = e^{S_2}$, by Proposition 2.8, we conclude that

$$S_1 = S_2.$$

Therefore, we finally proved that $X_1 = X_2$, showing that $\exp$ is injective on $\mathcal{S}(n)$.

Remark: Since Proposition 2.8 holds for semisimple matrices, $S$, such that the condition $-\pi < \mu \leq \pi$ holds for every eigenvalue, $\lambda + i\mu$, of $S$, the restriction of the exponential to real matrices, $X$, whose eigenvalues satisfy this condition is injective. Note that the image of these matrices under the exponential contains matrices, $A = e^X$, with negative eigenvalues. Thus, combining Theorem 2.4 and the above injectivity result, we could state an existence and uniqueness result for real logarithms of real matrices that is more general than Theorem 2.11 below. However, this is not a practical result, since it requires a condition on the number of Jordan blocks and such a condition is hard to check. Thus, we will restrict ourselves to real matrices with no negative eigenvalues (see Theorem 2.11).

Since the eigenvalues of a nilpotent matrix are zero and since symmetric matrices have real eigenvalues, Theorem 2.9 has two interesting corollaries. Denote by $\mathrm{Sym}(n)$ the vector space of real symmetric $n \times n$ matrices and by $SPD(n)$ the set of $n \times n$ symmetric, positive, definite matrices. It is known that $\exp\colon \mathrm{Sym}(n) \to SPD(n)$ is a bijection.

Corollary 2.10 The exponential map has the following properties:

(1) The map $\exp\colon \mathrm{Nil}(r) \to \mathrm{Uni}(r)$ is a diffeomorphism.

(2) The map $\exp\colon \mathrm{Sym}(n) \to SPD(n)$ is a diffeomorphism.
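As a quick numerical illustration of property (2), the following sketch (mine, assuming the NumPy and SciPy libraries; it is not part of the development above) exponentiates a random symmetric matrix, checks that the result is symmetric positive definite, and recovers the original matrix with the matrix logarithm:

    # Illustration of Corollary 2.10 (2): exp maps Sym(n) onto SPD(n).
    import numpy as np
    from scipy.linalg import expm, logm

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    S = (B + B.T) / 2                          # a random real symmetric matrix

    A = expm(S)                                # the exponential of S
    assert np.allclose(A, A.T)                 # A is symmetric
    assert np.all(np.linalg.eigvalsh(A) > 0)   # A is positive definite
    assert np.allclose(logm(A), S, atol=1e-8)  # logm inverts exp on SPD(n)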

By combining Theorem 2.4 and Theorem 2.9, we obtain the following result about the existence and uniqueness of logarithms of real matrices:

Theorem 2.11 (a) If $A$ is any real invertible $n \times n$ matrix and $A$ has no negative eigenvalues, then $A$ has a unique real logarithm, $X$, with $X \in S(n)$.

(b) The image, $\exp(S(n))$, of $S(n)$ by the exponential map is the set of real invertible matrices with no negative eigenvalues, and $\exp\colon S(n) \to \exp(S(n))$ is a diffeomorphism between these two spaces.

Proof. (a) If we go back to the proof of Theorem 2.4, we see that complex eigenvalues of the logarithm, $X$, produced by that proof only occur for matrices
$$S(\lambda_k, \mu_k) = \begin{pmatrix} \log \rho_k & -\theta_k \\ \theta_k & \log \rho_k \end{pmatrix},$$
associated with eigenvalues $\lambda_k + i\mu_k = \rho_k e^{i\theta_k}$. However, the eigenvalues of such matrices are $\log \rho_k \pm i\theta_k$ and since $A$ has no negative eigenvalues, we may assume that $-\pi < \theta_k < \pi$, and so $X \in S(n)$, as desired. By Theorem 2.9, such a logarithm is unique.

(b) Part (a) proves that the set of real invertible matrices with no negative eigenvalues is contained in $\exp(S(n))$. However, for any matrix, $X \in S(n)$, since every eigenvalue of $e^X$ is of the form $e^{\mu + i\nu} = e^{\mu} e^{i\nu}$ for some eigenvalue, $\mu + i\nu$, of $X$ and since $\mu + i\nu$ satisfies the condition $-\pi < \nu < \pi$, the number, $e^{\mu} e^{i\nu}$, is never negative, so $e^X$ has no negative eigenvalues. Then, (b) follows directly from Theorem 2.9.

Remark: Theorem 2.11 (a) first appeared in Kenney and Laub [22] (Lemma A2, Appendix A) but without proof.
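As an illustration of Theorem 2.11, the following sketch (mine, assuming that SciPy's logm computes the principal logarithm, which applies to matrices with no negative eigenvalues) takes a rotation matrix, computes its real logarithm, and checks that the eigenvalues of the logarithm lie in the strip $-\pi < \Im(\lambda) < \pi$:

    # Illustration of Theorem 2.11: the unique real logarithm in S(n).
    import numpy as np
    from scipy.linalg import expm, logm

    t = 2 * np.pi / 3                 # rotation angle
    A = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])   # invertible, no negative eigenvalues

    X = logm(A)                                # the principal logarithm
    assert np.allclose(expm(X), A)             # X is a logarithm of A
    assert np.allclose(np.asarray(X).imag, 0)  # X is a real matrix
    # eigenvalues of X lie in the strip -pi < Im(lambda) < pi
    assert np.all(np.abs(np.linalg.eigvals(X).imag) < np.pi)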


3 Square Roots of Real Matrices; Criteria for Existence and Uniqueness

In this section we investigate the problem of finding a square root of a matrix, $A$, that is, a matrix, $X$, such that $X^2 = A$. If $A$ is an invertible (complex) matrix, then it always has a square root, but singular matrices may fail to have a square root. For example, the nilpotent matrix,
$$H = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$
has no square root (check this!); a symbolic verification in the $2 \times 2$ case is sketched below.
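The following SymPy sketch (mine, not part of the original argument) shows that, in the $2 \times 2$ case, the system $X^2 = H$ has no solution, even over $\mathbb{C}$:

    # The 2 x 2 nilpotent Jordan block has no square root.
    import sympy as sp

    a, b, c, d = sp.symbols('a b c d')
    X = sp.Matrix([[a, b], [c, d]])
    H = sp.Matrix([[0, 1], [0, 0]])

    # Solve the four polynomial equations X**2 = H entrywise.
    solutions = sp.solve(list(X**2 - H), [a, b, c, d], dict=True)
    print(solutions)   # [] : no square root exists, even over C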

The problem of finding square roots of matrices is thoroughly investigated in Gantmacher [14], Chapter VIII, Sections 6 and 7. For singular matrices, finding a square root reduces to the problem of finding the square root of a nilpotent matrix, which is not always possible. A necessary and sufficient condition for the existence of a square root is given in Horn and Johnson [20], see Chapter 6, Section 4, especially Theorem 6.1.12 and Theorem 6.4.14. This criterion is rather complicated because it deals with non-singular as well as singular matrices. In these notes, we will restrict our attention to invertible matrices. The two main theorems of this section are Theorem 3.4 and Theorem 3.8. The former theorem appears in Higham [16] (Theorem 5). The first step is to prove a version of Theorem 1.11 for the function $A \mapsto A^2$, where $A$ is invertible.

Theorem 3.1 For any (real or complex) invertible $n \times n$ matrix, $A$, if $A = PJP^{-1}$ where $J$ is a Jordan matrix of the form

$$J = \begin{pmatrix} J_{r_1}(\lambda_1) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & J_{r_m}(\lambda_m) \end{pmatrix},$$
then there is some invertible matrix, $Q$, so that the Jordan form of $A^2$ is given by
$$A^2 = Q\, s(J)\, Q^{-1},$$
where $s(J)$ is the Jordan matrix
$$s(J) = \begin{pmatrix} J_{r_1}(\lambda_1^2) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & J_{r_m}(\lambda_m^2) \end{pmatrix},$$
that is, each $J_{r_k}(\lambda_k^2)$ is obtained from $J_{r_k}(\lambda_k)$ by replacing all the diagonal entries $\lambda_k$ by $\lambda_k^2$. Equivalently, if the list of elementary divisors of $A$ is
$$(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m},$$


then the list of elementary divisors of $A^2$ is
$$(X - \lambda_1^2)^{r_1}, \ldots, (X - \lambda_m^2)^{r_m}.$$

Proof. Theorem 3.1 is a consequence of a general theorem about functions of matrices proved in Gantmacher [14], see Chapter VI, Section 8, Theorem 9. However, it is possible to give a simpler proof exploiting special properties of the squaring map.

Let $f$ be the linear map defined by the matrix $A$. The proof is modeled after the proof of Theorem 1.11. Consider the direct sum decomposition given by Theorem 1.5,
$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_m,$$
where each $V_i$ is a cyclic $\mathbb{C}[X]$-module such that the minimal polynomial of the restriction of $f$ to $V_i$ is of the form $(X - \lambda_i)^{r_i}$. We will prove that

(1) The vectors $u, f^2(u), f^4(u), \ldots, f^{2(r_i - 1)}(u)$ form a basis of $V_i$.

(2) The polynomial $(X - \lambda_i^2)^{r_i}$ is the minimal polynomial of the restriction of $f^2$ to $V_i$.

Since $V_i$ is invariant under $f$, it is clear that $V_i$ is invariant under $f^2 = f \circ f$. Thus, we can view $V_i$ as a $\mathbb{C}[X]$-module with respect to $f^2$. Let $N = f - \lambda_i\,\mathrm{id}$. To say that $(X - \lambda_i)^{r_i}$ is the minimal polynomial of the restriction of $f$ to $V_i$ is equivalent to saying that $N$ is nilpotent with index of nilpotency $r = r_i$. Now, $N$ and $\lambda_i\,\mathrm{id}$ commute, so as $f = \lambda_i\,\mathrm{id} + N$, we have
$$f^2 = \lambda_i^2\,\mathrm{id} + 2\lambda_i N + N^2$$
and so
$$f^2 - \lambda_i^2\,\mathrm{id} = 2\lambda_i N + N^2.$$
Since we are assuming that $f$ is invertible, $\lambda_i \neq 0$, so
$$f^2 - \lambda_i^2\,\mathrm{id} = 2\lambda_i\left(N + \frac{N^2}{2\lambda_i}\right).$$
If we let
$$\widetilde{N} = N + \frac{N^2}{2\lambda_i},$$
we claim that $\widetilde{N}^{r-1} = N^{r-1}$ and $\widetilde{N}^r = 0$. The proof is identical to the proof given in Theorem 1.11. Again, as in the proof of Theorem 1.11, we deduce that $\widetilde{N}^{r-1}(u) \neq 0$ and $\widetilde{N}^r(u) = 0$, from which we infer that
$$(u, \widetilde{N}(u), \ldots, \widetilde{N}^{r-1}(u))$$
is a basis of $V_i$. But $f^2 - \lambda_i^2\,\mathrm{id} = 2\lambda_i \widetilde{N}$, so for $k = 0, \ldots, r-1$, each $\widetilde{N}^k(u)$ is a linear combination of the vectors $u, f^2(u), \ldots, f^{2(r-1)}(u)$, which implies that
$$(u, f^2(u), f^4(u), \ldots, f^{2(r-1)}(u))$$
is a basis of $V_i$. This implies that any annihilating polynomial of $V_i$ has degree no less than $r$ and since $(X - \lambda_i^2)^r$ annihilates $V_i$, it is the minimal polynomial of $V_i$. Theorem 3.1 follows immediately from Proposition 1.6.

Remark: Theorem 3.1 can be easily generalized to the map $A \mapsto A^p$, for any $p \geq 2$, that is, by replacing $A^2$ by $A^p$, provided $A$ is invertible. Thus, if the list of elementary divisors of $A$ is
$$(X - \lambda_1)^{r_1}, \ldots, (X - \lambda_m)^{r_m},$$
then the list of elementary divisors of $A^p$ is
$$(X - \lambda_1^p)^{r_1}, \ldots, (X - \lambda_m^p)^{r_m}.$$
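The following SymPy sketch (mine; it assumes the sympy package, which is not mentioned in the text) checks Theorem 3.1 on a single block: squaring $J_3(2)$ produces a matrix whose Jordan form is the single block $J_3(4)$:

    # Theorem 3.1 on one block: the Jordan form of J_3(2)^2 is J_3(4).
    import sympy as sp

    J = sp.Matrix([[2, 1, 0],
                   [0, 2, 1],
                   [0, 0, 2]])      # the Jordan block J_3(2), invertible
    P, J2 = (J**2).jordan_form()
    print(J2)   # Matrix([[4, 1, 0], [0, 4, 1], [0, 0, 4]])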

The next step is to find the square root of a Jordan block. Since we are assuming that our matrix is invertible, every Jordan block, $J_{r_k}(\lambda_k)$, can be written as
$$J_{r_k}(\lambda_k) = \lambda_k I \left(I + \frac{H}{\lambda_k}\right),$$
where $H$ is nilpotent. It is easy to find a square root of $\lambda_k I$. If $\lambda_k = \rho_k e^{i\theta_k}$, with $\rho_k > 0$, then
$$S_k = \begin{pmatrix} \sqrt{\rho_k}\, e^{i\theta_k/2} & 0 & \cdots & 0 \\ 0 & \sqrt{\rho_k}\, e^{i\theta_k/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\rho_k}\, e^{i\theta_k/2} \end{pmatrix}$$
is a square root of $\lambda_k I$. Therefore, the problem reduces to finding square roots of unipotent matrices. For this, we recall the power series

$$(1 + x)^{1/2} = 1 + \frac{1}{2}x + \cdots + \frac{1}{n!}\,\frac{1}{2}\left(\frac{1}{2} - 1\right)\cdots\left(\frac{1}{2} - n + 1\right)x^n + \cdots = \sum_{n=0}^{\infty} (-1)^{n-1} \frac{(2n)!}{(2n-1)(n!)^2 2^{2n}}\, x^n,$$
which is normally convergent for $|x| < 1$. Then, we can define the power series, $R$, of a matrix variable, $A$, by
$$R(A) = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{(2n)!}{(2n-1)(n!)^2 2^{2n}}\, A^n,$$


and this power series converges normally for $\|A\| < 1$. As a formal power series, note that $R(0) = 0$ and $R'(0) = \frac{1}{2} \neq 0$ so, by a theorem about formal power series, $R$ has a unique inverse, $S$, such that $S(0) = 0$ (see Lang [24] or H. Cartan [9]). But, if we consider the power series, $S(A) = (I + A)^2 - I$, when $A$ is a real number, we have $R(A) = \sqrt{1 + A} - 1$, so we get
$$R \circ S(A) = \sqrt{1 + (1 + A)^2 - 1} - 1 = A,$$
from which we deduce that $S$ and $R$ are mutual inverses. But, $S$ converges everywhere and $R$ converges for $\|A\| < 1$, so by another theorem about converging power series, if we let $\sqrt{I + A} = R(A) + I$, there is some $r$, with $0 < r < 1$, so that
$$\left(\sqrt{I + A}\right)^2 = I + A, \quad\text{if } \|A\| < r,$$
and
$$\sqrt{(I + A)^2} = I + A, \quad\text{if } \|A\| < r.$$
If $A$ is unipotent, that is, $A = I + N$ with $N$ nilpotent, we see that the series has only finitely many terms. This fact allows us to prove the proposition below.

Proposition 3.2 The squaring map, $A \mapsto A^2$, is a homeomorphism from $\mathrm{Uni}(r)$ to itself whose inverse is the map $A \mapsto \sqrt{A} = R(A - I) + I$.

Proof. If $A = I + N$ with $N^r = 0$, as $A^2 = I + 2N + N^2$, it is clear that $(2N + N^2)^r = 0$, so the squaring map is well defined on unipotent matrices. We use the technique of Proposition 2.2. Consider the map
$$t \mapsto \left(\sqrt{I + tN}\right)^2 - (I + tN), \quad t \in \mathbb{R}.$$
It is a polynomial since $N^r = 0$. Furthermore, for $t$ sufficiently small, $\|tN\| < 1$ and we have $\left(\sqrt{I + tN}\right)^2 = I + tN$, so the above polynomial vanishes in a neighborhood of $0$, which implies that it is identically zero. Therefore, $\left(\sqrt{I + N}\right)^2 = I + N$, as required.

Next, consider the map
$$t \mapsto \sqrt{(I + tN)^2} - (I + tN), \quad t \in \mathbb{R}.$$
It is a polynomial since $N^r = 0$. Furthermore, for $t$ sufficiently small, $\|tN\| < 1$ and we have $\sqrt{(I + tN)^2} = I + tN$, so we conclude as above that the above map is identically zero and that $\sqrt{(I + N)^2} = I + N$.

Remark: Proposition 3.2 can be easily generalized to the map $A \mapsto A^p$, for any $p \geq 2$, by using the power series
$$(I + A)^{1/p} = I + \frac{1}{p}A + \cdots + \frac{1}{n!}\,\frac{1}{p}\left(\frac{1}{p} - 1\right)\cdots\left(\frac{1}{p} - n + 1\right)A^n + \cdots.$$


Using Proposition 3.2, we can find a square root for the unipotent part of a Jordan block,
$$J_{r_k}(\lambda_k) = \lambda_k I \left(I + \frac{H}{\lambda_k}\right).$$
If $N_k = \frac{H}{\lambda_k}$, then
$$\sqrt{I + N_k} = I + \sum_{j=1}^{r_k - 1} (-1)^{j-1} \frac{(2j)!}{(2j-1)(j!)^2 2^{2j}}\, N_k^j$$
is a square root of $I + N_k$; a small numerical sketch of this construction is given below.
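In this sketch (mine, assuming NumPy), the truncated binomial series is an exact square root of the unipotent part since $N_k^{r_k} = 0$, and multiplying by $\sqrt{\lambda_k}\, I$ gives a square root of the whole block when $\lambda_k > 0$:

    # Square root of an invertible Jordan block J_r(lam), lam > 0, via
    # the (finite) binomial series for sqrt(I + N).
    import numpy as np
    from math import factorial

    lam, r = 9.0, 4
    H = np.diag(np.ones(r - 1), 1)            # nilpotent shift matrix
    J = lam * np.eye(r) + H                   # the Jordan block J_r(lam)
    N = H / lam                               # J = lam * I * (I + N)

    M = np.eye(r)                             # M = sqrt(I + N), exactly
    for j in range(1, r):
        c = (-1) ** (j - 1) * factorial(2 * j) / ((2 * j - 1) * factorial(j) ** 2 * 2 ** (2 * j))
        M += c * np.linalg.matrix_power(N, j)

    X = np.sqrt(lam) * M                      # a square root of J
    assert np.allclose(X @ X, J)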

Therefore, we obtained the following theorem:

Theorem 3.3 Every (complex) invertible matrix, $A$, has a square root.

Remark: Theorem 3.3 can be easily generalized to $p$th roots, for any $p \geq 2$.

We now consider the problem of finding a real square root of an invertible real matrix. It turns out that the necessary and sufficient condition is exactly the condition for finding a real logarithm of a real matrix.

Theorem 3.4 Let $A$ be a real invertible $n \times n$ matrix and let $(X - \alpha_1)^{r_1}, \ldots, (X - \alpha_m)^{r_m}$ be its list of elementary divisors or, equivalently, let $J_{r_1}(\alpha_1), \ldots, J_{r_m}(\alpha_m)$ be its list of Jordan blocks. Then, $A$ has a real square root iff for every $r_i$ and every real eigenvalue $\alpha_i < 0$, the number, $m_i$, of Jordan blocks identical to $J_{r_i}(\alpha_i)$ is even.

Proof. The proof is very similar to the proof of Theorem 2.4 so we only point out the necessary changes. Let $J$ be a real Jordan matrix so that
$$A = PJP^{-1},$$
where $J$ satisfies conditions (1) and (2) of Theorem 1.10. As $A$ is invertible, every block of $J$ of the form $J_{r_k}(\lambda_k)$ corresponds to a real eigenvalue with $\lambda_k > 0$ and we can write $J_{r_k}(\lambda_k) = \lambda_k I(I + N_k)$, where $N_k$ is nilpotent. As in Theorem 3.3, we can find a real square root, $M_k$, of $I + N_k$ and, as $\lambda_k > 0$, the diagonal matrix $\lambda_k I$ has the real square root
$$S_k = \begin{pmatrix} \sqrt{\lambda_k} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_k} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_k} \end{pmatrix}.$$
Set $Y_k = S_k M_k$.

The other real Jordan blocks of $J$ are of the form $J_{2r_k}(\lambda_k, \mu_k)$, with $\lambda_k, \mu_k \in \mathbb{R}$, not both zero. Consequently, we can write
$$J_{2r_k}(\lambda_k, \mu_k) = D_k(I + N_k)$$
where
$$D_k = \begin{pmatrix} L(\lambda_k, \mu_k) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & L(\lambda_k, \mu_k) \end{pmatrix}$$
with
$$L(\lambda_k, \mu_k) = \begin{pmatrix} \lambda_k & -\mu_k \\ \mu_k & \lambda_k \end{pmatrix},$$
and $N_k = D_k^{-1} H_k$ is nilpotent. We can find a square root, $M_k$, of $I + N_k$ as in Theorem 3.3. If we write $\lambda_k + i\mu_k = \rho_k e^{i\theta_k}$, then
$$L(\lambda_k, \mu_k) = \rho_k \begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix}.$$
Then, if we set
$$S(\lambda_k, \mu_k) = \sqrt{\rho_k} \begin{pmatrix} \cos\frac{\theta_k}{2} & -\sin\frac{\theta_k}{2} \\ \sin\frac{\theta_k}{2} & \cos\frac{\theta_k}{2} \end{pmatrix},$$
a real matrix, we have
$$L(\lambda_k, \mu_k) = S(\lambda_k, \mu_k)^2.$$
If we form the real block diagonal matrix,
$$S_k = \begin{pmatrix} S(\lambda_k, \mu_k) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S(\lambda_k, \mu_k) \end{pmatrix},$$
we have $D_k = S_k^2$ and then the matrix $Y_k = S_k M_k$ is a square root of $J_{2r_k}(\lambda_k, \mu_k)$. Finally, if $Y$ is the block diagonal matrix $\mathrm{diag}(Y_1, \ldots, Y_m)$, then $X = PYP^{-1}$ is a square root of $A$.

Let us now prove that if $A$ has a real square root, $X$, then $A$ satisfies the condition of Theorem 3.4. Since $X$ is a real matrix, we know from the proof of Theorem 1.9 that the Jordan blocks of $X$ associated with complex eigenvalues occur in conjugate pairs, so they are of the form
$$J_{r_k}(\alpha_k), \quad \alpha_k \in \mathbb{R},$$
$$J_{r_k}(\lambda_k) \text{ and } J_{r_k}(\overline{\lambda}_k), \quad \lambda_k = \alpha_k + i\beta_k,\ \beta_k \neq 0.$$
By Theorem 3.1, the Jordan blocks of $A = X^2$ are obtained by replacing each $\lambda_k$ by $\lambda_k^2$, that is, they are of the form
$$J_{r_k}(\alpha_k^2), \quad \alpha_k \in \mathbb{R},$$
$$J_{r_k}(\lambda_k^2) \text{ and } J_{r_k}(\overline{\lambda}_k^2), \quad \lambda_k = \alpha_k + i\beta_k,\ \beta_k \neq 0.$$
If $\alpha_k \in \mathbb{R}$, then $\alpha_k^2 > 0$, so the negative eigenvalues of $A$ must be of the form $\lambda_k^2$ or $\overline{\lambda}_k^2$, with $\lambda_k$ complex. This implies that $\lambda_k = \rho_k e^{\pm i\pi/2}$, but then $\overline{\lambda}_k = \rho_k e^{\mp i\pi/2}$ and so
$$\overline{\lambda}_k^2 = \lambda_k^2.$$
Consequently, negative eigenvalues of $A$ are associated with Jordan blocks that occur in pairs, as claimed.
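A concrete illustration of the parity condition (mine, not from the text): the $1 \times 1$ matrix $(-1)$ is a single Jordan block $J_1(-1)$, so it has no real square root, whereas $-I_2$, with two blocks $J_1(-1)$, has the real square root $S(-1, 0)$, the rotation through $\pi/2$:

    # -I_2 has two Jordan blocks J_1(-1) (an even number), so it has a
    # real square root: the rotation by pi/2.
    import numpy as np

    X = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    assert np.allclose(X @ X, -np.eye(2))
    # By contrast, x**2 = -1 has no real solution, so the single block
    # (-1) has no real square root.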

Remark: Theorem 3.4 can be easily generalized to $p$th roots, for any $p \geq 2$.

Theorem 3.4 appears in Higham [16] as Theorem 5 but no explicit proof is given. Instead, Higham states: "The proof is a straightforward modification of Theorem 1 in Culver [11] and is omitted." Culver's proof uses results from Gantmacher [14] and does not provide a constructive method for obtaining a square root. We gave a more constructive (but perhaps longer) proof.

Corollary 3.5 For every real invertible matrix, $A$, if $A$ has no negative eigenvalues, then $A$ has a real square root.

We will now provide a sufficient condition for the uniqueness of a real square root. For this, we consider the open set, $H(n)$, consisting of all real $n \times n$ matrices whose eigenvalues, $\lambda = \mu + i\nu$, have a positive real part, $\mu > 0$. We express this condition as $\Re(\lambda) > 0$. Obviously, such matrices are invertible and can't have negative eigenvalues. We need a version of Proposition 2.8 for semisimple matrices in $H(n)$.

Remark: To deal with $p$th roots, we consider matrices whose eigenvalues, $\rho e^{i\theta}$, satisfy the condition $-\frac{\pi}{p} < \theta < \frac{\pi}{p}$.

Proposition 3.6 For any two real or complex matrices, $S_1$ and $S_2$, if the eigenvalues, $\rho e^{i\theta}$, of $S_1$ and $S_2$ satisfy the condition $-\frac{\pi}{2} < \theta \leq \frac{\pi}{2}$, if $S_1$ and $S_2$ are semisimple and if $S_1^2 = S_2^2$, then $S_1 = S_2$.

Proof. The proof is very similar to that of Proposition 2.8 so we only indicate where modifications are needed. We use the fact that if $u$ is an eigenvector of a linear map, $A$, associated with some eigenvalue, $\lambda$, then $u$ is an eigenvector of $A^2$ associated with the eigenvalue $\lambda^2$. We replace every occurrence of $e^{\lambda_i}$ by $\lambda_i^2$ (and $e^{\lambda}$ by $\lambda^2$). As in the proof of Proposition 2.8, we obtain the equation
$$\lambda^2 u_1 + \cdots + \lambda^2 u_k = \lambda_1^2 u_1 + \cdots + \lambda_k^2 u_k.$$
Therefore, we deduce that
$$\lambda^2 = \lambda_k^2, \quad 1 \leq k \leq n.$$


Consequently, as $\lambda, \lambda_k \neq 0$, if $u_k \neq 0$, then
$$\lambda^2 = \lambda_k^2,$$
which implies $\lambda = \pm\lambda_k$. However, the hypothesis on the eigenvalues of $S_1$ and $S_2$ implies that $\lambda = \lambda_k$. The end of the proof is identical to that of Proposition 2.8.

    Obviously, Proposition 3.6 holds for real semisimple matrices, S1, S2, in H(n).

Remark: Proposition 3.6 also holds for the map $S \mapsto S^p$, for any $p \geq 2$, under the condition $-\frac{\pi}{p} < \theta \leq \frac{\pi}{p}$.

We have the following analog of Theorem 2.9, but we content ourselves with a weaker result:

Theorem 3.7 The restriction of the squaring map, $A \mapsto A^2$, to $H(n)$ is injective.

Proof. Let $X_1, X_2 \in H(n)$ and assume that $X_1^2 = X_2^2$. As $X_1$ and $X_2$ are invertible, by Proposition 2.6, we can write $X_1 = S_1(I + N_1)$ and $X_2 = S_2(I + N_2)$, where $S_1, S_2$ are semisimple, $N_1, N_2$ are nilpotent, $S_1(I + N_1) = (I + N_1)S_1$ and $S_2(I + N_2) = (I + N_2)S_2$. As $X_1^2 = X_2^2$, we get
$$S_1^2(I + N_1)^2 = S_2^2(I + N_2)^2.$$
Now, as $S_1$ and $S_2$ are semisimple and invertible, $S_1^2$ and $S_2^2$ are semisimple and invertible, and as $N_1$ and $N_2$ are nilpotent, $2N_1 + N_1^2$ and $2N_2 + N_2^2$ are nilpotent, so $(I + N_1)^2$ and $(I + N_2)^2$ are unipotent. Moreover, $S_1(I + N_1) = (I + N_1)S_1$ and $S_2(I + N_2) = (I + N_2)S_2$ imply that $S_1^2(I + N_1)^2 = (I + N_1)^2 S_1^2$ and $S_2^2(I + N_2)^2 = (I + N_2)^2 S_2^2$. Therefore, by the uniqueness statement of Proposition 2.6, we get
$$S_1^2 = S_2^2 \quad\text{and}\quad (I + N_1)^2 = (I + N_2)^2.$$
However, as $X_1, X_2 \in H(n)$, we have $S_1, S_2 \in H(n)$ and Proposition 3.6 implies that $S_1 = S_2$. Since $I + N_1$ and $I + N_2$ are unipotent, Proposition 3.2 implies that $N_1 = N_2$. Therefore, $X_1 = X_2$, as required.
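For example (an illustration of mine, assuming NumPy), $\mathrm{diag}(1, 4)$ has the four square roots $\mathrm{diag}(\pm 1, \pm 2)$, but only $\mathrm{diag}(1, 2)$ lies in $H(n)$:

    # Injectivity on H(n): of the four diagonal square roots of
    # diag(1, 4), only diag(1, 2) has eigenvalues with positive real part.
    import numpy as np

    A = np.diag([1.0, 4.0])
    roots = [np.diag([s1, 2.0 * s2]) for s1 in (1.0, -1.0) for s2 in (1.0, -1.0)]
    assert all(np.allclose(X @ X, A) for X in roots)

    in_H = [X for X in roots if np.all(np.linalg.eigvals(X).real > 0)]
    assert len(in_H) == 1 and np.allclose(in_H[0], np.diag([1.0, 2.0]))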

Remark: Theorem 3.7 also holds for the restriction of the squaring map to real or complex matrices, $X$, whose eigenvalues, $\rho e^{i\theta}$, satisfy the condition $-\frac{\pi}{2} < \theta \leq \frac{\pi}{2}$. This result is proved in DePrima and Johnson [12] by a different method. However, DePrima and Johnson need an extra condition; see the discussion at the end of this section.

We can now prove the analog of Theorem 2.11 for square roots.

Theorem 3.8 If $A$ is any real invertible $n \times n$ matrix and $A$ has no negative eigenvalues, then $A$ has a unique real square root, $X$, with $X \in H(n)$.


Proof. If we go back to the proof of Theorem 3.4, we see that complex eigenvalues of the square root, $X$, produced by that proof only occur for matrices
$$S(\lambda_k, \mu_k) = \sqrt{\rho_k} \begin{pmatrix} \cos\frac{\theta_k}{2} & -\sin\frac{\theta_k}{2} \\ \sin\frac{\theta_k}{2} & \cos\frac{\theta_k}{2} \end{pmatrix},$$
associated with eigenvalues $\lambda_k + i\mu_k = \rho_k e^{i\theta_k}$. However, the eigenvalues of such matrices are $\sqrt{\rho_k}\, e^{\pm i\theta_k/2}$ and since $A$ has no negative eigenvalues, we may assume that $-\pi < \theta_k < \pi$, and so $-\frac{\pi}{2} < \frac{\theta_k}{2} < \frac{\pi}{2}$, which means that $X \in H(n)$, as desired. By Theorem 3.7, such a square root is unique.
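Numerically, this unique square root is the one SciPy computes; a quick sketch (mine, assuming that scipy.linalg.sqrtm returns the principal square root for matrices with no negative eigenvalues):

    # Theorem 3.8: the unique real square root in H(n).
    import numpy as np
    from scipy.linalg import sqrtm

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4))
    A = np.eye(4) + B.T @ B            # symmetric positive definite, so
                                       # no negative eigenvalues

    X = np.asarray(sqrtm(A))
    assert np.allclose(X @ X, A)                  # X is a square root of A
    assert np.allclose(X.imag, 0)                 # X is real
    assert np.all(np.linalg.eigvals(X).real > 0)  # X lies in H(n)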

Theorem 3.8 is stated in a number of papers including Bini, Higham and Meini [6], Cheng, Higham, Kenney and Laub [10] and Kenney and Laub [22]. Theorem 3.8 also appears in Higham [17] as Theorem 1.29. Its proof relies on Theorem 1.26 and Theorem 1.18 (both in Higham's book), whose proofs are not given in full (closer examination reveals that Theorem 1.36 in Higham's book is needed to prove Theorem 1.26). Although Higham's Theorem 1.26 implies our Theorem 3.7, we feel that the proof of Theorem 3.7 is of independent interest and is more direct.

As we already said in Section 2, Kenney and Laub [22] state Theorem 3.8 as Lemma A1 in Appendix A. The proof is sketched briefly. Existence follows from the Cauchy integral formula for operators, a method used by DePrima and Johnson [12] in which a similar result is proved for complex matrices (Section 4, Lemma 1). Uniqueness is proved in DePrima and Johnson [12] but it uses an extra condition. The hypotheses of Lemma 1 in DePrima and Johnson are that $A$ and $X$ are complex invertible matrices and that $X$ satisfies the conditions

(i) $X^2 = A$;

(ii) the eigenvalues, $\rho e^{i\theta}$, of $X$ satisfy $-\frac{\pi}{2} < \theta \leq \frac{\pi}{2}$;

(iii) for any matrix, $S$, if $AS = SA$, then $XS = SX$.

Observe that condition (ii) allows $\theta = \frac{\pi}{2}$, which yields matrices, $A = X^2$, with negative eigenvalues. In this case, $A$ may not have any real square root but DePrima and Johnson are only concerned with complex matrices and a complex square root always exists. To guarantee the existence of real square roots, Kenney and Laub tighten condition (ii) to
$$-\frac{\pi}{2} < \theta < \frac{\pi}{2}.$$

They also assert that condition (iii) follows from conditions (i) and (ii). This can be shown as follows: First, recall that we have shown that uniqueness follows from (i) and (ii). Uniqueness under conditions (i) and (ii) can also be shown to be a consequence of Theorem 2 in Higham [16]. Now, assume $X^2 = A$ and $SA = AS$. We may assume that $S$ is invertible since the set of invertible matrices is dense in the set of all matrices. Then, as $SA = AS$, we have
$$(SXS^{-1})^2 = SX^2S^{-1} = SAS^{-1} = A.$$
Thus, $SXS^{-1}$ is a square root of $A$. Furthermore, $X$ and $SXS^{-1}$ have the same eigenvalues, so $SXS^{-1}$ satisfies (i) and (ii) and, by uniqueness, $X = SXS^{-1}$, that is, $XS = SX$.


Since Kenney and Laub only provide a sketch of Lemma A1 and since Higham [17] does not give all the details of the proof either, we felt that the reader would appreciate seeing a complete proof of Theorem 3.8.

    4 Conclusion

It is interesting that Theorem 2.11 and Theorem 3.8 are the basis for numerical methods for computing the exponential or the logarithm of a matrix. The key point is that the following identities hold:
$$e^A = \left(e^{A/2^k}\right)^{2^k} \quad\text{and}\quad \log(A) = 2^k \log\left(A^{1/2^k}\right),$$
where in the second case, $A^{1/2^k}$ is the unique $2^k$-th root of $A$ (obtained by taking $k$ successive square roots) whose eigenvalues, $\rho e^{i\theta}$, lie in the sector $-\frac{\pi}{2^k} < \theta < \frac{\pi}{2^k}$. The first identity is trivial and the second one can be shown by induction from the identity
$$\log(A) = 2 \log\left(A^{1/2}\right),$$
where $A^{1/2}$ is the unique square root of $A$ whose eigenvalues, $\rho e^{i\theta}$, lie in the sector $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$.
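These identities suggest the inverse scaling and squaring method: take $k$ successive principal square roots so that $A^{1/2^k}$ is close to $I$, approximate the logarithm there by a short series, and multiply by $2^k$. A minimal sketch of this idea (mine, assuming SciPy's sqrtm; production implementations such as Higham's use Padé approximants and careful error control):

    # Inverse scaling and squaring: log(A) = 2^k log(A^(1/2^k)).
    import numpy as np
    from scipy.linalg import sqrtm, expm

    def log_iss(A, k=6, terms=12):
        """Logarithm of a real matrix with no negative eigenvalues."""
        R = np.asarray(A, dtype=float)
        for _ in range(k):                 # R becomes A^(1/2^k)
            R = np.real(np.asarray(sqrtm(R)))
        E = R - np.eye(R.shape[0])         # now R = I + E with E small
        L = np.zeros_like(E)               # log(I+E) = E - E^2/2 + E^3/3 - ...
        P = np.eye(R.shape[0])
        for n in range(1, terms + 1):
            P = P @ E
            L += (-1) ** (n - 1) * P / n
        return 2 ** k * L

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])             # invertible, spectrum {2, 3}
    X = log_iss(A)
    assert np.allclose(expm(X), A)         # X is a logarithm of A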