an introduction to quaternions with application to rotations · an introduction to quaternions with...

AN INTRODUCTION TO QUATERNIONS WITH APPLICATION

TO ROTATIONS

LARRY SUSANKA

Contents

1. Algebras 22. (*) The Frobenius Theorem 43. (*) Hurwitz’s Theorem 44. Isometries and Linearity in Rn 45. Isometries and Reflections in Rn 66. (*) Complex Numbers: First Steps 107. Complex Power Series 128. (*) The Geometry of Complex Numbers 149. Matrix Form of Complex Numbers 1510. (*) Linear Rigid Motions in Space 1611. (*) Quaternions 1812. (*) Solutions of z2 = −1 2113. (*) Special Products of Quaternions: Rotations in R3 2214. Power Series and Derivatives for Quaternions 2515. Quaternions as Matrices 2816. The Pauli Spin Matrices 3017. Rotations in R4 3218. Quaternions and Isometries on R4 3219. (*) Another Look at Rotations in R3 3620. (*) Recap: Isometries in Rn, for n = 1, 2, 3 and 4 3721. Octonions 3822. The Hopf Fibration 3923. (*) Matrices and Quaternions: a Comparison 4124. (*) Quaternion Interpolation 4225. (*) The “Belt Trick” 44Index 46

Date: August 29, 2019.

1

2 LARRY SUSANKA

1. Algebras

A (real) algebra1 is a vector space V whose scalars are the real numbers Rtogether with an additional operation on pairs of vectors, usually indicated as mul-tiplication, with a few properties.

First, the product vw of the vectors v and w from V must produce a vector inV . This property is called closure, and applies to subsets too. If S ⊂ V we saythe product is closed in S if vw is in S whenever both v and w are in S.

To avoid a few unimportant special cases we assume the dimension of V tobe at least one and that the multiplication operation is nontrivial: that is,vw 6= 0 for at least one pair v, w of vectors in V .

Unless the vector space is a one dimensional space, essentially R itself, dot prod-uct on Rn produces a number, not a vector, so the familiar dot product is not anexample of what we mean here except in that case.

The usual cross product on vectors in R3 is an example. The usual matrixmultiplication on the vector space of n-by-n matrices is another.

The two distributive laws2 must hold for this operation:

(v + w)z = vw + vz and z(v + w) = zv + zw

for any vectors v, w and z.

We adopt the usual practice of giving multiplication priority over addition, sozv + zw = (zv) + (zw) and is not any of the other possible combinations, such as(z(v + z))w.

Both cross product and matrix multiplication satisfy these distributive laws.

Finally, if r and s are scalars (i.e. real numbers) and v and w are vectors then

(rs)(vw) = ((rs)v)w = (rv)(sw) = v((rs)w).

Cross product and matrix multiplication satisfy this one too.

Taking a look at the equation above we see three different multiplications, allindicated by “juxtaposition,” i.e. putting the symbols to be multiplied next to eachother. rs is the product of two real numbers. vw is the product of two vectors. rv isscalar multiplication of real number r by vector v. Believe it or not, in applicationsthis rarely causes confusion.

An algebra is called associative if, for any choice of vectors v, w and z

(vw)z = v(wz).

1Those who wish to minimize abstraction but still learn the basic facts can stick to the sectionsmarked with (∗) and skip the rest without too much much loss of continuity. Readers withoutmuch background in linear algebra, on the other hand, will likely find mention of vector spaces,

linear transformation, basis, dimension, homomorphism and isomorphism to be puzzling, but canstill understand many of the ideas in these notes by simply ignoring any sentence containing thewords.

2This implies that if ~0 denotes the zero vector and v is any vector then ~0v = (~0+~0)v = ~0v+~0v

so ~0v = ~0.

AN INTRODUCTION TO QUATERNIONS WITH APPLICATION TO ROTATIONS 3

This property is important because without it the product of two or more terms,such as vwzq, is ambiguous. One must then include a lot of parentheses to dictateorder. Matrix multiplication is associative, but cross product is not.

An algebra is called commutative if vw = wv for any pair of vectors v, w.Matrix multiplication of n-by-n matrices is not commutative, and neither is crossproduct. However the last operation satisfies v×w = −w×v, a property sometimescalled “anticommutativity.”

An algebra is called unital if there is a vector, sometimes denoted e, for which

ev = ve = v for all vectors v.

This is not to be confused with the scalar, number 1. The multiplicative identity eis a vector, not a scalar, and is unique if it exists.

Matrix multiplication is unital, but cross product is not.

An algebra is called a division algebra if for every nonzero vector v and anyvector w there is exactly one vector z and exactly one vector y for which w = zvand w = vy.

Division algebras have no zero divisors3: these are nonzero elements h and gwith hg = 0.

An associative division algebra is unital.4

It then follows immediately that for every vector v there is a (necessarily unique)vector, usually denoted v−1, for which vv−1 = v−1v = e.

An algebra homomorphism from algebra V to algebra W is a linear transforma-tion T from V to W that “preserves” products. Specifically,

T (vw) = T (v)T (w) for all v, w in V.

An algebra isomorphism is an algebra homomorphism with an inverse function.It is not hard to show that this inverse function must be linear also, and an algebrahomomorphism. So the inverse function is an algebra isomorphism too, from W toV .

Two algebras are called algebra isomorphic if there is an algebra isomorphismbetween them. And algebra-isomorphic algebras are thought of as identical in allimportant respects: they are not really different algebras at all, but merely twomanifestations of the same structure.

The Frobenius’ theorem, proved in 1877, asserts that there are exactly threefinite dimensional associative real division algebras.

3Suppose for nonzero g we have hg = 0. Find unique vector A for which Ag = g and theng = hg +Ag = (h+A)g so by uniqueness h+A = A so h = 0.

4Suppose x = axx = xbx. (We repeatedly use the assumed uniqueness of coefficients such as

ax and bx in the following discussion.) Then x2 = (axx)(axx) = (axxax)x so (x − axxax)x = 0so x = axxax = xax so ax = bx: in other words, the left identity for x is also the right identityfor x, which we henceforth denote ex. Note also that exx = exexx so e2x = ex for all x: that is tosay, these identities are idempotent. Now suppose x and y are nonzero vectors and x = exx and

y = eyy. Then there is a unique H so that ex = Hey . But then exy = Heyy = Hy so H = exand we find that exex = ex = Hey = exey and so ex = ey . In other words, there is a unique

two-sided identity in V .

4 LARRY SUSANKA

The first is the real numbers, the one dimensional example. The second isthe complex numbers, a two dimensional algebra, about which we will meditatenext. The third is a four dimensional algebra, the main subject of these notes, thequaternions.

This theorem asserts, more specifically, that any finite dimensional associativereal division algebras is algebra isomorphic to one of these three.

An algebra norm on a real algebra V is a norm on V that satisfies a conditioninvolving the product. Specifically, a non-negative real valued function ‖ · ‖ definedon V is an algebra norm if, for every real number r and any vectors v and w

‖ r v ‖ = |r| ‖v‖‖ v + w ‖ ≤ ‖v‖+ ‖w‖ (the triangle inequality)

‖v‖ = 0 if and only if v = 0 and

‖ v w ‖ = ‖v‖ ‖w‖.

Hurwitz’s theorem, proved in 1898, takes a slightly different tack from theFrobenius’ theorem. It states that any unital real division algebra with an algebranorm is algebra isomorphic to one of the three algebras listed above or to an eightdimensional algebra called either the octonions or, in some older sources, the Cay-ley numbers. The octonions are not associative, and we will make only the mostsuperficial exploration of their properties in these notes.

These two theorems both refer to finite dimensional algebras with multiplicativeidentity and both pin down the possible division algebras to a short list. The Frobe-nius theorem assumes the algebra to be associative but no norm is involved whileHurwitz’s theorem does not assume associativity (so the existence of an identitymust be an additional hypothesis here) but does use the existence of an algebranorm.

2. (*) The Frobenius Theorem

3. (*) Hurwitz’s Theorem

4. Isometries and Linearity in Rn

An isometry on Rn is a function from Rn to itself that preserves the Euclideanlength on Rn. Specifically, H is an isometry exactly when

‖H(u)−H(v)‖ = ‖u− v‖ for all u, v in Rn

where the norm indicated here is defined on vector v = (v1, v2, . . . , vn) by

‖ (v1, v2, . . . , vn) ‖ =√v21 + · · ·+ v2n.


So an isometry H is uniformly continuous in the sense that for all ε > 0 and forevery pair of vectors u and v you can guarantee ‖H(v) −H(u)‖ < ε by requiringthat ‖v − u‖ < ε.

We will be interested particularly in isometries for which H(0) = 0, and willrequire this of the isometries we consider, in which case (set v = 0) we have

‖H(u)‖ = ‖u‖ for all vectors u.

For any u, an application of the triangle inequality gives

2‖u‖ = ‖u− (−u)‖ = ‖H(u)−H(−u)‖ ≤ ‖H(u)‖+ ‖H(−u)‖ = 2 ‖u‖so equality holds above, which can only happen if H(u) is a positive multiple of−H(−u). Since their norms are equal, this means

−H(u) = H(−u) for all u.

Suppose n is a positive integer and H(nu) = nH(u). Then

(n+ 1)‖u‖ =‖H((n+ 1)u)‖ = ‖H((n+ 1)u)−H(nu) +H(nu)‖≤ ‖H((n+ 1)u)−H(nu)‖+ ‖H(nu)‖ = ‖(n+ 1)u− nu‖+ ‖H(nu)‖= ‖u‖+ ‖nu‖ = (n+ 1)‖u‖.

Again, we see equality holds in this application of the triangle inequality, so H(nu)is a positive multiple of H((n + 1)u) − H(nu) and in view of their norms, thatmultiple is n. So

n(H((n+ 1)u)−H(nu) ) = H(nu)

which implies nH((n + 1)u) = (n + 1)H(nu). Since by our initial assumptionnH(u) = H(nu) we find H((n+ 1)u) = (n+ 1)H(u).

Since the assumed condition is obviously true for n = 1 an appeal to inductionallows us to deduce that the condition we assumed above is true for any u andany positive integer n. Because H(0) = 0 and H(−u) = −H(u), the conditionnH(u) = H(nu) holds for any u and any integer n.

With this in hand, if n is a nonzero integer and p any integer

H( pnu)

= pH

(1

nu

)=p

nnH

(1

nu

)=p

nH

(n

1

nu

)=p

nH (u) .

Any real number c is the limit of a sequence of rational numbers rn for integersn > 1 and so for vector u the vector cu is the limit of the vector sequence

limn→∞

rnu = cu and also limn→∞

rnH(u) = cH(u).

Since H is continuous at cu, for any real number c and any vector u

cH(u) = limn→∞

rnH(u) = limn→∞

H(rnu) = H(cu).

Finally, if u and v are vectors

2‖u− v‖ = ‖2u− 2v‖ = ‖H(2u)−H(2v)‖= ‖H(2u)−H(u+ v) +H(u+ v)−H(2v)‖≤ ‖H(2u)−H(u+ v)‖+ ‖H(u+ v)−H(2v)‖= ‖2u− u− v‖+ ‖u+ v − 2v‖ = 2‖u− v‖

6 LARRY SUSANKA

so, as above, we have equality throughout which means

H(2u)−H(u+ v) = H(u+ v)−H(2v)

so H(2u) +H(2v) = 2H(u+ v). Since H(2u) = 2H(u) and H(2v) = 2H(v) we find

H(u) +H(v) = H(u+ v) for all vectors u, v.

Our conclusion is that H is a linear function. And since H(v) = 0 only whenv = 0 this function has trivial kernel. Since Rn is finite dimensional, this means His onto Rn, and so an isomorphism.

Isometries on Rn which fix the origin are linear isomorphisms.

The main result (linearity) is a special case of a more general result called theMazur-Ulam theorem which tells us that any function from one real normedvector space onto another (finite dimensional or not) which sends the origin to theorigin and which preserves distances, as our H did above, must be linear.5

5. Isometries and Reflections in Rn

If u is a vector in Rn define u⊥ to be the subspace

u⊥ = { v ∈ Rn | u · v = 0 }.This is the collection of all vectors perpendicular to u. If u is nonzero, u⊥ hasdimension n − 1. If n = 1 then u⊥ = {0}. If n = 2 then u⊥ is the line of vectorsthrough the origin perpendicular to u. If n = 3 then u⊥ is a plane and if n ≥ 4 itis called a hyperplane.

Now define for nonzero u the linear functions Proju⊥ and Proju by

Proju(v) =v · uu · u

u and Proju⊥(v) = v − Proju(v).

These functions are called projections onto the line through u and onto the sub-space u⊥, respectively.

The identity v = Proju⊥(v)+Proju(v) decomposes v into the sum of two vectors,one a multiple of u and the other perpendicular to u.

The function Ru given by

Ru(v) = Proju⊥(v)− Proju(v) = v − 2Proju(v)

is called the reflection in u⊥. All vectors in u⊥ are left alone by Ru, while anymultiple cu of u is sent to −cu.

Ru(v) ·Ru(v) = (Proju⊥(v)− Proju(v)) · (Proju⊥(v)− Proju(v))

= (Proju⊥(v)) · (Proju⊥(v)) + (Proju(v)) · (Proju(v))

= (Proju⊥(v) + Proju(v)) · (Proju⊥(v) + Proju(v)) = v · vand the linearity of Ru then implies that it is an isometry.

5A. Vogt, Maps which preserve equality of distance, Studia Math. 45 (1973) 43-48.


The norm of u = (u1, . . . , un) is irrelevant in the formation of Ru, so let’s replaceu with u/‖u‖ so we can work with unit vectors. In that case, if ei is the unit vectorin the direction of the ith coordinate axis, and if In is the n × n identity matrix,the matrix M of Ru (vectors thought of as columns, matrix on the left) has ithcolumn Ru(ei) = ei − 2uiu. So

M = In − 2uuT

where uT is the row matrix that is the transpose of column u.

Since Ru is an isometry the columns of M form an orthonormal basis for Rn andM−1 = MT : that is, M is an orthogonal matrix.

M acts as the identity on the subspace u⊥. Select p1 = u and let p2, . . . , pn bean orthonormal basis of u⊥. Then

p1, p2, . . . , pn

constitutes an orthonormal basis of Rn. Let P be the matrix whose columns arethe members of this basis, in order. So

M p1 = −p1 and M pi = pi for i = 2, . . . , n

and therefore

PT M P =

−1 0 0 . . . 00 1 0 . . . 0...

......

...0 0 0 . . . 1

Note that the determinant of this matrix, and so the determinant of M , is −1.The columns of M form a left-handed rather than a right-handed orthonormalbasis. (These two groups of bases are determined by the determinant condition,and except in dimension three the visual metaphor associated with “handedness”is not very useful.)

We have discovered that any reflection matrix can be brought to the form on theright above by an orthogonal matrix of transition P with the normalized vector uin the far left column and the remaining columns spanning u⊥. On the other hand,any matrix M that can be brought to this form by orthogonal P corresponds tothe reflection in p⊥1 given explicitly by

Rp1(v) = (−p1 · v) p1 + ( (p2 · v) p2 + · · ·+ (pn · v) pn ) .

We will now discuss a result often referred to as (a special case of) the Cartan-Dieudonne theorem.

The theorem we will prove states that every (nonidentity) linear isometry of Rncan be formed as a composition of at most n reflections of the type we consideredabove.

This result is obviously true when n = 1, because in that case the only nontriviallinear isometry is the function H(x) = −x which is R1(x) for any real x.

In dimension 2 the situation is a bit different. There, the matrix of a linearisometry is of the form

M =

(a1 b1a2 b2

)where a21 + a22 = 1 = b21 + b22 and a1b1 = −a2b2.

8 LARRY SUSANKA

This corresponds to(a ∓

√1− a2

±√

1− a2 a

)or

(a ±

√1− a2

±√

1− a2 −a

)with − 1 ≤ a ≤ 1

and where the signs are chosen so that the columns are orthogonal. Matrices of thefirst type have determinant 1, while those of the second type have determinant −1.

There are unique θ and µ with

0 ≤ θ < 2π and 0 ≤ µ < 2π

so that matrices of the first and second types can be represented, respectively, as(cos(θ) − sin(θ)sin(θ) cos(θ)

)or

(cos(µ) sin(µ)sin(µ) − cos(µ)

).

Matrices of the first kind correspond to rotations, counterclockwise by angle θ.Matrices of the second kind correspond to reflections across the line through theorigin at angle µ/2, as can most easily be checked by calculating that(

cos(µ) sin(µ)sin(µ) − cos(µ)

)(cos(µ2

)sin(µ2

)) =

(cos(µ2

)sin(µ2

))and

(cos(µ) sin(µ)sin(µ) − cos(µ)

)(− sin

(µ2

)cos(µ2

) ) =

(sin(µ2

)− cos

(µ2

))There are a number of relationships among these types of orthogonal matrices,

but just one that concerns us now.(cos(α) sin(α)sin(α) − cos(α)

)(cos(β) sin(β)sin(β) − cos(β)

)=

(cos(α) cos(β) + sin(α) sin(β) cos(α) sin(β)− sin(α) cos(β)sin(α) cos(β)− cos(α) sin(β) sin(α) sin(β) + cos(α) cos(β)

)=

(cos(α− β) − sin(α− β)sin(α− β) cos(α− β)

).

So products of two reflection matrices can produce any rotation matrix, in manyways. The case of β = 0 provides a particularly easy route to finding such a productfor a given rotation angle α: first reflect across the x axis, then across the line atangle α/2.

Our conclusion is that in R2 any isometry is either a reflection or the product oftwo reflections.

Let us suppose we have shown for some k and all n with 0 < n ≤ k that anyisometry in dimension n is the composition of at most n reflections. The case k = 2corresponds to the calculations we have just completed.

Suppose further that we are in possession of a linear isometry H on Rk+1. Thematrix M of this isometry is orthogonal and therefore normal (it commutes withits transpose) and real so it can be reduced using orthogonal matrix of transitionP to block diagonal form

PTMP =

(A OO B

)


where A is either 1 × 1 (i.e. a number) which must be ±1 or it is 2 × 2 in whichcase it has a conjugate pair of complex eigenvalues of norm 1 and is of the form,for some angle θ,

A =

(cos(θ) − sin(θ)sin(θ) cos(θ)

).

By assumption the matrix B is the product of no more than j reflection matriceswhere j is the width of B, either k or k − 1.

We consider three cases.

First, if A is the number 1 then PTMP is the product of no more than kreflections of the form (

1 OO R

)where R is one of the reflection matrices whose product is B.

Second, if A is the number −1 then PTMP is the product of no more than kreflections of the form just described plus the reflection(

−1 OO Ik

)where Ik is the k×k identity matrix, for a total of at most k+ 1 reflection matricesneeded to form PTMP .

Finally, if A is 2 × 2 we have calculated that A can be written as the productof 2 reflection matrices and we assume that the (k − 1) × (k − 1) matrix B canbe written as the product of at most k − 1 reflection matrices. We conclude thatPTMP itself can be written as a product of no more than k+ 1 reflection matricesof the form (

I2 OO R

)and

(S OO Ik−1

)where S is a 2× 2 reflection matrix used to create A.

If R1R2 · · ·Rm is a product of reflections equal to PTMP then

PR1R2 · · ·RmPT =(PR1P

T) (PR2P

T)· · ·(PRmP

T)

is a product of reflections equal to M . We have just shown that at most k + 1reflection matrices are required for our matrix M , and invoke induction to concludethat this is true not just for this k but for any k, and the theorem is proved.

As pointed out by Conway and Smith6 there is a little more that can be squeezedout of this argument. If a linear isometry fixes not just the origin, but acts as theidentity on any m dimensional subspace of Rn for 0 ≤ m < n then it can be formedas the composition of at most n−m reflections.

Any isometry on Rn which acts as the identity on any m dimensionalsubspace of Rn for some m with 0 ≤ m < n can be realized as the

composition of at most n−m reflections.

6John Conway and Derek Smith: On Quaternions and Octonions. A. K. Peters, Ltd. 2003,p. 6. This is a superb and very readable little book, recommended for anyone who wants to delve

a little deeper into the properties of these division algebras.

10 LARRY SUSANKA

We note also that if the isometry has determinant 1 there must be an evennumber of reflections involved in the product. If the determinant is −1 there arean odd number of reflections involved.

6. (*) Complex Numbers: First Steps

Quaternions are sometimes referred to as “hypercomplex numbers” and thestory of quaternions, their geometry and applications and properties, is tied upwith the story of complex numbers so we will start there.

Complex numbers are usually introduced to algebra students, around the timethey learn about the quadratic formula, as “expressions of the form”

a+ b i

where a and b are real numbers and i is a mysterious symbol having the property

i2 = −1

and which is otherwise handled formally just like one would deal with a variable ina polynomial expression.

The symbol C is used to denote the set of complex numbers.

The real numbers a and b are called the real and imaginary parts of thecomplex number, respectively. If a is 0 the number is called imaginary while if bis 0 the complex number is called real.

In attempts to solve polynomial equations, expressions involving quantities withnegative squares appeared in European mathematics in the mid-sixteenth century(Cardano and Bombelli) but were widely regarded as serving only intermediatepurpose, en route to true solutions which had to be (positive, at first) real numbers.

More than 200 years later Euler, Argand and (ultimately) Gauss developed thelink between planar geometry and complex numbers in more or less its presentform, and that is the story of the next sections. They learned that i, thoughnot a real number, corresponds to something very real. It can represent a 90◦

counterclockwise rotation in the plane, and the equation i2 = −1 means that (1, 0)is rotated onto (−1, 0) by two such rotations.

We will also use the vocabulary of vector spaces and linear functions, alreadyencountered in Section 1, which are a more recent invention, the definitions of whichwere organized and clarified by Peano in 1888.

The first, algebraic, definition of complex numbers is as the collection of symbolsof the form given above together with two operations, addition and multiplication,codified by the formulae

(a+ b i) + (c+ d i) = (a+ b) + (c+ d) i

and

(a+ b i)(c+ d i) = (ac− bd) + (ad+ bc) i


for real a, b, c and d.

It is hardly obvious by cursory inspection that various common properties of theordinary operations on numbers, such as the distributive laws, associativity andcommutativity, actually hold for these operations. Are there multiplicative andadditive identities? Are there multiplicative and additive inverses?

It is a calculation to show that all the common laws do in fact hold and that0 = 0 + 0 i is the additive identity and 1 = 1 + 0 i is the multiplicative identity. Thenumber a+ b i has additive inverse −a+ (−b) i.

The complex conjugate of a + b i is denoted a + b i and defined to be a +(−b) i = a− b i. One shows that for complex numbers z and w that

z + w = z + w and zw = (z) (w).

Note that (a+ b i)(a+ b i) = a2 + b2 so unless both a and b are 0 we find

1 = (a+ b i)

(a

a2 + b2+

−ba2 + b2

i

)= (a+ b i)

(a+ b i

(a+ b i)(a+ b i

) ) .The calculation shows that nonzero complex number z = a+b i has multiplicative

inverse given by

z−1 =z

z z.

Possessing the requisite properties, C is a two dimensional commutative associa-tive real division algebra.

The norm of a+ b i is denoted ‖a + b i‖ and defined to be

‖a+ b i‖ =√a2 + b2 =

√z z.

If r is real it is obvious that ‖rz‖ = |r| ‖z‖, but even more is true. Actually,

‖zw‖ = ‖z‖‖w‖ for any complex numbers z and w.

Also

‖z + w‖ ≤ ‖z‖+ ‖w‖ (the triangle inequality)

and ‖z‖ ≥ 0 with equality when and only when z = 0.

Collectively, these contain the required qualities of an algebra norm on an alge-bra, in this particular case a norm on C.

Finally, we introduce what is known as the polar form of complex numbers. Acalculation shows that for any nonzero complex number z = a + b i, the complexnumber z/‖z‖ has norm 1. It’s coefficients are

a/√a2 + b2 and b/

√a2 + b2

and there is exactly one angle θ in (−π, π], sometimes denoted arg(z) and calledthe argument of z with

cos(θ) = a/√a2 + b2 and sin(θ) = b/

√a2 + b2.

12 LARRY SUSANKA

If we let r = ‖z‖ then

z = a+ b i = ‖z‖ z

‖z‖= r (cos(θ) + i sin(θ))

relating the “a+b i” form of z, also called the rectangular form, with this new formcalled the polar form which, of course, looks a lot like polar coordinates.

7. Complex Power Series

If the norm of a sequence of complex numbers converges to 0, both the real andimaginary parts of the complex number must converge to zero. The converse alsois true.

Properties of the norm bring concepts of convergence, introduced for real se-quences, into the complex universe. A complex sequence converges to a complexnumber if the norm of the sequence of differences converges to 0. This happensexactly when both real and imaginary parts of the sequence of differences convergeto 0.

Since a series is just a type of sequence, we also know when a series, and inparticular a power series, formed from complex numbers will converge.

In Calculus we show that the three series

1 + x+x2

2+x3

6+x4

24+

x5

120+

x6

720· · · =

∞∑n=0

xn

n!= ex

and

1− x2

2+x4

24− x6

720+ · · · =

∞∑n=0

x2n

(2n)!= cos(x)

and

x− x3

6+

x5

120− · · · =

∞∑n=0

x2n+1

(2n+ 1)!= sin(x)

all converge with infinite radius of convergence, and if you wish to proceed thatway could constitute the definition of the indicated functions.

A series of this type, which is the limit of a specific sequences of partial sums, canbe reorganized and broken into “sub-series” whose terms are listed in any order welike (so long as all terms are eventually included) and the limit will be unchanged.

Standard results on product series, the key to one proof that

exey = ex+y,

rely on the real norm: i.e. the absolute value.

When we use the complex norm to define convergence of complex series, judiciousapplication of the triangle inequality gives these same results for complex series.

Any real function with a power series expansion can be extended to a complexdomain, and the domain of convergence will contain all complex numbers with normless than the real radius of convergence.


Let me reiterate this amazing fact: most real functions we encounter have powerseries with, at least, some radius of convergence. All of these can be “complexified”to complex domain, producing complex output.

Euler’s equation, seen below, is an example of this in action.

ea+b i = eaeb i = ea (cos(b) + i sin(b)) .

Note that the right hand side is the polar form of any complex number that canbe written as z = ea+b i. That includes any complex number except 0.

The formula is proved by substituting b i into the power series for ex and al-gebraically simplifying the powers (b i)n. Gathering terms in which a factor of iremains and factoring out this common factor produces the series for sin(b). Theremaining terms are the series for cos(b).

If ea+b i = ec+d i for real a, b, c and d then a = c and b − d is a multiple of 2π.So the complex exponential has no global inverse function. When its domain isrestricted there can be inverses. In particular, when restricted to complex numbersa + b i where a is real and b is taken from an interval of the form (h, h + 2π)the exponential is one-to-one and therefore has an inverse on its range. Thatrange consists of all nonzero complex numbers whose polar form uses an angle notcoterminal with h.

The inverse on this particular piece of the exponential function is usually denotedwith some form of logarithm notation such as

log(z) = ln(‖z‖) + i θ

when z 6= 0 and θ is the unique angle, arg(z), in the interval (h, h+ 2π) for whichz = ‖z‖ (cos(θ) + i sin(θ)).

An important ambiguity arises here, caused by the fact that the exponential hasno global inverse. arg(z), of course, depends on which piece of the exponential weare using, and that must be specified. Each different h produces what is called abranch of the logarithm.

If it is not specifically mentioned, it is the custom to choose h = −π so that theexcluded complex numbers in the range of the exponential are those which lie alongthe negative real axis, and arg(z) is taken from (−π, π).

Given a particular branch of the logarithm, we can define for complex numbersp and q (with p not excluded from the domain of this branch)

pq = eq log(p).

Other important facts are

(cos(b) + i sin(b)) (cos(c) + i sin(c))

= eib eic = ei(b+c) = (cos(b+ c) + i sin(b+ c))

and De Moivre’s formula: for any integer n

(cos(b) + i sin(b))n

=(eib)n

= einb = (cos(nb) + i sin(nb)) .

14 LARRY SUSANKA

Functions of a complex variable can have derivatives, defined very similarly tothe derivative of an ordinary real function.

If f has complex domain

f ′(z) = lim‖h‖→0

f(z + h)− f(z)

hwherever the limit exists.

The derivative is the complex number to which the ratio is close, provided onlythat the norm of h is small.

The usual differentiation formulas hold and if a function is presented as a powerseries the derivative can be obtained by term-by-term differentiation on the interiorof the domain of convergence, just as with real power series, and the proof isessentially identical to the proof for real series.

The study of complex integrals and derivatives is beautiful and very useful topractical folk such as engineers, who spend quite a bit of time with these operations.

The existence of a complex derivative on and around a point in the complex planehas strong and surprising implications, whose explication lies beyond the boundaryof these notes.

8. (*) The Geometry of Complex Numbers

Given the basic facts about complex numbers we make a new description. Vectorsin the plane can be defined by two real numbers. So can complex numbers. So wecan make an association between vectors in the plane and complex numbers.

Complex numbers are algebraic objects, characterized by the unusual complexmultiplication. Vectors in the plane are all about geometry and angles and lengths.

The association will be useful if complex multiplication implies something inter-esting about geometry, or if geometry will help us understand complex multiplica-tion. In fact, both occur.

The most obvious association is the one we pick.

Identify complex number z = a+ b i with the vector 〈a, b〉.Note that norm in the plane, the pythagorean distance from tip to tail of a

vector, agrees with the norm of the associated complex number.

Recall that if the point (a, b) has polar coordinates (r, θ) then the vector 〈a, b〉can be written as

〈a, b〉 = rUθ

where Uθ = 〈cos(θ), sin(θ)〉 is the vector with its nose resting on the unit circle at

angle θ, pointing the way, and r =√a2 + b2 is a “stretch” (or “shrink”) factor.

Vectors are entirely concerned with norm and direction, and the rectangularcoordinates of a vector don’t display those qualities in the most convenient way.

This polar form of the vector does, and is extremely useful in applicationsof vectors. After all, we wouldn’t be using vectors if we weren’t obsessed withthese features, and in most applications that really use vectors we move back and


forth between rectangular form, convenient for combining vectors, and polar form,emphasizing their norm and direction.

〈a, b〉 ⇐⇒ rUθ = r 〈cos(θ), sin(θ)〉 = 〈r cos(θ), r sin(θ)〉 .

With our association between vectors in the plane and complex numbers, thecomplex number eiθ = cos(θ) + i sin(θ) is associated with unit vector Uθ and,generally,

a+ b i = r eiθ ⇐⇒ 〈a, b〉 = rUθ = 〈r cos(θ), r sin(θ)〉 .

We now can use complex multiplication to simplify our thinking about rotationsand dilations in the plane.

The calculations from above show that multiplying a complex number z by r eiθ

(where r is positive, θ real) has the effect of changing its norm by factor r androtating it by angle θ.

If z = s eiµ then (r eiθ)z = rs ei(θ+µ).

If we want to rotate a vector 〈a, b〉 (“tail”, of course, at the origin) by θ followedby a 15% stretch, calculate

1.15 eiθ (a+ b i) = 1.15 (cos(θ) + i sin(θ)) (a+ b i) .

The real part of this product is the first component of the stretched, rotated vectorand the imaginary part is the second component.

If we have several consecutive rotations we can combine them into a single effec-tive rotation by multiplying the complex numbers corresponding to the constituentrotations, and the result will be a single complex number corresponding to the finalrotation.

9. Matrix Form of Complex Numbers

Since rotations and dilations in the plane are linear transformations, and alllinear transformations on R2 are given by left matrix multiplication, this action ofcomplex numbers on vectors can be given by matrix multiplication.

The matrix associated with complex number a+ b i = r eiθ is

a+ b i ⇐⇒ r

(cos(θ) − sin(θ)sin(θ) cos(θ)

)=

(a −bb a

)= a

(1 00 1

)+ b

(0 −11 0

).

Matrices generally do not commute, but these matrices do commute with eachother, and a calculation shows that matrix multiplication corresponds to complexmultiplication. Note that the square of the second matrix is the negative of theidentity matrix, so the second matrix “acts like” i, while the first matrix “acts like”1.

And the squared norm of the associated complex number is the determinant ofthe matrix of that complex number.

16 LARRY SUSANKA

When the matrix has norm 1 (i.e. r = 1) this matrix is simply the matrix of arotation in the plane.

So everything about complex numbers can be expressed in terms of these matri-ces. Complex conjugation, for example, corresponds to taking the transpose.

This two dimensional real vector space of matrices with matrix multiplication isalgebra isomorphic to the complex numbers.

The multiple representations of these complex numbers, the original represen-tation, vectors and finally matrices with matrix operations, are all consistent. Forinstance(a −bb a

)(xy

)=

(ax− bybx+ ay

)⇔(ax− by −bx− aybx+ ay ax− by

)=

(a −bb a

)(x −yy x

)and, of course

(a+ b i)(x+ y i) = (ax− by) + (bx+ ay) i⇐⇒(ax− by −bx− aybx+ ay ax− by

).

To recapitulate, vectors in the plane, this set of matrices and the complex num-bers as originally presented are each two dimensional real vector spaces. The com-mutative, associative operation provided by matrix multiplication (i.e. complexmultiplication) has multiplicative identity and every nonzero element has a multi-plicative inverse. The distributive properties hold. The given association betweenthe three forms reveal them to be algebra isomorphic.

And there is a norm, giving us an idea of closeness so ideas from calculus can beextended to complex functions if we want.

10. (*) Linear Rigid Motions in Space

An ordered basis ~u1, ~u2, ~u3 of R3 is called right-handed if the determinant

det (~u1 ~u2 ~u3) > 0,

where the coordinates of the basis vectors in terms of the standard basis are givenas columns. It is called left-handed if that determinant is negative.

We define here a linear rigid motion L on R3 to be a linear map that sendsany (and hence every) right-handed ordered orthonormal basis, sometimes calledan orthonormal frame, to another. In particular it must send the standard basisvectors

~i = 〈1, 0, 0〉 , ~j = 〈0, 1, 0〉 , ~k = 〈0, 0, 1〉to an orthonormal frame ~u1, ~u2, ~u3 and the matrix M = (~u1 ~u2 ~u3) must havedeterminant one (it is the volume of a unit cube.)

A linear map of this kind must be an isometry; that is, it cannot change thelength of any vector. So any real eigenvalue must have value ±1 and any complexeigenvalues have norm 1 and come as a conjugate pair.

M is the matrix of this linear rigid motion: that is,

L(v) = Mv for every vector v in R3


and M has characteristic polynomial

λ3 + aλ2 + bλ− det M = λ3 + aλ2 + bλ− 1

for certain real numbers a and b.

Since the characteristic polynomial has third degree, this matrix has at least onereal eigenvalue.

As a first case, if our real eigenvalue is −1 then λ+ 1 divides the characteristicpolynomial, leaving quotient of the form

λ2 + cλ− 1

which has two real roots, one positive and one negative. Our conclusion: −1 is aroot repeated twice, and 1 is a root once.

So L is a 180◦ rotation around the axis of an eigenvector with eigenvalue 1, inthe plane of eigenvectors with eigenvalue −1.

In the second case we have eigenvalue 1, and after dividing the characteristicpolynomial by λ− 1 we have quotient of the form

λ2 + cλ+ 1.

We could have c = 2 in which case we have eigenvalue −1 repeated twice andwe are in the situation of the first case.

We could have c = −2 in which case we have eigenvalue 1 repeated twice moreand L is the identity transformation.

And finally, we might have two complex conjugate eigenvalues, each of norm 1,which must therefore have the form

λ1 = cos(θ) + i sin(θ) and λ2 = cos(θ)− i sin(θ)

for unique angle θ in the interval (−π, π].

Appealing to basic facts about real normal matrices (M commutes with its trans-pose) there is an orthogonal basis in which M has the form1 0 0

0 cos(θ) sin(θ)0 − sin(θ) cos(θ)

,

More specifically, there is a right-handed ordered orthonormal basis ~p1, ~p2, ~p3 sothat the orthogonal matrix of transition P whose columns are this basis (so P−1 isthe transpose PT ) satisfies

P−1MP =

1 0 00 cos(θ) sin(θ)0 − sin(θ) cos(θ)

This is a rotation in the plane of the second and third basis vectors, whose axis ofrotation is the first basis vector, an eigenvector for eigenvalue 1.

I should point out that we have adopted the point of view that M represents theeffect of the rotation on vectors, in particular the coordinate axes. The coordinatesof points in this new rotated basis will change in a way that is “opposite” to this:i.e. replace θ by −θ to get a matrix that produces coordinates in the new basis.This will be, in fact, M−1.

18 LARRY SUSANKA

If you begin by knowing the normalized axis of rotation ~a and angle θ it is fairlystraightforward to produce the matrix. Find any unit vector ~p2 perpendicular to~p1 = ~a and let ~p3 = ~p1 × ~p2.

If P has columns ~p1, ~p2, ~p3 then the matrix of the rotation is

M = P

1 0 00 cos(θ) sin(θ)0 − sin(θ) cos(θ)

P−1.

11. (*) Quaternions

In 1775 Euler proved the Euler rotation theorem, that any rigid motion ofspace that leaves at least one point fixed corresponds to a rotation about an axisthrough that point. So compositions of rotations, one after the other, that leavethis point fixed are also rotations. This is tantalizingly similar to the fact thatmultiplying two complex numbers of norm one produces another, and multiplicationby such a number corresponds to a rotation.

Complex numbers deal efficiently with rotations in the plane, so it was thoughtthat some similar operation on ordered triples could be helpful when trying to solvethe much more troublesome issue of rotations in space. The search was on for anoperation to make R3 into a associative division algebra, as complex multiplicationdid for R2.

In October, 1843 William Rowan Hamilton had an inspiration (reportedlywhile walking by a particular bridge in Dublin) about how this could be done, butusing 4 dimensions, not 3.

The quaternions are represented, usually, as the collection of symbols of theform

a+ b i+ c j + d k where a, b, c and d are real

and the symbols i, j and k satisfy the equations

i2 = j2 = k2 = −1 and ij = k and jk = i and ki = j

but otherwise expressions are added and multiplied formally, as polynomial expres-sions would be, with commutative addition and applications of the distributive lawof multiplication over addition. Associativity of this multiplication is assumed.

This set of symbols with these operations is usually denoted H, and the multi-plication is called the Hamilton product.

It was not understood that these properties were consistent, that some might notforbid others in the presence of associativity, and it is a bit of a job to prove themto be consistent, unlike the case of complex numbers where the matter is prettystraightforward.

The fact that the multiplication values assumed to be true above imply that

ij = −ji and jk = −kj and ki = −ik


and that, in general, quaternions do not commute using this Hamilton product,was regarded as troubling but necessary: after all, rotations are not commutative,as a minute spent rotating an apple can confirm.

The job of showing associativity of the Hamilton product (and consistency ofthe basic “multiplication table” whose entries are listed above) will be easy forus after we find a collection of matrices to represent our quaternions, as we didwith complex numbers. Matrix multiplication will correspond to the Hamiltonproduct, and we know that matrix multiplication is associative and satisfies thedistributive properties. Until then we will just assume the properties listed aboveto be consistent.

As we did with complex numbers, quaternions are identified with R4

a+ b i+ c j + d k ⇐⇒ (a, b, c, d).

Even more handy will be the representation as

a+ b i+ c j + d k ⇐⇒ a+ ~v where ~v = 〈b, c, d〉 = b~i + c~j + d~k.

Here the number a is called the real part of the quaternion, and ~v is called theimaginary part.

In the text that follows, the imaginary part of a quaternion is identified with

ordinary three dimensional space, and~i, ~j and ~k are the usual unit coordinatevectors in R3.

If a = 0 and ~v 6= 0 the quaternion is called imaginary, while if ~v = 0 thequaternion is called real.

Expressed this way, the “multiplication table entries” shown above lead to a

formula for the Hamilton product of quaternions q1 = a1 +~v1 = a1 +b1~i+c1~j+d1~k

on the left by q2 = a2 + ~v2 = a2 + b2~i + c2~j + d2~k on the right:

q1 q2 = (a1 + ~v1) (a2 + ~v2)

= (a1a2 − ~v1 · ~v2) + (a1~v2 + a2~v1 + ~v1 × ~v2)

where ~v1 · ~v2 represents the ordinary real dot product

~v1 · ~v2 = b1b2 + c1c2 + d1d2

and ~v1 × ~v2 denotes cross product

~v1 × ~v2 = (c1d2 − c2d1)~i + (d1b2 − b1d2)~j + (b1c2 − c1b2)~k.

It is now easy to see the source of the lack of commutativity in the Hamiltonproduct. If you switch the order of the two quaternions, all the terms in theproduct remain the same except the cross product term. And cross product isanti-commutative, i.e.

~v1 × ~v2 = −~v2 × ~v1.So the real part of the Hamilton product commutes, but the imaginary part does

not due to its cross product component. When and only when the imaginary partsare parallel (or one is zero) will the Hamilton product of two quaternions commute.

20 LARRY SUSANKA

For q ∈ H we will sometimes use an “arrow on top” notation such as q = qr + ~qior q = a+~b to indicate the real and imaginary parts of q, and our intent is that qr(or a) denote the real part of q and ~qi (or ~b) its imaginary part.

The conjugate of q ∈ H is denoted q∗ and defined to be qr − ~qi. This means q∗

commutes with q. Also, a calculation shows that

(p q)∗

= q∗ p∗.

If q = a+ b~i + c~j + d~k then

q q∗ = q∗q = a2 + b2 + c2 + d2

and ‖q‖ =√q q∗ = ‖q∗‖ is a norm on H, satisfying all the properties listed for the

complex norm in Section 6.

For instance ‖ p q ‖2 = p q (p q)∗ = p q q∗p∗ = p ‖q‖2 p∗ = ‖p‖2‖q‖2.

This means:

‖ p q ‖ = ‖p‖ ‖q‖.

Unless both real and imaginary parts of q are zero

q q∗

‖q‖2= q

q∗

‖q‖2= q∗

q

‖q‖2= 1.

So any nonzero q (and q∗ too) has a multiplicative inverse given explicitly as abovefor the Hamilton product.

As in the complex numbers, any nonzero member q of H can be written in aunique way as a unit quaternion multiplied by a positive scalar:

q = ‖q‖ q

‖q‖= ‖q‖Vq = ‖q‖

(qr‖q‖

+~qi‖q‖

)where the unit quaternion Vq is called the versor of q.

~qi/‖q‖ will not (usually) be a unit vector, but if it is nonzero this componentcan be written as

~qi‖q‖

=‖~qi‖‖q‖

~qi‖~qi‖

and we have

q = ‖q‖(qr‖q‖

+~qi‖q‖

)= ‖q‖

(qr‖q‖

+‖~qi‖‖q‖

~qi‖~qi‖

)where (

qr‖q‖

)2

+

(‖~qi‖‖q‖

)2

= 1.

Suppose q is not real. Then there is an angle θ in (0, π) and angle θ = 2π − θin (π, 2π) for which

q = ‖q‖ ( cos (θ) + sin (θ) U ) = ‖q‖(

cos(θ)

+ sin(θ)

(−U))


where U is the unit imaginary quaternion ~qi/‖q‖.

This assemblage of factors is called the polar form of q, but remember that θis not determined unless you specify ~qi/‖q‖ or −~qi/‖q‖ as the unit imaginary.

If q is real, a choice of θ = π or θ = 0 produces a polar form; in this case U canbe picked however you like because sin (θ) = 0.

With the properties assembled in this section (assuming, of course, associativityand consistency which will be dealt with later) we see that H is a 4 dimensionalassociative real division algebra. It is not commutative. It has an algebra norm, sowe can think of convergence and calculus involving quaternions.

12. (*) Solutions of z2 = −1

There are no solutions among real numbers to z2 = −1, a situation that led tothe original manipulations which we would call complex arithmetic.

If we search for solutions among complex numbers z = a+ b i, we see that

−1 = a2 − b2 + 2abi⇒ a = 0 and b2 = 1.

So b = ±1 and therefore z = ±i. There are exactly two square roots of −1 amongcomplex numbers.

Let’s now look for solutions to q2 = −1 among quaternions, and use the polarform q = ‖q‖ ( cos (θ) + sin (θ) U ). So if in fact q2 = −1 then

−1 =‖q‖2(

cos2(θ)− sin2(θ) U ·U + 2 cos(θ) sin(θ)U + U×U)

=‖q‖2(

cos2(θ)− sin2(θ) + 2 cos(θ) sin(θ)U).

Since cos(θ) sin(θ) must be 0 (the imaginary part must vanish) we find that infact cos(θ) = 0, sin(θ) = ±1 and ‖q‖ = 1.

So among quaternions solutions can only be found among the unit imaginaryquaternions: the vectors on the unit sphere S2 in R3. And vectors on this sphereare all, in fact, solutions.

Vectors on the unit imaginary sphere are all, and the only,square roots of −1 among the quaternions.

Examining a quaternion product

q1q2 = (a1 + ~v1) (a2 + ~v2)

= (a1a2 − ~v1 · ~v2) + (a1~v2 + a2~v1 + ~v1 × ~v2)

we see that if ~v1 is a multiple of ~v2 then the product quaternion has imaginary partthat is also a multiple of ~v2.

Let U be any unit imaginary. Introducing notation, we define the slice of Hcontaining U by

HU = { a+ bU | a, b real } = H(−U).

22 LARRY SUSANKA

HU is a two dimensional real vector space, and by the above calculation it isan algebra. The norm on the quaternions gives us a norm on each slice. Eachquaternion with nonzero imaginary part is in exactly one of these slices.

The obvious linear map sending complex number a + b i to quaternion a + bUis an algebra isomorphism. Also, the map sending a+ b i to quaternion a+ b (−U)is an algebra isomorphism. Since an algebra isomorphism must send i to a squareroot of −1, these are the only algebra isomorphisms between C and HU.

And the complex norm on C agrees with the quaternion norm on each HU.

H is the union of “copies” of C. These copies are the sets of quaternions HU,which we called slices above. The real quaternions are in every slice, but

those are the only quaternions in more than one slice.

13. (*) Special Products of Quaternions: Rotations in R3

We have decided to identify vectors in space with imaginary quaternions, andwish to use quaternions to study rotations in space, by analogy with what we didin two dimensions with complex numbers.

There, multiplying by a complex number of norm one did the job.

Exploring this idea, we multiply a quaternion q = a +~b by a generic nonzerovector ~v which gives

q ~v = −~b · ~v + a~v +~b× ~v.

If this is to be a vector for any ~v, as it must if this Hamilton product is to induce

a rotation of ~v, then ~b · ~v must equal 0 for any ~v, which forces ~b to be the zerovector. So the quaternion q is real. The same is true if we examine the product ~v q.

If this is to represent rotation we must have q = a = 1 so we have discovered theidentity “rotation.” (If q = a = −1 we have inversion, which is not a rotation inR3.)

So this attempt fails miserably.

But we shall persist, and examine product q ~v p involving a second quaternion p =

c+ ~d multiplied on the right and presume, to avoid repeating the last calculation,

that neither ~b nor ~d are zero.

Recall the scalar triple product vector identity, for any vectors ~w, ~u and ~v:

~w · (~u× ~v) = ~v · (~w × ~u) = ~u · (~v × ~w) .

and also the triple cross product identity

~w × (~u× ~v) = (~w · ~v) ~u− (~w · ~u) ~v.


With this in mind, we calculate

q ~v p =(−~b · ~v + a~v +~b× ~v

)p

=− c~b · ~v −(a~v +~b× ~v

)· ~d+ c

(a~v +~b× ~v

)−(~b · ~v

)~d+

(a~v +~b× ~v

)× ~d

=(−c~b− a ~d+~b× ~d

)· ~v

+(a c+~b · ~d

)~v −

(~b · ~v

)~d−

(~d · ~v

)~b+

(c~b− a ~d

)× ~v.

As above, if multiplication by q and p in this way is to correspond to rotationof generic ~v, it must always produce a vector, and so must have zero real part forany ~v. (

−c~b− a ~d+~b× ~d)· ~v = 0.

Since ~b× ~d is orthogonal to both ~d and ~b the only way this can happen for all ~v

is if ~d = k~b for some constant k, and substitution then yields c = −k a.

In other words, p = −k(a−~b

)= −k q∗ for some real constant k.

So we are working with a product of the form

−k q ~v q∗ = −k ‖q‖2 r ~v r∗

where r is the unit quaternion q/‖q‖.

If the norm of ~v is not changed by this multiplication, as it would not be witha rotation, then −k ‖q‖2 = ±1. In other words, we need only consider products asabove where q is a unit quaternion. But with that assumption, if −k = −1 andwe determine somehow that q ~v q∗ does represent a rotation, then −1 q ~v q∗ wouldrepresent a rotation followed by inversion, and in three dimensions inversion isnot a rotation.

So we will assume that our candidate “rotation” quaternion

q = cos (θ) + sin (θ) U for θ in (0, π)

is a unit quaternion in polar form (so U is a unit imaginary and sin(θ) ≥ 0) andq ~v p = q ~v q∗, in which case the product becomes

q ~v q∗ =(a2 −~b ·~b

)~v + 2

(~b · ~v

)~b+ 2 a~b× ~v

=(

cos2(θ)− sin2(θ))~v +

(2 sin2(θ) U · ~v

)U + 2 cos(θ) sin(θ) U× ~v

= cos(2 θ)~v + ( 1− cos(2 θ)) (U · ~v ) U + sin(2 θ) U× ~v

We will see that this represents a rotation by angle 2 θ counterclockwise aroundaxis U.

Suppose U,A and B form an orthonormal ordered basis of R3, with U×A = Band A×B = U and B×U = A. So the basis is a “right-handed” basis, as is the

standard basis~i, ~j and ~k of R3.

Writing ~v in terms of this new basis, we have

~v = xU + yA + zB

24 LARRY SUSANKA

and then

q ~v q∗ = cos(2 θ) (xU + yA + zB)

+ ( 1− cos(2 θ)) xU + sin(2 θ) U× (xU + yA + zB)

= cos(2 θ) yA + cos(2 θ) zB + xU + sin(2 θ) U× (yA) + sin(2 θ) U× (zB)

= xU + cos(2 θ) yA + cos(2 θ) zB + sin(2 θ) yB + sin(2 θ) (−zA)

= xU + (cos(2 θ) y − sin(2 θ) z) A + (sin(2 θ) y + cos(2 θ) z) B

Expressed in terms of matrices, the vector ~v has coordinates (x, y, z) in basisU,A,B. So the expression in the last line above is1 0 0

0 cos(2 θ) − sin(2 θ)0 sin(2 θ) cos(2 θ)

xyz

in this basis, which makes it apparent that this represents rotation by angle 2 θ inthe AB plane with axis U, where 0 < 2 θ < 2π.

If P is the matrix with columns U,A,B (in that order) then the matrix of thisrotation is

M = P

1 0 00 cos(2 θ) − sin(2 θ)0 sin(2 θ) cos(2 θ)

P−1.

Of course this does not depend on choice of A. Any other choice K perpendicularto U would generate a matrix of transition from the U,K,U × K basis to theU,A,B basis of the form1 0 0

0 c −s0 s c

where c2 + s2 = 1 and we find that

1 0 00 c s0 −s c


1 0 00 c −s0 s c

=


and the same orthogonal matrix M would be generated.

Note also that there are two different unit quaternions, identified with pointson the unit sphere S3 in R4, which can be used to produce each rotation. Both qand −q induce the same rotation, one corresponding to a counterclockwise rotationaround an axis vector U, the other a counterclockwise rotation around axis vector−U. The sum of the two angles of rotation is 2π. The “double cover” of therotations by the unit quaternions is not just a mathematical quirk: it has physicalsignificance, as will be pointed out in Section 25.

To reiterate, we know that each orthogonal matrix with determinant 1 is asso-ciated with exactly one rotation, which can be identified by an axis and rotationangle. On the other hand, each unit quaternion corresponds to a rotation andtherefore exactly one rotation matrix. And knowledge of an axis of rotation androtation angle can be used to generate two quaternions, each of which can be usedto implement the rotation and, if you wish, create the associated orthogonal matrix.


Finally, we examine another way to produce the associated matrix from axisvector and angle by taking another look at

q ~v q∗ = cos(2 θ)~v + ( 1− cos(2 θ)) (U · ~v ) U + sin(2 θ) U× ~v.

If generic vector ~v in R3 has coordinates (v1, v2, v3) and axis vector U has coor-dinates (U1, U2, U3), both written as columns, then a direct calculation shows thatthe above equation for the rotated vector M~v equals

M~v =(

cos(2 θ) I + (1− cos(2 θ)) UUT + sin(2 θ) U×)~v

where I is the 3× 3 identity matrix, UUT is a symmetric 3× 3 matrix and U× isthe 3× 3 matrix that implements cross product with U (on the left):

UUT =

(U1)2

U1U2 U1U3

U2U1 (U2)2

U2U3

U3U1 U3U2 (U3)2

U× =

0 −U3 U2

U3 0 −U1

−U2 U1 0

.

This provides a direct and somewhat faster way (without the choices involvedin finding the matrix of transition P) to create a matrix for the rotation aroundnormalized axis U by angle 2 θ. It is called Rodrigues’ formula.

More importantly, if you actually have a rotation matrix M in hand, Rodrigues’formula shows that

M−MT = 2 sin(2 θ) U× = 2 sin(2 θ)

0 −U3 U2

U3 0 −U1

−U2 U1 0

so by normalizing you can determine the coordinates U1, U2 and U3 of the unitrotation axis and the rotation angle 2 θ.

14. Power Series and Derivatives for Quaternions

In the Section 12 we found that quaternions can be expressed as

q = a+ bU

where a and b are real and U is imaginary with U2 = −1.

If q is not real and we choose U so that b > 0 the representation is unique, butallowing negative coefficients gives the only two different representations of each qin terms of unit imaginary quaternions:

q = a+ bU = a+ (−b) (−U) .

This means that if

f(x) =

∞∑n=0

anxn

is any real power series with radius of convergence R and if q is a quaternion and‖q‖ < R then the power series f(q) can be defined by

f(q) =

∞∑n=0

anqn

26 LARRY SUSANKA

exactly in the way that f was extended to a complex argument. The problemhere is that the lack of commutativity of H severely restricts what one can do withfunctions defined in this way.

For instance, suppose we define, for any quaternion q the exponential function

eq =

∞∑n=0

qn

n!.

If p is another quaternion and p = c+ dV then unless the imaginary part of one isa multiple of the imaginary part of the other p q 6= q p so we cannot say how eqep

might be related to eq+p.

It is true that restricted to the slice HU of H defined by unit imaginary quater-nion U the function f on that slice satisfies all the usual properties. For instanceeqep = eq+p for quaternions p and q both in the slice HU. Recall also that thereal quaternions are in every slice (though that is the only overlap between unequalslices) so if q = a+ bU for real a and b and imaginary U then

ea+bU = eaebU and(ea+bU

)∗= ea−bU.

For any function f defined on quaternion q by a real power series, if q is in HU

then so is f(q). This of course is not the case for more general quaternion-valuedfunctions.

Uniqueness of a real power series representation follows as in the complex case.Derivatives and integrals restricted to this particular slice exist, and can obtainedby term-by-term operations.

For nonzero quaternion q with

q = ‖q‖ ( cos (θ) + sin (θ) U )

for unit imaginary U and real h and θ in (h, h+2π) we can definea branch of the log function on q by

log(q) = log(‖q‖) + θU.

The angle θ can be denoted arg(q), called the argument of q. As in the complexcase there is ambiguity here.

arg(q) depends on both h and U, or rather h and choice of either U or −U.This is because there is a second angle θ in (h, h+ 2π)

cos (θ) + sin (θ) U = cos(θ)

+ sin(θ)

(−U)

so there are potentially two arguments for those nonzero q in HU which do not liealong the ray at angle h. The two possibilities, θ and θ, add to a multiple of 2π.We must specify which of these arguments we intend, and that means specifying−U or U as well as h. Each choice leads to a different branch of the logarithm.

Defining as above the exponential function by power series on the slice HU wehave for q = ‖q‖ ( cos (θ) + sin (θ) U )

elog(q) = eln(‖q‖)+θU = elog(‖q‖)eθU = ‖q‖ eθU = ‖q‖ ( cos (θ) + sin (θ) U ) = q.


Using the other branch of the logarithm defined with the same h but unit imag-inary quaternion −U we have also

elog(q) = eln(‖q‖)+θ(−U) = ‖q‖ eθ(−U) = ‖q‖(

cos(θ)− sin

(θ)U)

= q.

Finally, we will be particularly interested in unit quaternions, and we point outthe obvious here: those can all be represented as

q = e θU = cos (θ) + sin (θ) U where U2 = −1, and ‖q‖ = 1.

With exponentials and logs in hand we can define other functions using them.For instance we can define qp for q in the domain of a branch of the logarithm, by

qp = e p log(q) p, q both in some HU.

A further word about derivatives is in order. We mentioned that restricted toeach slice quaternion functions are no different than any other representation ofcomplex-valued functions of a complex variable.

Attempting to define a derivative not restricted to a slice, as is done for complexderivatives

f ′(q) = limh→0

f(q + h)− f(q)

h(h in H)

fails from the outset due to the fact that left division by h need not equal rightdivision by h so the expression is ambiguous.

So let’s agree to worry about that later and settle for now on right division byh and a function as simple as f(q) = q2.

((q + h)2 − q2

)h−1 =

(q2 + q h+ h q + h2 − q2

)h−1

= q + h q h−1 + h = q +h

‖h‖qh∗

‖h‖+ h.

h/‖h‖ is a unit quaternion we will find (later) that the product

h

‖h‖qh∗

‖h‖will rotate the imaginary part of q and the angle and axis of this rotation can bevaried as the norm of h approaches 0.

In other words, the limit (and so the derivative) cannot exist unless q is real orh is confined to the same slice as q. The burden of satisfying the differentiabilitycondition is so onerous that virtually no function can bear it.

So we will restrict our consideration of derivatives to functions f and quaternionsq in HU for which f(q + h) is in HU whenever h has sufficiently small norm and isitself in HU. (Functions defined using real power series do satisfy this condition.)In this case h−1 will commute with f(q+h)− f(q), so the left-right division choiceis not an issue, and we define

d f

dU(q) = lim

h→0

f(q + h)− f(q)

h(h in HU)

whenever this limit exists.

28 LARRY SUSANKA

As mentioned earlier, the existence of this limit is exactly analogous to theexistence of a complex derivative (since HU is algebra isomorphic to C) and theexistence of a complex derivative is a very strong condition. All the usual powerfultheorems about complex differentiable functions apply on each slice.

It is possible to create in some cases the idea of directional derivative forquaternion valued functions of a quaternion variable, a weaker notion of differen-tiability but sufficient to give us some useful information about, for instance, thelocation of potential minima or other analogues from ordinary calculus.

The Gateaux derivative of f at q in the direction of nonzero p is defined as

Dpf(q) = limt→0

f(q + tp)− f(q)

t(t real)

when the limit exists.

If q and nonzero p are both in HU and if d fdU does exist at q, the Gateaux

derivative Dp(q) will exist and

d f

dU(q) =

Dpf(q)

p.

Finally, the derivative of a path in the quaternions (a quaternion valuedfunction given for real numbers in an interval) is defined as would be done for anyfunction into R4.

Specifically, if f is defined on the interval (a, b) and has value

f(c) = f0(c) + f1(c) i+ f2(c) j + f3(c) k

for each c is in the interval we define dfdt (c) to be f ′0(c) + f ′1(c) i+ f ′2(c) j + f ′3(c) k

whenever all four ordinary component derivatives exist.

Both this derivative and the Gateaux derivative are real linear, and satisfy theproduct rule with Hamilton product when derivatives of each factor exist.

15. Quaternions as Matrices

Consider the four matrices:

E =

(1 00 1

)and A =

(0 −11 0

)and B =

(1 00 −1

)and C =

(0 11 0

).

The reader should check that

A2 = −E and B2 = C2 = E

while AB = −BA = C and CB = −BC = A and CA = −AC = B.

Using this “multiplication table” you can avoid matrix multiplications as youwork with these examples.

Recall that we saw that the collection of matrices of the form

xE + yA for real x and y

is algebra isomorphic to (i.e., “is,” or “is a representation of”) the complex numbers.


Commonly, one sees the quaternions represented as the set of all matrices whichcan be written as a combination, with real w, x, y and z,

wE + x (−iB) + y (−A) + z (−iC)

=w

(1 00 1

)+ x

(−i 00 i

)+ y

(0 1−1 0

)+ z

(0 −i−i 0

)=

(w 00 w

)+

(−ix y − iz−y − iz ix

)=

(w − ix y − iz−y − iz w + ix

).

The order −iB ⇒ −A ⇒ −iC is not required here. A variety of permutationswith appropriate sign changes would serve just as well.

If you multiply two sums of this kind together, you get another sum of this kind,and the same is true, more obviously, if you add two such sums, or multiply a sumlike this by a scalar. So these matrices constitute a 4 dimensional real algebra.Associativity is automatic: matrix multiplication is associative.

Satisfy yourself that the function that sends a generic quaternion w+x i+y j+z kto the matrix shown above is an algebra isomorphism. The squared norm of thequaternion is the determinant of this matrix, just as with the matrix representationof complex numbers. The conjugate of the quaternion is the conjugate transposeof the matrix of that quaternion.

If you want to represent the quaternions in terms of real matrices only, simplysubstitute in these matrices the block matrices(

1 00 1

)in place of 1, and

(0 00 0

)in place of 0, and

(0 −11 0

)in place of i.

This gives one representation of quaternion q = w + x i+ y j + z k as

w + x i+ y j + z k ⇐⇒

w x y z−x w −z y

−y z w −x−z −y x w

.

However, perhaps the nicest representation of quaternions in terms of real ma-trices is in terms of the 2× 2 block matrices

w

(E 00 E

)+x

(A 00 A

)+y

(0 −BB 0

)+z

(0 −CC 0

)=

w −x −y −zx w −z y

y z w −xz −y x w

The conjugate quaternion q∗ corresponds to the transpose of this matrix, and a

calculation shows the determinant is(w2 + x2 + y2 + z2

)2= ‖q‖4.

If you identify generic quaternion a + b i + c j + d k with the member (a, b, c, d)of R4 written as a column, then left multiplication by quaternion w+x i+ y j+ z k

30 LARRY SUSANKA

corresponds to a linear transformation on R4 and hence can be represented by leftmultiplication by a matrix. The matrix we just presented is the one.

(w + x i+ y j + z k) (a+ b i+ c j + d k)

⇐⇒

aw − xb− yc− zdxa+ wb− zc+ ydya+ zb+ wc− xdza− yb+ xc+ wd

=



abcd

⇐⇒



a −b −c −db a −d c

c d a −bd −c b a

.

Of course right multiplication is also linear, but written on the left as matrixproduct we have the slightly different:

(w + x i+ y j + z k) (a+ b i+ c j + d k)⇐⇒

a −b −c −db a d −c

c −d a bd c −b a

wxyz

.

16. The Pauli Spin Matrices

In a bit of a side note to our main purpose, we define the matrices

σ0 = E =

(1 00 1

)and σ1 = C =

(0 11 0

)and σ2 = iA =

(0 −ii 0

)and σ3 = B =

(1 00 −1

)so that

σ1σ2 = iσ3 and σ2σ3 = iσ1 and σ3σ1 = iσ2.

and also

σ21 = σ2

2 = σ23 = σ0 = E with σkσm = −σmσk for nonzero unequal m and k.

The matrices σ1, σ2 and σ3 are called Pauli spin matrices. They are her-mitian, which means they are their own conjugate transpose, and so correspondto “observables” in quantum mechanics. They represent spin with respect to thecoordinate axes in quantum mechanical descriptions of certain particles. Any 2 by2 hermitian matrix can be found (in one way) as a combination

w σ0 + xσ1 + y σ2 + z σ3 =

(w + z x− iyx+ iy w − z

)with real w, x, y and z.


These matrices can be used to define a representation of one of the smallerClifford (also called geometric) algebras, a structure which we will resist exploringin these notes. We note in this regard, however, that the smallest real algebracontaining σ1, σ2 and σ3 is the algebra of all 2× 2 matrices with complex entries.

The 16 matrices of the form ±σk or ± i σk for k = 0, 1, 2 or 3 form a multiplicativesubgroup of this algebra, the Pauli spin group.

One easily shows that

w σ0 + x (−iσ1) + y (−iσ2)+z (−iσ3)

= w

(1 00 1

)+ x

(0 −i−i 0

)+ y

(0 −11 0

)+ z

(−i 00 i

)=

(w − zi −xi− y−xi+ y w + zi

)with real w, x, y and z provides yet another way of realizing the quaternions, thistime in terms of the Pauli spin matrices.

The Quaternion group is a subgroup of the Pauli spin group, and consists ofthe 8 elements ±σ0,±iσ1,±iσ2 and ±iσ3.

Here too the conjugate transpose matrix corresponds to the conjugate quater-nion, and the norm squared is the matrix determinant.

As in the last section, writing in terms of 4 by 4 real matrices, the representationof quaternion q = w + x i+ y j + z k is

w

1 0 0 00 1 0 0

0 0 1 00 0 0 1

+ x

0 0 0 10 0 −1 0

0 1 0 0−1 0 0 0

+ y

0 0 −1 00 0 0 −1

1 0 0 00 1 0 0

+ z

0 1 0 0−1 0 0 0

0 0 0 −10 0 1 0

yielding

w z −y x−z w −x −y

y x w −z−x y z w

=

w 0 0 00 w 0 0

0 0 w 00 0 0 w

+

0 z −y x−z 0 −x −y

y x 0 −z−x y z 0

.

The conjugate operation corresponds to transpose, and a calculation shows thedeterminant is the fourth power of the norm as before.

32 LARRY SUSANKA

17. Rotations in R4

A linear rigid motion L on R4 is a linear map that sends any (and hence every)right-handed ordered orthonormal basis to another. It sends the standard basis

~e1 = 〈1, 0, 0, 0〉 , ~e2 = 〈0, 1, 0, 0〉 , ~e3 = 〈0, 0, 1, 0〉 , ~e4 = 〈0, 0, 0, 1〉 ,to an orthonormal basis ~u1, ~u2, ~u3, ~u4 and the matrix M = (~u1 ~u2 ~u3 ~u4), which isthe matrix of this linear transformation, must have determinant 1.

L(v) = Mv for every vector v in R4.

Since M is real any complex roots of the characteristic polynomial come inconjugate pairs. Since M is an isometry, the norm of each root is 1.

If the characteristic polynomial of M has four real roots, it could have all fourequal to 1, exactly two equal to −1 or all four −1. It is also possible that therecould be two real roots and a conjugate pair of complex roots, or there could betwo sets of conjugate pairs of roots.

Because M is normal there is a right-handed ordered orthonormal basis

~p1, ~p2, ~p3, ~p4

so that the orthogonal matrix of transition P, whose columns are this basis, satisfies

Dµ,θ = P−1MP =

cos(µ) − sin(µ) 0 0sin(µ) cos(µ) 0 0

0 0 cos(θ) − sin(θ)0 0 sin(θ) cos(θ)

for certain θ and µ in (−π, π].

In the first case, root 1 repeated four times, M is the identity matrix and µ =θ = 0. In the second case we can choose µ = 0 and θ = π. In the third case bothµ and θ are π and M represents inversion. Unlike 3 dimensions, but similar to 2dimensions, inversion is rigid in R4.

If there is a single pair of complex roots, the angle µ could be 0 or π in whichcase M acts as the identity or inversion in the plane of ~p1 and ~p2, and rotation byangle θ in the plane of ~p3 and ~p4.

In the final case, M acts as rotation by angle µ in the plane of ~p1 and ~p2, andby angle θ in the plane of ~p3 and ~p4.

18. Quaternions and Isometries on R4

Suppose α and β are quaternions, conflated with members of R4 as

α = w + x i+ y j + z k ⇔

wxyz

whenever that is convenient.


We define the linear function F on β in H by

Fα(β) = −αβ α where α is a fixed unit quaternion.

The quaternions α, i α, j α and k α form an orthonormal basis of H, so the lastthree vectors span the three dimensional subspace α⊥ of H.

A calculation verifies that

Fα(α) = −α, Fα(i α) = i α, Fα(j α) = j α and Fα(k α) = k α.

So Fα is a reflection in the hyperplane perpendicular to α.

We discovered in Section 5 that any linear isometry in R4 can be realized as thecomposition of no more than four reflections, and nontrivial linear rigid motions(isometries with determinant 1) are the composition of either two or four reflections.

That means linear rigid motions applied to β are of the form

(α2 α1) β (α1 α2) or (α4 α3 α2 α1) β (α1 α2 α3 α4)

while other nontrivial linear isometries have the form

−α1 β α1 or − (α3 α2 α1) β (α1 α2 α3)

for various unit quaternions αi.

For us, the important point to take away from this calculation is:

Every linear rigid motion of R4 can be calculated on quaternion β, thought ofas a member of R4, as the Hamilton product αβ γ for certain unit quaternionsα and γ.

Every orthogonal matrix with determinant 1 can be construed as a linearrigid motion, so left multiplication by such a matrix can be calculated usingquaternions in this way.

Representing quaternions as columns in R4, we saw at the end of Section 15 thatif three generic quaternions (not necessarily unit quaternions)

α⇔

wxyz

, β ⇔

lmno

, γ ⇔

abcd

are multiplied, then

(αβ)γ ⇔

a −b −c −db a d −cc −d a bd c −b a

w −x −y −zx w −z yy z w −xz −y x w

lmno

.

Associated in the other order we have

α(βγ)⇔


a −b −c −db a d −cc −d a bd c −b a

lmno

so matrix pairs of this particular form commute.

34 LARRY SUSANKA

Let Left denote the set of matrices which can be produced by left multiplicationby a quaternion, as above, and Right denote those matrices which can be formedto represent right multiplication.

Inspection shows that the only matrices in both Left and Right are scalarmultiples of the identity matrix.

If the matrices corresponding to left and right multiplication by quaternion αare denoted Lα and Rα, respectively, and if for all β

α1βγ1 = α2βγ2

then αβγ = β for all β where α = α−12 α1 and γ = γ1γ−12 . It follows that the

product LαRγ is the identity matrix. But then L−1α = Lα−1 = Rγ and this impliesthat Lα and Rγ are in both Right and Left: in other words, they are multiplesof the identity matrix.

Our conclusion: there is a nonzero real constant k so that α1 = k α2 and γ1 =k−1 γ2. So any (nonzero) matrix that can be represented as a product LαRγ canbe so represented as L(k α)R(k−1 γ) for nonzero real k and in no other way.

Some observations: (i) The determinants of the matrices Lα and Rα are both‖α‖4. (ii) Unless α = 0 the four columns of each matrix form orthogonal sets ofvectors.

So if α and γ are unit quaternions, the matrices Lα and Rγ and LαRγ =RγLα are matrices of rotations of R4 and the two quaternion products αβγ and(−α)β(−γ) can be thought of as implementing this product rotation on β as amember of R4. There are no other unit quaternion pairs that could.

And, as we saw above, any orthogonal matrix with determinant 1 can be repre-sented in this way.

We will now examine some consequences of this fact.

Suppose P is any orthogonal matrix with determinant 1 and L ∈ Left. ThenP = LαRγ for certain α and γ so

P−1 L P = Rγ−1 Lα−1 LLαRγ = Lα−1 Rγ−1 Rγ LLα = Lα−1 LLα

and since Left is closed under multiplication this last is itself a member of Left.The same is true for Right: both Left and Right are closed under change ofbasis to any other orthonormal right-handed basis.

Now suppose M = LR with L ∈ Left and R ∈ Right.

Then we have

P−1 M P = P−1 L R P = P−1 L P P−1 R P

= Rγ−1 Lα−1 LLαRγ Rγ−1 Lα−1 RLαRγ = (Lα−1 LLα)(Rγ−1 RRγ

).

The second line is interesting, but the point of this lies in the first.

Every orthogonal matrix M with determinant 1 can be reduced to form Dµ,θ

as defined in Section 17 using orthogonal matrix of transition P, as P−1 M P.The first line shows how to find the quaternions for L and R if we can find thequaternions that produce Dµ,θ .


The matrices that corresponds to left multiplication by quaternion cos(ρ) +sin(ρ) i = C + S i and right multiplication by quaternion cos(τ) + sin(τ) i = c+ s iare, respectively,

L(ρ) =

C −S 0 0S C 0 00 0 C −S0 0 S C

and R(τ) =

c −s 0 0s c 0 00 0 c s0 0 −s c

.

Their product isC −S 0 0S C 0 00 0 C −S0 0 S C

c −s 0 0s c 0 00 0 c s0 0 −s c

=

Cc− Ss −Cs− Sc 0 0Sc+ Cs Cc− Ss 0 0

0 0 Cc+ Ss Cs− Sc0 0 Sc− Cs Ck + Ss

=

cos(ρ+ τ) − sin(ρ+ τ) 0 0sin(ρ+ τ) cos(ρ+ τ) 0 0

0 0 cos(ρ− τ) − sin(ρ− τ)0 0 sin(ρ− τ) cos(ρ− τ)

One can make this equal to

Dµ,θ = P−1MP =

cos(µ) − sin(µ) 0 0sin(µ) cos(µ) 0 0

0 0 cos(θ) − sin(θ)0 0 sin(θ) cos(θ)

by solving the equations

ρ+ τ = µ and ρ− τ = θ.

Then

M = P Dµ,θ P−1 = PL(ρ)R(τ) P−1 =(PL(ρ) P−1

) (PR(τ) P−1

)and we read off the two quaternions we are looking for as the leftmost columns of

L = PL(ρ) P−1 and R = PR(τ) P−1.

Recall from above that the change of basis can actually be implemented as

L = Lα L(ρ)Lα−1 and R = Rγ R(τ)Rγ−1

for certain unit quaternions α = e xU and γ = e yV at angles x and y and unitimaginaries U and V.

So if β is any quaternion with associated vector B in R4 as

β = l +mi+ n j + o k ⇐⇒ B =

lmn0

36 LARRY SUSANKA

then we can rotate B, finding the rotated vector MB, by doing the quaternioncalculation (

e xU e ρ i e−xU)β(e−yV e τ i e yV

).

It may be obvious that the left factor remains at angle ρ (for a different unitimaginary quaternion) and the right factor remains at angle τ , but we would liketo confirm this.

Suppose W is any unit imaginary quaternion. (In the case we care about W = i.)To ease the clutter we let a = cos(x), b = sin(x), c = cos(ρ) and s = sin(ρ).

(a+ bU) (c+ sW) (a− bU)

= (a+ bU) [ (ac+ bsW ·U) + (−bcU + asW − bsW ×U) ]

= a2c+ absW ·U + b2cU ·U− absU ·W + b2sU · (W ×U)

+ abcU + b2s (W ·U) U− abcU + a2sW − absW ×U

− b2cU×U + absU×W − b2sU× (W ×U)

= c+ s[b2 (W ·U) U + a2 W + 2abU×W + b2 U× (U×W)

]= cos(ρ) + sin(ρ)

[2b2 (W ·U) U +

(a2 − b2

)W + 2abU×W

]where the last line uses the triple cross product identity. The imaginary part onthe right in that line must be a unit vector, and it is seen to be so by finding itsnorm, remembering that

(W ·U)2

+ ‖W ×U‖2 = 1 and a2 + b2 = 1.

So we have verified the semi-obvious: the angles of the two rotation quaternionsare the angles we calculated and used in L(ρ) and R(τ) to create the block-diagonalDµ,θ, with ρ+ τ = µ and ρ− τ = θ. And there are unit imaginary quaternions Aand B so that

e ρA β e τ B

implements the rotation on the quaternion β.

19. (*) Another Look at Rotations in R3

We have assembled a variety of representations for unit quaternion q, such as

q = w + x i+ y j + z k = cos(θ) + sin(θ) U = e θU,

where U is a unit imaginary and θ is the corresponding argument.

Let’s examine the matrix for a rotation in R4 corresponding to the linear function,created from unit quaternion q, given by the formula

F(p) = q p q∗ = e θU p e−θU for every quaternion p in H.

This is the situation that, if p is restricted to the imaginary quaternions, corre-sponds to a rotation in R3 by angle 2 θ around axis given by U.

In R4 the corresponding function for unit q is

L(v) = M v = Lq Rq∗ v for every vector v in R4.


M is the product matrix Lq Rq∗ which is

LqRq∗ =


w x y z−x w z −y−y −z w x−z y −x w

=

1 0 0 0

0 1−2y2−2z2 2xy−2wz 2xz+2wy

0 2xy+2wz 1−2x2−2z2 2yz−2wx

0 2xz−2wy 2yz+2wx 1−2x2−2y2

.

The block matrix1− 2y2 − 2z2 2xy − 2wz 2xz + 2wy2xy + 2wz 1− 2x2 − 2z2 2yz − 2wx2xz − 2wy 2yz + 2wx 1− 2x2 − 2y2

is the rotation matrix in R3 which corresponds to this rotation restricted to theimaginary quaternions. It can be calculated from the 9 products

x2, y2, z2, wx, wy, wz, xy, xz and yz

and is an alternative to the formula for this matrix from Section 13.

20. (*) Recap: Isometries in Rn, for n = 1, 2, 3 and 4

There are two isometries that fix the origin in R. The first is the identity,the second is the function f(x) = −x. The units of R consist of just the zero-dimensional “unit sphere” S0 = { 1, −1 }, and there is a unique isometry definedby left multiplication by each of these two units.

R2 can be identified with (or is) a normed algebra C. The units of C correspondto the unit circle S1 in R2. Each origin-fixing isometry of R2 can be identified withthe function given by fc(x) = cx for a unique c in S1.

Let’s consider the unit sphere S3 in R4 and identify R4 with H and also identifyR3 with the subspace of imaginary quaternions in H.

If T is any origin-fixing isometry on R3 and if the determinant of T is 1 thenthere is a member q of S3 which implements the isometry via

T (x) = q x q∗.

If q produces the isometry T in this way, so does −q but that is the only duplication.

On the other hand, if T is any linear isometry on R3 with determinant −1 thenthere is a member q of S3 which implements the isometry via

T (x) = q x∗ q∗.

As before, if q produces the isometry T in this way, so does −q but that is the onlyduplication.

Finally, we consider the case of R4.

If T is any linear isometry on R4 then there are members q and p of S3 so that

T (x) = q x p

38 LARRY SUSANKA

if determinant of T is 1, andT (x) = q x∗ p

if determinant of T is −1.

If q and p produce the isometry T in this way, so does the pair −q and −p, butthis is the only duplication.

21. Octonions

We will take a brief detour to review a few properties of the remaining finitedimensional normed division algebra, denoted by O and usually called now theoctonions. They were invented by John Graves in late December, 1843, followingan exchange of letters with his friend Hamilton in which Hamilton outlined his newlyfound quaternions.

It is an eight dimensional real algebra, and can be defined using a procedurecalled the Cayley-Dickson construction, which can also be used to define C andH, as follows.

We start with R and define a product on R× R by

(a, b)(x, y) = (ax− yb, ya+ bx) for pairs of real numbers (a, b) and (x, y).

This is, of course, nothing more than complex multiplication. The conjugationoperation applied to (a, b) is (a,−b) and the complex norm corresponds to thenormal Euclidean norm on R× R.

We take the next step and define a product on C× C by

(a, b)(x, y) = (ax− yb, ya+ bx)

for pairs of complex numbers (a, b) and (x, y).

A calculation shows that the complex number pairs (1, 0), (i, 0), (0, 1) and (0, i)span this four dimensional real vector space and can be identified with the unitquaternions 1, i, j and k, respectively, to produce an algebra isomorphism onto thequaternions.

When q = q0(1, 0) + q1 (i, 0) + q2 (0, 1) + q3 (0, i) for real q0, q1, q2 and q3, theconjugate quaternion corresponds to q∗ = q0(1, 0)−q1 (i, 0)−q2 (0, 1)−q3 (0, i) andthe quaternion norm is then, of course, the square root of q q∗ and the multiplicativeinverse of nonzero q is q∗/ (q q∗).

The octonions structure is produced by taking this procedure one more step.

Define a product on O = H×H by

(a, b)(x, y) = (ax− y∗b, ya+ b x∗)

for two pairs of quaternions (a, b) and (x, y). It is straightforward to check thatthe distributive properties hold and that the necessary properties involving realscalars hold too so this multiplication makes O into an algebra.

H×H is an eight dimensional real vector space, and the eight pairs

O1 = (1, 0), O2 = (i, 0), O3 = (j, 0), O4 = (k, 0),

O5 = (0, 1), O6 = (0, i), O7 = (0, j), O8 = (0, k)


span that space. O1 is the multiplicative identity. For each pair m and n of distinctintegers withm and n larger than 1 and not exceeding 8 we find OnOm = −OmOn 6=0 and also O2

n = −1.

The octonion multiplication is not associative. For instance

(O3O4)O5 = [ (j, 0) (k, 0) ] (0, 1) = (i, 0) (0, 1) = (0, i) while

O3 (O4O5) = (j, 0) [ (k, 0) (0, 1) ] = (j, 0) (0, k) = (0, kj) = (0,−i)though it is a fact that associativity does hold when restricted to a subspace gen-erated by any two octonions.

Failure of associativity means there is no way to represent the octonion structureas a collection of real matrices with matrix multiplication.

For octonion q given by

q = q1O1 + q2O2 + q3O3 + q4O4 + q5O5 + q6O6 + q7O7 + q8O8

define the octonion conjugate by

q = q1O1 − q2O2 − q3O3 − q4O4 − q5O5 − q6O6 − q7O7 − q8O8.

So if q = (a, b) then q = (a∗,−b) and q q = q q = q21 + q22 + · · ·+ q28 correspondsto the square of the usual Euclidean norm on R8, which we take to be the normon the octonions.

After considerable fussing with terms, one verifies that for two octonions p andq the equality

‖ p q ‖ = ‖ p ‖ ‖ q ‖holds: that is, this norm is an algebra norm on the octonions.

We can define the octonion inverse for octonion q to be q/ (q q) = q/‖ q ‖2 justas with the quaternions and complex numbers and we see that the octonions forma (non-associative) normed division algebra.

The Cayley-Dickson process can be continued to dimension sixteen and beyond.But the product at the next step (sometimes called the sedenions) has zero divisorsand so cannot have an algebra norm and cannot be a division algebra.

22. The Hopf Fibration

This topic has geometrical interest with application to particle physics, but it isnot a core idea for us. Still it is a nice idea and illustrates how quaternions help usthink about calculations.

We will show that in a certain explicit sense the three dimensional sphere S3 canbe thought of as S2 copies of the circle S1 (one circle for each point of the ordinarysphere) glued together.

The structure we consider here is called a Hopf fibration, and it is actuallyrealized by a function H called a Hopf map from S3 to S2 given by the formula

H(w, x, y, z) =(w2 + x2 − y2 − z2, 2(wz + xy), 2(xz − wy)

).

40 LARRY SUSANKA

The norm of the output triple is w2 +x2 + y2 + z2 so if (w, x, y, z) is on the unitsphere in R4 the output is on the unit sphere in R3, a fact which can be calculateddirectly or by considering the following.

If we identify a point (w, x, y, z) of R4 with quaternion q = w + x i + y j + z kand ordered triples in R3 with the imaginary quaternions we find that

q i q∗ =(w2 + x2 − y2 − z2

)i+ (2xy + 2wz) j + (2xz − 2wy) k

so this Hopf map is actually the result of rotating the imaginary i to a new locationon the imaginary unit sphere. You can send i anywhere on that sphere by a suitablerotation, so the Hopf map is onto the imaginary unit sphere.

The Hopf map is continuous: if p is close to q then H(p) is close to H(q).

The fiber over the member g of the unit imaginary sphere S2 is the collectionof all quaternions on the unit sphere S3 which move i to g. This collection of unitquaternions will be denoted φg and is nonempty.

If g = p i p∗ = q i q∗ then p∗ q i = i p∗q; that is, p∗q commutes with i. That willhappen if and only if p∗q is of the form a + b i where a2 + b2 = 1 (because both qand p are unit quaternions.)

So if we find any p for which H(p) = g then the fiber φg over g will be the circleof unit radius inside S3 given by

φg = { p (cos(θ) + i sin(θ)) | 0 ≤ θ < 2π } = { p ei θ | 0 ≤ θ < 2π }.These fibers do not overlap: fibers are either equal or disjoint. They form what iscalled a partition of S3.

We are going to select one member of each fiber.

Define s(−i) = j. For other members g = g1 i+ g2 j + g3 k of S2 define s(g) by

s(g) =1√

2(1 + g1)( i (1 + g1) + j g2 + k g3 ) .

We have s(−i) i (s(−i))∗ = j i (−j) = j (−k) = −i so j is in the fiber for −i.For other g targets,

s(g) i (s(g))∗

=−1

2(1 + g1)( i (1 + g1) + j g2 + k g3 ) i ( i (1 + g1) + j g2 + k g3 )

=−1

2(1 + g1)(− (1 + g1)− k g2 + j g3 ) ( i (1 + g1) + j g2 + k g3 )

= i g1 + j g2 + k g3.(Multiply and use 1− g21 = g22 + g23 .

)We have now an explicit formula for the fibers:

φg = { s(g) ei θ | 0 ≤ θ < 2π }.

Recalling the goal of this section’s introductory remarks, we have shown that S3can be broken into disjoint copies of S1 (the fibers) each of which corresponds toone point of S2.

The function s we used to select a point in each fiber is continuous everywhereexcept −i, the point antipodal to the quaternion i in S2 which we used to defineour fibers.


By choosing any member a of S2 we could create a new Hopf map and a newcollection of fibers consisting of disjoint unit circles in S3, each circle composed ofpoints p so that p a p∗ = g for each g in S2. But the selection function analogousto our function s would also fail to be continuous, this time at −a.

It is an interesting and important feature of S3 that no continuous selection of amember of any partition of S3 into disjoint circles can be found, a fact which leadsto the roots of a subject called algebraic topology.

23. (*) Matrices and Quaternions: a Comparison

We will estimate the complexity of doing common rotation calculations in R3,comparing the efficiency of rotation matrices to quaternions as the “operators” onvectors. We will count the number of real number multiplications required to (i)Rotate a vector. (ii) Calculate an axis and angle of rotation from the operator.(iii) Find the rotation operator given its axis of rotation and angle of rotation. (iv)Combine two operators into a third operator.

We will not count additions or other operations, or data storage requirements, orcase-testing operations, or trig function look-ups, or the proliferation of side issuesand special cases (all of which hurt the argument for using matrices) so this reallyis only a rough look at the issue.

(i) Using 3×3 matrix M on vector v to calculate the rotated vector M v requires9 multiplications.

Using unit quaternion q to calculate q v q∗ on imaginary v requires two steps. Thefirst, calculate q v, requires 12 multiplications. Since we know the result of (q v) q∗

will end as imaginary, we don’t need to calculate the real term in the second step,so 12 more multiplications are needed.

In view of the formula at the end of Section 19, converting a quaternion tothe corresponding matrix operator takes 9 multiplications, so by going through thematrix operator we can reduce the number of multiplications to 18.

The score: matrices 9, quaternions 18. Matrices are the big winner here.

(ii) To calculate the axis and angle of rotation from the quaternion operatorrequires one arccos lookup and no multiplications. If you insist on normalizing theaxis vector it requires 6 multiplications and a square root operation.

On the other hand, matrices require more effort. Using the formula at the endof Section 13, we need 3 multiplications and a square root operation and then atrig function lookup to find the angle. To normalize the axis vector requires 3 moremultiplications.

So 0 or 6 multiplications for quaternions versus 3 or 6 multiplications for matri-ces.

(iii) To find the quaternion rotation operator given a normalized axis of rotationand angle of rotation requires two trig function lookups and 3 multiplications. Ifyou need to normalize the axis vector that jumps to 9 multiplications.

To produce the matrix operator, the easiest thing to do is use the formula at theend of Section 19, which requires going through the quaternion representation. So

42 LARRY SUSANKA

there are two trig function lookups and either 3 or 9 multiplications to get there,plus an extra 9 multiplications to create the 9 matrix entries. So it will take 12 or18 multiplications.

18 versus 9 or 12 versus 3, depending on if you need to normalize the axis vector:quaternions win again.

(iv) To combine two quaternion operators into a third operator requires 16 mul-tiplications. To multiply 2 matrix operators requires 27. Quaternions are better atthis.

Quaternions MatricesRotate Vector: 18 9

Find Angle/Axis from Operator: 6 (or 0) 6 (or 3)Find Operator from Angle/Axis: 9 (or 3) 18 (or 12)

Combine Operators: 16 27

24. (*) Quaternion Interpolation

We saw in Section 23 that quaternions seem to be clearly better at performingtwo out of four of the tasks we examined, and tied or better at one task, thoughthe analysis was pretty superficial. Matrices came out ahead on one task.

But now we come to a different issue, interpolating between two operators,as one would want to do for instance when changing the orientation of a spacecraftor an airplane, or rotating an animated figure in a video game.

The problem with matrix rotation operators is that doing any of the obviousmanipulations to transform an initial rotation matrix to a final rotation matrixpasses out of the orthogonal matrices, and no inexpensive “normalization” processreturns the columns to an orthonormal set of vectors. Rotating one “embedded”axis at a time around predetermined standard axes to its final position (usingEuler angles7, for example) results in jerky and unnatural motion and is subject toa phenomenon called “gimbal lock” where the procedure can break down entirely.

Quaternion interpolation, on the other hand, is relatively easy to describe andunderstand, and is not subject to these issues, and is numerically stable in the sensethat roundoff and other errors remain under control during the interpolation.

And this is why quaternions have become popular in recent years with con-trol theorists (robotics), aeronautical engineers, video software designers and otherfolks who actually want to perform calculations involving objects with changingorientation, as distinct from proving theorems about them.

Suppose ps, pe, qs and qe are unit quaternions, ps 6= −pe and qs 6= −qe. Supposealso that a and b are any continuous increasing real functions defined on the unitinterval with a(0) = b(0) = 0 and a(1) = b(1) = 1.

7See Jack Kuipers: Quaternions and Rotation Sequences, Princeton University Press, 1999,or Andrew Hanson: Visualizing Quaternions, Elsevier, 2006, for more on the various possibilities

here.


Then the two quaternion valued functions

P (t) =a(t) pe + (1− a(t)) ps‖ a(t) pe + (1− a(t)) ps ‖

and Q(t) =b(t) qe + (1− b(t)) qs‖ b(t) qe + (1− b(t)) qs ‖

interpolate continuously between ps and pe, and qs and qe, respectively, in the unitquaternions. (The most obvious choices for a and b are a(t) = b(t) = t.)

Specifically,

‖P (t)‖ = ‖Q(t)‖ = 1 for all t in [0, 1]

and

P (0) = ps, P (1) = pe, Q(0) = qs, and Q(1) = qe.

P and Q are confined to the subspaces of H generated by ps and pe and by qsand qe, respectively.

If v is an imaginary quaternion and w is any quaternion then

P (t) v P (t)∗ and P (t)wQ(t)

are continuous paths in the quaternions, starting at ps v p∗s and ps w qs and ending

at pe v p∗e and pe w qe, respectively. The norms of these quaternions remain constant

along the path. And, of course, these quaternions correspond to vectors rotatingfrom one position to the other in R3 and R4, respectively.

A discrete approximation to this continuous model of rotation would be ob-tained by selecting times ti in [0, 1], i = 0, . . . , n with

0 = t0 < t1 < t2 < · · · < tn = 1

for which consecutive P (ti) and Q(ti) are sufficiently close together for the pur-poses at hand. Using the method of interpolation outlined above, guarantee of“sufficiently close” may be an issue.

Even so, this simple formula has no counterpart that can be discussed in 4or 5 paragraphs—or 4 or 5 pages—using matrices without reprising the ideas wehave introduced involving quaternions. The option of creating a linear interpo-lation between starting matrix and ending matrix, and performing Gram-Schmidtorthonormalization at each step from start to finish is computationally unappealing.

The “video animation community” refers to these quaternion interpolation tech-niques by the quasi-acronym slerp, short for spherical linear interpolation, inreference to the fact that the quaternions that implement these rotations lie on theunit sphere S3 in H.

There is another way of interpolating between unit quaternions that has someserious advantages.

If ps and pe are unit quaternions, so is h = p∗s pe.

In polar form, h = cos(θ) + sin(θ) U for argument θ and unit imaginary U, andU can be chosen so that 0 ≤ θ ≤ π.

So log(h) = ln(‖h‖) + θU = θU.

The function

R(t) = ps ( p∗s pe )t

= ps(ht)

for t ∈ [0, 1]

satisfies R(0) = ps and R(1) = pe.

44 LARRY SUSANKA

Also, since ht = et log(h) = et θU = cos(t θ) + sin(t θ)U we have

R(t) = ps(ht)

= ps (cos(t θ) + sin(t θ) U) .

It is apparent that R(t) interpolates from ps to pe through unit quaternions.

The restriction that ps 6= −pe is lifted: θ = π and U can be any choice of unitimaginary in this case.

This interpolation proceeds at constant speed |θ| so for each i

‖R (ti+1)−R (ti) ‖ ≤ |θ| | ti+1 − ti | .

25. (*) The “Belt Trick”

The fact that each rotation R in R3 can be implemented by two different unitquaternions q and −q through

R(v) = q v q∗ and R(v) = (−q) v (−q∗)has interesting and, at first glance, rather puzzling consequences both sublime (par-ticle physics) and mundane (macroscopic objects.)

Let’s suppose that we are going to rigidly rotate the standard coordinate basisin R3, passing through intermediate steps in a continuous way, and after a period oftime, say 1 minute, we end up back where we started: at the standard coordinatebasis.

We envision this process to be determined by a continuous unit quaternion valuedfunction Tt for t in [0, 1] where

Tt = cos (θt) + sin (θt) Ut

for continuous real function θt and continuous unit imaginary quaternion valuedfunction Ut defined, themselves, on the unit interval.

So the continuous rotation we intend is implemented by

Rt(v) = Tt v T∗t for v in R3 and t in [0, 1].

Given our conditions both R0 and R1 must be the identity map: no visiblerotation from the standard basis at all. We may choose θ0 to be 0 and T0 to be1 = cos (0) + sin (0) U0. But then there are two possibilities for T1, namely

T1 = 1 = 1 + 0 U1 or T1 = −1 = −1 + 0 U1.

The first case corresponds to θt ending at an even multiple of π while the secondcase has θt ending at an odd multiple of π.

Remember that a unit quaternion at angle θ causes a rotation by angle 2 θ aroundits imaginary part, so in both cases we end up at a rotation by an even multiple ofπ, both apparently equivalent (at least to the eye) to no rotation at all.

In the first case, using the interpolation techniques of the last section for eachfixed t if you wish, the path through the unit quaternions represented by T can becontinuously deformed to the constant function 1 while leaving both of the endpoints,T0 and T1, invariant. In the second case this cannot be done.


The remark concerning the second case is obvious: you cannot continuouslychange −1 to 1 (or 1 to −1) without leaving the quaternions that correspond to theidentity, thereby violating the condition that R(0) = R(1) and that both remainthe identity during the transformation.

Here is a simple physical scenario that demonstrates the difference between thetwo cases: the belt trick. There are several variations, based on the same principle.

Take a belt and nail its tip to the floor (or step on it.) Grasp the belt and pull itstraight so you are looking down along its length to the floor, your eye gazing uponthe flat surface of the belt. Imagine many parallel copies of the three coordinateaxes running up the spine of the belt with each y axis pointing generally up towardsyou as you look down the length of the belt from above, each z axis pointing outand away from the surface of the belt and each x axis pointing to the left.

Now give the belt a 720◦ twist along its length.

The axes affixed to the belt at various places along its length are rotated aroundthe y axis from their original position by various amounts, but the axes at the floorand the axes gripped between your fingers at the top are in their original position.

The amount of rotation along the belt is a function of position along the beltand represents a path through the unit quaternions that starts and ends at the unitquaternion 1.

Moving around in various ways but not rotating the axes in your fingers or thoseat the floor you can straighten the belt.

Try it yourself, and try to visualize the changing path through the unit quater-nions (as a function of position along the length of the belt) as it transforms fromthe unit circle in the 1, j plane to the constant 1 function.

If the belt is twisted only 360◦ you cannot untwist the belt, no matter what youdo, without rotating the axes at the floor or in your fingers or tearing the belt.

An argument of this kind is called topological and using it we drew powerfulphysical conclusions without doing complicated calculations.

Index

Dpf(q), 28

Proju⊥ , P roju, 6

Ru, 6

~e1,~e2,~e3,~e4 (standard basis in R4), 32~i,~j,~k (standard basis in R3), 16d fdU

(q), 27dfdt

(c), 28C, 10

H, 18

HU, 21

O, 38

Sn, 21z, 11

φg , 40

log(z), 13, 26

q, 39

‖ · ‖, 11

arg(q), 11, 13, 26

ez , 13, 26f ′(z), 14

i (complex numbers), 10

i, j, k (quaternions), 18

pq , 13, 27

q∗, 20

u⊥, 6

algebra, 2

anti-commutative, 19

argument, 11, 26

associative algebra, 2

belt trick, 45

Cartan-Dieudonne theorem, 7

Cayley-Dickson construction, 38commutative algebra, 3

complex numbers, 10

conjugate, 11, 20, 39

Conway, John, 9

De Moivre’s formula, 13

derivative, 14, 27directional, 28

Gateaux, 28

quaternion valued path, 28

discrete approximation (interpolation), 43

division algebra, 3

Euler’sequation, 13rotation theorem, 18

exponential, 13, 26

fiber over g, 40

Frobenius’ theorem, 3

Gateaux derivative, 28

Graves, John T., 38

Hamilton

W. R., 18

product, 18Hanson, Andrew, 42

Hopf

fibration, 39map, 39

Hurwitz’s theorem, 4

idempotent, 3

imaginary, 10, 19

inversion, 22, 23, 32isometry, 4

Kuipers, Jack, 42

left-handed, 16

logarithm, 13, 26

Mazur-Ulam theorem, 6

norm, 4, 11, 20, 39

octonion, 38

partition, 40Pauli

spin group, 31

spin matrices, 30polar form, 11, 14, 21

power series, 12, 25

projection, 6

quaternion, 18

group, 31

real, 10, 19

reflection, 6right-handed, 16

rigid motion, 16

Rodrigues’ formula, 25

sedenion, 39slerp, 43

slice, 21Smith, Derek, 9

sphere, 21spherical linear interpolation, 43

standard basis vectorsof R3, 16of R4, 32

unital, 3

versor, 20

zero divisors, 3

46

an introduction to quaternions with application to rotations · an introduction to quaternions with...

Documents