
Exponential Maps for Computer Vision

Nick Birnie

School of Informatics

University of Edinburgh

1 Introduction

In computer vision, the exponential map is the natural generalisation of the ordinary exponential function to matrix elements. The technique is based on generating a manifold embedding of the geometric features of the scene, on which trajectories (primarily of motion or invariance) are estimated. An advantage of using the exponential map is the existence of a closed-form time-update equation for the state.

2 Definition

The most natural definition of the exponential map arises in the study of differential geometry as the generalisation of the exponential function. Consider the solution of a linear ordinary differential equation of the form

\[ \dot{f}(t) = L f(t) \implies f(t) = e^{Lt} f(0) \]

in which $L$ is a linear operator. Where $f$ is a scalar-valued function, the exponentiation follows from the regular series expansion. However, for vector-valued $f$, $e$ takes on the characteristics of the exponential map.
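The claim that the exponential solves the differential equation can be checked by differentiating the series term by term. The following is a short worked check, assuming for concreteness that $L$ is a constant square matrix (an assumption made here, not a restriction stated in the text):

```latex
\begin{align*}
e^{Lt} &= \sum_{k=0}^{\infty} \frac{(Lt)^k}{k!}, \\
\frac{d}{dt}\, e^{Lt} &= \sum_{k=1}^{\infty} \frac{k\, L^k t^{k-1}}{k!}
  = L \sum_{k=1}^{\infty} \frac{(Lt)^{k-1}}{(k-1)!} = L\, e^{Lt}, \\
\dot{f}(t) &= \frac{d}{dt} \bigl( e^{Lt} f(0) \bigr) = L\, e^{Lt} f(0) = L f(t).
\end{align*}
```

At $t = 0$ the series reduces to the identity, so the initial condition $f(0)$ is recovered.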

This concept is readily applicable in Lie theory (the study of groups that form differentiable manifolds), where it generalises the exponential function to the infinitesimal elements of Lie groups. This section formalises both definitions.

2.1 Differential Geometric Definition

Let $M$ be a differentiable manifold and $p$ a point on $M$. Let $T_pM$ denote the tangent space to $M$ at $p$. Then for a vector $v \in T_pM$, there exists a unique geodesic $\gamma : [0, 1] \to M$ such that $\gamma(0) = p$ and $\gamma'(0) = v$. The exponential map at $p$ is then defined as $\exp_p(v) = \gamma(1)$, i.e. the exponential map returns the point reached by following $\gamma$ for unit time.
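As a concrete instance of this definition, consider the unit sphere, where geodesics are great circles and the exponential map has a well-known closed form. The sketch below is illustrative only: the choice of manifold and the function name `sphere_exp` are assumptions introduced here, not part of the original text.

```python
import math

def sphere_exp(p, v):
    """Exponential map on the unit sphere.

    p is a unit vector (the base point) and v is a tangent vector at p
    (so v is orthogonal to p). The geodesic is the great circle through p
    in direction v, followed for arc length ||v||.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v < 1e-12:
        return tuple(p)  # zero tangent vector: stay at p
    return tuple(
        math.cos(norm_v) * p_i + math.sin(norm_v) * v_i / norm_v
        for p_i, v_i in zip(p, v)
    )

# Start at the north pole and move a quarter turn towards the x-axis.
north_pole = (0.0, 0.0, 1.0)
tangent = (math.pi / 2, 0.0, 0.0)
print(sphere_exp(north_pole, tangent))  # approximately (1, 0, 0)
```

The length of `v` sets how far along the geodesic the result lies, which is why a tangent vector of length pi/2 lands on the equator.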


2.2 Lie Theory Definition

Let $G$ be a Lie group¹ and $\mathfrak{g}$ be its Lie algebra² (see [3] for a formal definition of these terms). Several definitions of the exponential map are possible. In computer vision, the most natural is the special case where $G$ is a matrix Lie group. The exponential map is simply defined to coincide with the matrix exponential series expansion.

\[ \exp(X) = \sum_{k=0}^{\infty} \frac{X^k}{k!} = I + X + \frac{1}{2} X^2 + \frac{1}{6} X^3 + \cdots \]
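The series can be evaluated numerically by truncating it after a fixed number of terms. The following is a minimal sketch in plain Python, with matrices stored as lists of rows; the helper names `mat_mul` and `mat_exp` and the default of 30 terms are assumptions chosen for illustration. Truncation is only adequate for matrices of modest norm, and numerical libraries typically use more robust schemes such as scaling and squaring.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=30):
    """Approximate exp(X) by summing the first `terms` terms of the series."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term: I
    term = [row[:] for row in result]
    for k in range(1, terms):
        # Update the running term from X^(k-1)/(k-1)! to X^k/k!.
        term = [[entry / k for entry in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# For a nilpotent matrix (X^2 = 0) the series terminates: exp(X) = I + X.
nilpotent = [[0.0, 1.0], [0.0, 0.0]]
print(mat_exp(nilpotent))  # [[1.0, 1.0], [0.0, 1.0]]
```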

Another definition is possible which is directly equivalent to the differential geometric definition above. Redefining $\gamma$ as a one-parameter subgroup, with elements determined by a vector $v \in \mathfrak{g}$, directly converts between geodesics and groups.

3 Rigid Body Tracking

In 3D computer vision, the problem of tracking an object in video is typically addressed by maintaining a transformation for each of the object's degrees of freedom. The result is an estimation of the 3D pose with reference to several coordinate frames. Simplification of the problem is possible when rigid body transformations are considered. The reason for this is that such transformations preserve the distance between any two points. This allows tracking an object by a single transformation of its coordinate frame.

It is necessary to prevent reflections in the transformation in order to track a solid object through space. A constraint is imposed to exclude those transformations which preserve distance but reflect in a particular plane. Therefore, only transformations which preserve orientation are considered. A more formal definition is possible, whereby the Special Euclidean transformations are required to preserve the norm and cross-product of two vectors.

A pair of Cartesian coordinate systems is then required to specify the position of the object relative to the camera. The object coordinate frame is relative to a fixed reference point, known as the world coordinate frame. One may then define a point on the camera, attach an axis, and track the displacement from the world frame. Equivalently, one may also allow the object frame to vary, keeping the camera frame fixed. The key idea is maintaining a transformation relating the object and world frames. An illustration of the geometric structure is presented in Figure 1.

Relating the object and world frames is a form of restricted affine transformation, composed of only two components: a rotation and a translation. These are derived in turn below, with two purposes: (a) representation in homogeneous coordinates, and (b) representation as a parametric model.

¹ Informally, a Lie group is a group that is also a differentiable manifold, so that its elements can be varied infinitesimally.

² Again informally, the elements of $\mathfrak{g}$ make up the tangent space to $G$ at the identity element.


Figure 1: Transformation, g, of a camera frame, C (x, y, z), relative to the world frame, W (X, Y, Z). (Source: [3])

3.1 Exponential Representation

The rotational component can be developed independently of the translational part. A property common to all rotation matrices is that the transpose is equal to the inverse.

\[ R_{wc} R_{wc}^T = I \]

The family of matrices satisfying this property is known as the orthogonal matrices. They form a group, O(3), under the group operation of matrix multiplication. An additional constraint is imposed to limit the group to orientation-preserving matrices only. Specifically, the requirement is that the determinant equals +1. The terminology is similar: the resulting group is denoted SO(3), for the special orthogonal matrices.

\[ SO(3) \triangleq \{ R \in \mathbb{R}^{3 \times 3} \mid R R^T = I, \ \det(R) = +1 \} \]

Although the number of parameters is potentially 3 × 3 = 9, the constraint $R R^T = I$ imposes six independent equations (the product is symmetric), so only three parameters are free, which equals the dimensionality of the space of rotation matrices. A parametric representation for rotation matrices has therefore been developed.

A continuous map $R(t) : \mathbb{R} \to SO(3)$ is defined, representing a rotational trajectory of an object relative to the world frame. Differentiating $R(t) R^T(t) = I$ with respect to time gives $\dot{R}(t) R^T(t) = -(\dot{R}(t) R^T(t))^T$, so the rotational velocity $\dot{R}(t) R^T(t)$ can be represented as a skew-symmetric matrix $M \in \mathbb{R}^{3 \times 3}$. A 3-vector $\omega$ is defined containing the free parameters of this matrix, and the notation $\hat{\omega} = M$ is introduced. The tangent space to SO(3) at the identity element is the space of skew-symmetric matrices, also known as its Lie algebra.

\[ so(3) \triangleq \{ \hat{\omega} \in \mathbb{R}^{3 \times 3} \mid \omega \in \mathbb{R}^3 \} \]
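The correspondence between a 3-vector and its skew-symmetric matrix, and the passage from so(3) to SO(3), can be sketched as follows. The example uses Rodrigues' formula, a standard closed form for the exponential of a skew-symmetric matrix that this text does not derive; the function names `hat` and `rotation_exp` are assumptions chosen here for illustration.

```python
import math

def hat(w):
    """Map a 3-vector w to the skew-symmetric matrix w_hat, so that w_hat u = w x u."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rotation_exp(w):
    """exp(hat(w)) via Rodrigues' formula; the norm of w is the rotation angle in radians."""
    theta = math.sqrt(sum(x * x for x in w))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity  # zero rotation
    w_hat = hat(w)
    w_hat_sq = mat_mul(w_hat, w_hat)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta ** 2
    return [[identity[i][j] + a * w_hat[i][j] + b * w_hat_sq[i][j]
             for j in range(3)] for i in range(3)]

# A quarter turn about the z-axis: maps the x-axis onto the y-axis.
R = rotation_exp([0.0, 0.0, math.pi / 2])
for row in R:
    print([round(entry, 6) for entry in row])
```

Only the three entries of `w` are free, which matches the three degrees of freedom counted above; the resulting matrix satisfies the SO(3) constraints up to floating-point error.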

Attention now shifts to translations in three dimensions, and how these can be viewed in a parametric form. In homogeneous coordinates the translation occupies the fourth column of the transformation matrix, and extracting its three difference terms gives a parametric model of three parameters. A complete rigid body motion is specified by a translation and a rotation matrix. These can be written together in block form, as follows

\[ g = \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \tag{1} \]
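To illustrate how the block matrix in (1) acts on points, the sketch below assembles a homogeneous transformation, applies it to a point, and inverts it. The helper names `make_g`, `apply_g` and `invert_g` are hypothetical, introduced only for this example; the inverse uses the standard identity that the inverse of a rigid motion has rotation $R^T$ and translation $-R^T T$.

```python
def make_g(R, T):
    """Assemble the 4x4 homogeneous matrix [[R, T], [0, 1]] from a 3x3 R and a 3-vector T."""
    return [R[i][:] + [T[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_g(g, point):
    """Apply g to a 3D point by lifting it to homogeneous coordinates (x, y, z, 1)."""
    homogeneous = list(point) + [1.0]
    return [sum(g[i][k] * homogeneous[k] for k in range(4)) for i in range(3)]

def invert_g(g):
    """Invert a rigid motion: rotation block R^T, translation block -R^T T."""
    Rt = [[g[j][i] for j in range(3)] for i in range(3)]
    T = [g[i][3] for i in range(3)]
    minus_Rt_T = [-sum(Rt[i][k] * T[k] for k in range(3)) for i in range(3)]
    return make_g(Rt, minus_Rt_T)

# Rotate 90 degrees about the z-axis, then translate by (1, 0, 0).
R_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
g = make_g(R_z90, [1.0, 0.0, 0.0])
moved = apply_g(g, [1.0, 0.0, 0.0])
restored = apply_g(invert_g(g), moved)
print(moved)     # [1.0, 1.0, 0.0]
print(restored)  # the original point, (1, 0, 0)
```

Writing the motion as a single matrix means that composing two rigid motions reduces to one matrix product.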

This allows complete representation of a rigid body motion. It follows immediately that the number of parameters is six: three for the rotation and three for the translation. Together, the collection of all these matrices is precisely the group of orientation-preserving Euclidean transformations, SE(3), the Special Euclidean group.

\[ SE(3) \triangleq \left\{ g = \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \;\middle|\; R_{wc} \in SO(3), \ T_{wc} \in \mathbb{R}^3 \right\} \]

Based on the homogeneous transformation, encapsulating the complete rigid body motion, it is possible to represent the position of the object at the next time step in a matrix exponential form. The tangent space is given by the following Lie algebra

\[ se(3) \triangleq \left\{ \xi = \begin{bmatrix} \hat{\omega} & v \\ 0 & 0 \end{bmatrix} \;\middle|\; \hat{\omega} \in so(3), \ v \in \mathbb{R}^3 \right\} \]

where $v$ is defined as $\dot{T}(t) - \hat{\omega}(t) T(t)$. The intuitive explanation is that the rotation itself contributes to the translational velocity, and this contribution is subtracted. The matrix $\xi$ is known as a twist. Together, the entries $\hat{\omega}$ and $v$ in the first block row of $\xi$ define the unique geodesic in the direction of the tangent vector at $g(t)$, as introduced in §2. The tangent vector to the geodesic is given by the following ordinary differential equation.

\[ \dot{g}(t) = \xi g(t) \]

The solution of which is given by

\[ g(t) = e^{\xi t} g(0) \]

where $e^{\xi t}$ is the matrix exponential, obtained from the exponential series expansion

\[ e^{\xi t} = \sum_{n=0}^{\infty} \frac{(\xi t)^n}{n!} \]

Hence, assuming $g(0) = I$ (that is, $R(0) = I$ and $T(0) = 0$), the exponential map can now be defined

\[ \exp : se(3) \to SE(3); \quad \xi t \mapsto e^{\xi t} \]
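The closed-form time update can be sketched numerically by building a twist from $\omega$ and $v$ and exponentiating it with a truncated series. This is a minimal illustration under stated assumptions: the helper names `twist`, `mat_exp` and `time_update`, the 30-term truncation, and the example values are all chosen here and are not taken from [3].

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=30):
    """Approximate exp(X) by a truncated series, starting from the n = 0 term (identity)."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def twist(w, v):
    """Build the 4x4 twist [[hat(w), v], [0, 0]] from angular velocity w and linear term v."""
    return [[0.0, -w[2], w[1], v[0]],
            [w[2], 0.0, -w[0], v[1]],
            [-w[1], w[0], 0.0, v[2]],
            [0.0, 0.0, 0.0, 0.0]]

def time_update(xi, t, g0):
    """Closed-form state update g(t) = exp(xi * t) g(0)."""
    xi_t = [[entry * t for entry in row] for row in xi]
    return mat_mul(mat_exp(xi_t), g0)

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Constant twist: rotate about z at pi/2 rad per unit time, with v = (1, 0, 0).
xi = twist([0.0, 0.0, math.pi / 2], [1.0, 0.0, 0.0])
g1 = time_update(xi, 1.0, identity)
for row in g1:
    print([round(entry, 4) for entry in row])
```

With zero angular velocity the twist is nilpotent and the update reduces to a pure translation by $vt$. With non-zero rotation the translation column follows a circular arc rather than a straight line, which is the coupling between rotation and translation expressed by the definition of $v$.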

4 Bibliographic Notes

The content of §3 was composed following meticulous study of [3], a sample chapter of which is available from the book's website. In particular, Figure 1 originates from those authors, together with the notational quirks and the derivation of the exponential map. Much more detail on the exponential representation is available in the book. A point of interest is the logarithmic map, which is given by the right inverse of the exponential map.

Applications in motion tracking first appear in [1]. The authors represent the kinematic chain of humans as a product of exponential maps and produce a differential model of motion, showing comparable performance to previous methods.

Another application is statistical shape estimation, where geometric invariants are computed from image data. Fletcher et al. [2] generalised Principal Components Analysis to operate on the non-Euclidean geometry of Lie groups, and named the method Principal Geodesic Analysis. In contrast to motion tracking, the authors use medial representations as Lie group elements. An algorithm is then given for computing the basis vectors (which are geodesics on the Lie group).

References

[1] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '98, pages 8–, Washington, DC, USA, 1998. IEEE Computer Society.

[2] P. Thomas Fletcher, Conglin Lu, and Sarang Joshi. Statistics of shape via principal geodesic analysis on Lie groups. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 1:95, 2003.

[3] Yi Ma, Stefano Soatto, Jana Kosecka, and S. Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer-Verlag, 2003.
