Pose and Motion Estimation of Free-Flying Objects: Aerodynamics, Constrained Filtering, and Graph-Based Feature Tracking

Matthew Gardner, Member, IEEE, and Yan-Bin Jia, Senior Member, IEEE
Abstract—We investigate the problem of dynamically estimating the instantaneous position, orientation, velocity, and angular velocity of an arbitrarily shaped object during its free flight, based on image frames taken simultaneously by two high-speed cameras. Aerodynamic effects including drag, lift, and Magnus forces are modeled to describe the object's flight. Observables are derived by combining dynamics with a camera projection model assuming two-view geometry, via the use of multiple quaternions. The state of the composed system can then be estimated via constrained Kalman filtering, for which a solution is presented for the case of multiple quadratic constraints. To keep track of appearing and disappearing visual features during flight, the estimation algorithm employs a graph matching-based technique to maintain a set of evolving hypotheses through evaluation, pruning, and addition. Experiments conducted over various objects have either provided validation against motions independently estimated using multiple accelerometers, or formed verification by matching flight images against projections based on the state estimates.
Index Terms—Estimation, aerodynamics, Kalman filtering, constrained optimization, image graph, feature tracking.
I. INTRODUCTION
Motion estimation of free-flying objects is a challenging
problem in multiple areas of robotics. Pose and motion esti-
mates are necessary for sensing obstacles, planning actions,
and navigating autonomously in dynamically changing en-
vironments. In space, tumbling objects such as debris and
asteroids are tracked using vision, with motions estimated via
Kalman filtering [1] and optimization [2]. Flight trajectory
estimation can help space robots better collect space debris [3],
[4], service orbiting satellites [5], and make maneuvers during
space exploration [6], [7]. For an object flying in the air, the
ability to obtain accurate estimates of its pose and velocity
over a brief duration allows a robot to perform skillful
maneuvers such as catching [8] and batting [9], [10], [11].
In sports, such estimation is also used to track a ball for the
purpose of human training [12]. In manufacturing, a potential
application of motion estimation could be enabling robotic
arms to pass objects down an assembly line through catching
and throwing motions. Other applications include projectile
tracking, robotic underwater exploration, and virtual reality.
Support for this research was provided in part by the US National Science Foundation under Grant IIS-1421034. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
The authors are affiliated with the Department of Computer Science, Iowa State University, Ames, IA 50010 USA (e-mails: mattga, [email protected]).
Measurement of position and linear velocity has been used
for a range of tasks. The position trajectory of a flying object
was estimated from a sequence of monocular images for object
tracking [13], by stereo cameras for a high-speed ping-pong
robot [11], [14], and by two high-speed cameras on pan-tilt
mechanisms for batting a ball [15]. As the complexity of
robot applications has grown, estimation of angular velocity
has become increasingly necessary. Measurements by linear
accelerometers placed inside a body along some orthogonal
axes were used to calculate the body’s angular velocity via
solution of a system of kinematic equations [16] or through
optimization [17], [18]. A magnetometer, accelerometer, and
angular rate sensor were employed to obtain quaternion-based
measurements, which were then supplied to a Kalman filter
for polishing [19]. These techniques, however, were invasive
as they relied on sensors mounted inside the object’s body.
In addition, none of them addressed how to estimate the
object’s linear velocity. Meanwhile, vision-based estimation
of both linear and angular velocities was investigated for
object tracking [20], [21], but without the temporal and spatial
constraints of the object being in free flight.
Vision sensors such as RGB cameras are widely accessible
and easy to work with, making them ideal for the estimation
problem. They can be made small enough to serve as eyes for
a humanoid robot. Can fast and accurate motion estimation of
a flying object be achieved using cameras alone? Here we put
an emphasis on objects in free flight with no actuation.
Several challenges immediately arise:
• images of the object lack depth information,
• the object moves a large distance over a small duration,
• images are susceptible to noise due to the fast movement.
Some vision systems have addressed these challenges. Among
the best known are motion capture systems composed of mul-
tiple high-speed cameras surrounding an object with attached
markers [22], or actuated cameras that control their pitch and
yaw for tracking [15]. Such systems are complex and often
infeasible due to limitations of cost, space, power consump-
tion, etc. Moreover, they depart from a basic preference for
estimation without attaching markers or sensors to alter the
object’s physical appearance.
A. Contributions and Outline of the Paper
In this work, we present a scheme for accurately estimating
the poses and motions of free-flying objects while remaining
robust to the effects of short flight periods, large aerodynamic
Fig. 1. Block diagram of motion estimation (components: image frames, graph-based feature tracking, hypotheses, aerodynamics, observables, constrained Kalman filtering, pose and motion estimates).
forces experienced in the case of low mass density, and image
noise due to fast movements. We use a system consisting of two cameras with wide-angle lenses operating at 120 fps or higher, which is inexpensive relative to larger
motion capture systems such as OptiTrack and Vicon. Besides
achieving estimation, we make several other contributions:
a) We introduce a graph-based feature tracking algorithm
using hypotheses to deal with uncertainties and switching
of observable features.
b) We construct for the first time a solution, via equation
derivation and numerical root finding, to Kalman filtering
under multiple quadratic constraints in the general form.
c) We model aerodynamics for free-flying objects of general shape and integrate it with vision-based estimation.
Fig. 1 illustrates the interactions among three components
of the scheme: aerodynamics, constrained Kalman filtering,
and graph-based feature tracking. Extensive experiments are
conducted with a relatively low-cost vision system, which past
efforts have not considered.
The paper is organized as follows. Section II overviews
work related to estimation that comes from multiple areas.
Section III discusses the system dynamics of a free-flying
object and computation of the aerodynamic forces affecting
its motion. Section IV introduces the camera projection model
and geometric constraints enforced by stereo cameras to pro-
duce observables from the object. Section V gives the Kalman
filter algorithm that enforces multiple quadratic constraints
on the estimated state while making use of the models from
Sections III and IV. Section VI describes a graph-based feature
tracking algorithm for identifying image-to-model correspon-
dences that are required to drive multiple Kalman filters in a
competition for the best estimate. Section VII presents results
from three experiments of increasing complexity, conducted
with a wooden frame, rugby ball, and foam polyhedron.
Section VIII concludes the paper with discussions on improvements and future directions for dynamic estimation.
B. Notation
A vector (or a point) is by default a column vector, written
by a lowercase bold letter, e.g., v. A unit vector has a hat, e.g.,
$\hat{v} = v/\|v\|$. The left superscript of a point (or a vector), if it exists, denotes the frame in which it is expressed. For example, wp gives the coordinates of a point p in the world frame (denoted by w), while bp gives its coordinates in the body
frame (denoted by b). A scalar (or vertex in Section VI) is
written as a lowercase non-bold letter, and a matrix (or graph)
as an uppercase non-bold letter. A vector function is written
in bold, while a scalar function is non-bold. The subscripts
x, y , and z of a letter (non-bold) represent the respective
x, y, and z-coordinates (or components) of a point (or a
vector) named by the same letter (bold). For example, $p_x$ denotes the x-coordinate of the point p. Multiplication of two
quaternions uses the operator ⊗. In a slight abuse of notation,
multiplication of a quaternion q and a vector v, i.e. q ⊗ v,
implies that v is represented as the pure quaternion (0,v⊤)⊤.
In addition, the superscripts ‘−’ and ‘+’ are used to refer to
the prior and posterior estimates produced by a Kalman filter.
II. RELATED WORK
Our effort draws upon works from a diverse set of related
fields: aerodynamics, feature tracking, Kalman filtering, and
simultaneous localization and mapping (SLAM).
A. Aerodynamics
Aerodynamics are known to play a key role in affecting a
flight motion. Calculation of forces such as air drag and lift
results in more accurate estimates, as done for an aircraft’s
motion using a downward-facing camera [25]. In robot table
tennis, the effects of drag and Magnus were incorporated into
estimation of a ping-pong ball’s motion with varying degrees
of spin [26], [27]. Unfortunately, closed forms for lift and
drag forces so far exist only for well studied shapes such
as a ball and airfoil [28, pp. 258–262, 349], and they do
not generalize to irregular shapes such as a polyhedron. The
estimation scheme in this paper is described for polyhedral
objects whose topological structures can be represented by
planar graphs, achieving a degree of generality as many objects
can be approximated in this way. Note that machine learning
techniques were used to produce models of arbitrarily shaped
objects in flight under the effects of aerodynamics and other
non-linear dynamics [22], based on training over numerous
flight trajectories from laborious hand throws.
B. Feature Tracking
Coupled with motion estimation is the problem of tracking
an object’s features in its changing images in order to generate
observables for setting up an estimation algorithm. Disparities
between features from observation and those from prediction
are used to estimate the object’s pose via weighted least
squares [29], [30], non-linear minimization [31], or statistical
modeling [32], [33], [34]. Model-based pose tracking was
applied to estimate the pose of a spacecraft with different
ways of initialization [35]. In general, these approaches, by
minimizing measurement errors, led to frame rates (up to 30 Hz) that were too low for high-speed estimation, and also
suffered from erratic performance in case of large disparities
between observed and predicted features.
To make use of the object's topological structure, matching was conducted between a "projection" graph constructed from images and a model graph, where edges and vertices
contained attributes pertaining to features [36]. Approximate
graph matching algorithms were developed to produce sub-
optimal solutions to object recognition, while remaining tol-
erant to measurement errors and image noise [37], [38].
Recently, deep neural networks were trained on synthetic data
to track stationary objects with occlusions [39], [40].
C. Kalman Filtering
For motion estimation, the Kalman filter [41] remains the
gold standard due to its effectiveness at smoothing out
estimates in the presence of noisy measurements, as well as
its computational efficiency. An extended Kalman filter (EKF)
using dual quaternions closely calculated the constant velocity
of an object from its images taken by a single camera [21]. A
modified EKF was able to estimate the rigid body dynamics
of a moving object by exploiting the epipolar constraint [25],
[42], [43]. For localization and navigation of air and ground
vehicles, visual inertial navigation systems (VINS) also made
use of the epipolar constraint in the measurement models
of a linear Kalman filter [44], an unscented Kalman filter
(UKF) [45], and an EKF [46]. In [47], an alternative to
the use of epipolar constraints was proposed. This approach
derived a measurement model for every observed static feature
to express geometric constraints over its observing cameras.
State correction in the EKF was then based on a range space
projection of measurement residuals over all features, which
were independent of errors in the feature coordinates.
Unit quaternions, used for representing orientation (or at-
titude), cannot be updated through vector addition by a con-
ventional Kalman filter because they are only closed under
operations (i.e., multiplications) of the Lie group SO(3) with
its elements represented by quaternions. To maintain the
invariance of the rotation representation, the multiplicative
extended Kalman filter (MEKF) from [48] first performed
an EKF style update on a state that included the position
and velocity components, as well as some unconstrained
parameters. Then, on orientation update, these parameters were
used to generate through normalization a unit error quaternion
that measured the difference between the true orientation and
the estimated one. The second-order version of MEKF was
given in [49] for three types of unconstrained orientation
parameters including Euler angles, Rodriguez parameters, and
a quaternion. The MEKF has been used widely in applications
such as attitude estimation [50], inertial integrated naviga-
tion [51], and integration with GPS [52]. Generally relying on
on-board inertial sensors, the MEKF is difficult to tune and
known to have divergence issues due to its ad hoc normalization
to maintain the unit quaternion constraint.
The invariant extended Kalman filter (IEKF), first intro-
duced in [53] for attitude estimation, leveraged the left in-
variance of the vector field describing the orientation-only
kinematics of a flying object under the multiplication action
of SO(3) with unit quaternion elements. Extended later for
velocity estimation and to right invariance on a Lie group [54],
the IEKF preserves the unit norm of the quaternion estimate, as
well as “symmetry” in the sense of not altering the dynamics of
error, whose component for orientation is in the multiplicative
form as inspired by MEKF. With a larger expected domain
of convergence than EKF and guaranteed convergence for
invariant observations [55], the IEKF has been popularized
in applications such as spacecraft attitude estimation [54] and
unmanned aerial vehicle navigation [56]. Such applications
typically employ sensors such as gyroscopes, accelerometers,
and magnetometers that output state variable values directly
(or after a Lie group action such as rotation) so the IEKF
becomes applicable. In our estimation problem, an IEKF is
inapplicable for three reasons. First, the rigid body dynamics
now includes aerodynamic effects such as drag, which is
quadratic in the velocity’s magnitude and also linear in its
normalization, and lift, which is quadratic in the angular ve-
locity. Consequently, the system dynamics no longer preserves
invariance under multiplication by members of a Lie group
such as SE(3). Second, since observations are performed by
cameras under perspective projection and radial distortion, the
equivariant property [54] for the output function does not
hold either.1 Third, the object under estimation is meant to
be natural or man-made with a small size, and it would be
unrealistic and also defeat the purpose of this estimation to
embed a gyroscope, accelerometer, or magnetometer inside
the object. In fact, the constrained EKF to be introduced in
Section IV will effectively keep the orientation estimate within
SO(3). Interestingly, a recent finding [58] has shown that,
when the state space is a Lie group, a version of the EKF
with an equality constraint is equivalent to the IEKF.
A third approach to orientation estimation is constrained
Kalman filtering which in principle can effectively deal with
a general equality constraint. In this approach, the posterior
estimate is still obtained from minimizing the variance of
the estimated error but now with the constraint incorporated
via the use of a Lagrange multiplier. This forces the state
estimate to stay on the constraint surface, e.g., SO(3) for
an orientation estimate. To our best knowledge, all existing
methods have considered either a single constraint that is linear, quadratic [59], or special quadratic (e.g., unit norm [23], gen-
eralized norm [24]), or multiple constraints that are linear [60]
or nonlinear after linearization [61]. In [23] and [24], second
order sufficient conditions were used to choose from multiple
values of the Lagrange multiplier satisfying the first order
necessary condition equation.
The gain constrained Kalman filter (GCKF) [61] was de-
rived from constrained recursive least squares minimization
of the Kalman gain matrix, which constrained weights used
to correct the state estimate. The GCKF was generalized to a
UKF with nonlinear models under linear constraints [62], and
to a norm-constrained Kalman filter (NCKF) dealing with one
nonlinear constraint on the state’s norm [23]. The NCKF was
then extended to handle a single quadratic form constraint [24].
Other approaches included a smoothly constrained Kalman
filter [63] and a two-step UKF that constrained the state’s
statistics [64].
As far as we know, none of the existing works have dealt
with multiple nonlinear constraints directly without linearizing
them. In this paper, we need to handle three unit quaternion
1 In visual-inertial SLAM where a monocular camera was used as one of the sensors, the inventor of the IEKF and his colleagues opted for a UKF [57].
constraints that correspond to the orientations of the object and
the two cameras. We will present a solution to the constrained
filtering that works for a set of general quadratic constraints.
D. Simultaneous Localization and Mapping
SLAM algorithms solve a similar problem of estimating
the pose and motion of a robot relative to its surroundings
by producing a map of features in the three-dimensional (3D)
world space. MonoSLAM used a single camera with an EKF
to estimate a robot’s pose, velocity, and angular velocity,
along with feature points in 3D [65]. Other approaches tracked
the image coordinates of features in the estimated state, and
described a projection from 3D to two dimensions (2D) in the
measurement model of an EKF [66] or UKF [67]. VINS use vision and inertial sensors for real-time estimation required for
robot navigation. A UKF was used to estimate the pose of
an autonomous rotorcraft with an inertial measurement unit
(IMU) and cameras [68], [69]. Stereo vision was employed
to improve angular velocity and position estimation over
traditional approaches via sensor fusion in a UKF [70].
III. SYSTEM DYNAMICS
Consider a flying object with mass m as shown in Fig. 2.
The object, with known geometry and physical properties, has
a body frame Fb located at its center of mass o and defined
by its principal axes. In this frame, the inertia tensor

$Q = \iiint_B \big((x \cdot x) I_3 - x x^\top\big)\, dV$

is a diagonal matrix. Here, $I_3$ is the 3 × 3 identity matrix. The rotation of Fb from some world frame Fw is described by a unit quaternion r. Let bv and bω be the object's velocity and angular velocity expressed in Fb, respectively. The object's velocity in Fw is given by a quaternion product: ${}^w v = r \otimes {}^b v \otimes r^*$, where $r^*$ is the conjugate of r.

Fig. 2. Flying object observed from a stationary world frame Fw. A body frame Fb is located at its center of mass.

Denote by wf the non-gravitational force exerted on the object in the world frame Fw. Newton's equation takes on the form

${}^w \dot{v} = (0, 0, -g)^\top + {}^w f / m,$    (1)

where $g \approx 9.80665\ \mathrm{m/s^2}$ is the gravitational acceleration constant. Euler's equation assumes the form

$Q\, {}^b \dot{\omega} + {}^b \omega \times (Q\, {}^b \omega) = {}^b \tau,$

where bτ is the external torque in the body frame. We have

${}^b \dot{\omega} = Q^{-1}\big({}^b \tau - {}^b \omega \times (Q\, {}^b \omega)\big).$    (2)
The object is subject to aerodynamic forces whose effects
are non-negligible if its mass density is low. Without account-
ing for these forces, it can be difficult to accurately track
the object’s state, or predict a future state for the purpose
of planning. To describe the aerodynamic forces, consider a
snapshot at a time instant during flight. The object, viewed
“stationary”, is experiencing a flow of air with velocity −bv
expressed in its body frame Fb. There are two primary
aerodynamic forces: lift, denoted bf l, and drag, denoted bfd.
Lift force acts on the object due to air particles traveling
smoothly over its surface ∂B and producing pressure forces
normal to the surface. Bernoulli’s equation [72, pp. 20–
21], after elimination of constant static pressure, gives the
following relationship between air pressure p and air speed u:
$p = \frac{1}{2} \rho u^2,$    (3)
where ρ is air density. The total lift in the frame Fb is then
calculated by integrating this pressure over ∂B as follows:
${}^b f_l = \frac{1}{2} \rho \iint_{\partial B} \big({}^b u(e) \cdot {}^b u(e)\big)\, n(e)\, de,$    (4)
where bu(e) is the air velocity at a surface area element e
with its exact form yet to be determined, and n(e) is the unit
normal to the surface element. Meanwhile, the air pressure
yields a torque about the center of mass o:
${}^b \tau_l = \frac{1}{2} \rho \iint_{\partial B} \big({}^b u(e) \cdot {}^b u(e)\big)\, e \times n(e)\, de.$    (5)
Consider a neighborhood N of the object in which the
air flow is affected by its presence at the same time instant.
Under the assumption that the air flow is incompressible and
irrotational [73, p. 100], the air velocity at a point e ∈ N is
∇φ(e), where φ, known as the velocity potential, satisfies the
Laplace equation below:
$\nabla^2 \phi(e) = 0.$    (6)
Here, ∇2 is the Laplacian operator. Denote by ∂N the
outer boundary surface of the neighborhood N . The Laplace
equation (6) is subjected to two boundary conditions:
$\nabla\phi(e) = -{}^b v$, for all e on ∂N,    (7)
$\nabla\phi(e) \cdot n(e) = 0$, for all e on ∂B.    (8)
Calculation of φ involves numerically solving (6) for all the
points within the neighborhood N containing the object. This
is detailed in the supplemental material [71].
Meanwhile, within a thin boundary layer next to the object’s
surface, the viscosity of air exists to produce friction with the
surface. This leads to two effects inside the thin region [72,
p. 42]: 1) air rotating with the surface and 2) drag force. In
the former, air molecules with zero velocity relative to the
body contribute a rotational component to air velocity. This
creates the Magnus effect [72, pp. 16–33]. Combining the
component with the air velocity outside of the boundary layer
from equation (6), we have
${}^b u(e) \approx \nabla\phi(e) + {}^b \omega \times {}^b e,$    (9)
for an element e within the layer with body coordinates be.
Moreover, the boundary layer contributes two types of drag
force: pressure drag from the difference in pressure between
the front and back side of the object, and skin friction from
tangential stress on the surface [28, pp. 201–202]. Due to
Fig. 3. Algorithm overview for calculating aerodynamic forces and torques.
the mathematical difficulty in modeling these effects, the
following approximation is adopted [73, pp. 211–215, 339]:
${}^b f_d = -\frac{1}{2} \rho\, C_d A\, \big({}^b v \cdot {}^b v\big)\, {}^b \hat{v},$    (10)
where Cd is the drag coefficient and A is the cross-sectional
area. The approximated drag force acts through the object’s
center of mass, yielding no torque. The coefficient Cd, de-
pendent on the object’s shape, is determined from studies on
the aerodynamics of general shapes [74], [75]. To compute
the area A, all the points on the object’s surface are projected
orthographically onto a plane with normal bv. The convex hull
of their image points in the plane is constructed with its area
assigned to A.
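To make the construction concrete, here is a minimal sketch (our illustration, not the authors' implementation) that projects sampled surface points onto the plane with normal bv and takes the area of their planar convex hull; the sampled point array is assumed to be given.

```python
import numpy as np
from scipy.spatial import ConvexHull

def cross_sectional_area(points, bv):
    """Cross-sectional area A used in the drag model (10): project surface
    points orthographically onto the plane with normal bv and return the
    area of the 2D convex hull of the projected points."""
    n = bv / np.linalg.norm(bv)
    # Build an orthonormal basis (e1, e2) spanning the projection plane.
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, a); e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    uv = points @ np.column_stack((e1, e2))   # planar coordinates of projections
    return ConvexHull(uv).volume              # for a 2D hull, .volume is the area
```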
Finally, we apply the calculated forces and torques in
Newton’s and Euler’s equations (1)–(2), where
${}^w f = r \otimes \big({}^b f_l + {}^b f_d\big) \otimes r^*,$    (11)
${}^b \tau = {}^b \tau_l.$    (12)
Fig. 3 gives a diagram of the steps to calculate aerodynamic
forces. The Laplace equation is solved given the object’s
current velocity estimate bv. The resulting potential field φ is
used along with bω to calculate the air velocity bu(e) via equa-
tion (9). Surface integration (4) and (5), based on Bernoulli’s
equation (3), are carried out to yield the lift force bf l and
torque bτ l. The drag force bfd is simultaneously computed
from the velocity estimate according to (10). Newton’s and
Euler’s equations of dynamics are then integrated to yield new
velocity estimates for the next round of calculations.
The state of the flying object is described by the 13-vector
$\xi = \big(o^\top, r^\top, {}^w v^\top, {}^b \omega^\top\big)^\top.$    (13)

The state is subject to the constraint that its quaternion component r keeps unit length, i.e., $\|r\| = 1$. Moreover, r has the following derivative, given in Appendix C of [76]:

$\dot{r} = \frac{1}{2}\, r \otimes {}^b \omega.$    (14)

This, together with $\dot{o} = {}^w v$, (1), and (2), forms the following system of nonlinear differential equations:

$\dot{\xi} \equiv a(\xi) = \begin{pmatrix} {}^w v \\ \frac{1}{2}\, r \otimes {}^b \omega \\ \mathbf{g} + {}^w f / m \\ Q^{-1}\big({}^b \tau - {}^b \omega \times (Q\, {}^b \omega)\big) \end{pmatrix},$ where $\mathbf{g} = (0, 0, -g)^\top$.    (15)
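To make the propagation of (15) concrete, the sketch below integrates one time step with a fourth-order Runge-Kutta scheme. It is illustrative only: quaternions are taken in scalar-first order, and aero_forces is a hypothetical callback standing in for the aerodynamic pipeline of Fig. 3, returning the world-frame force and body-frame torque.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def state_derivative(xi, m, Q, aero_forces):
    """Right-hand side a(xi) of (15); xi = (o, r, wv, bw) is a 13-vector."""
    r, wv, bw = xi[3:7], xi[7:10], xi[10:13]
    wf, btau = aero_forces(xi)                                # world force, body torque
    r_dot = 0.5 * quat_mul(r, np.concatenate(([0.0], bw)))    # (14)
    wv_dot = np.array([0.0, 0.0, -9.80665]) + wf / m          # (1)
    bw_dot = np.linalg.solve(Q, btau - np.cross(bw, Q @ bw))  # (2)
    return np.concatenate((wv, r_dot, wv_dot, bw_dot))

def propagate(xi, dt, m, Q, aero_forces):
    """One RK4 step of (15); r is renormalized to stay a unit quaternion."""
    f = lambda x: state_derivative(x, m, Q, aero_forces)
    k1 = f(xi); k2 = f(xi + 0.5*dt*k1); k3 = f(xi + 0.5*dt*k2); k4 = f(xi + dt*k3)
    xi = xi + (dt/6.0) * (k1 + 2*k2 + 2*k3 + k4)
    xi[3:7] /= np.linalg.norm(xi[3:7])
    return xi
```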
Fig. 4. A camera captures images of the object by projecting it onto the image plane Π about its center of projection c, where the camera's frame of reference Fc is located. The x- and y-axes of Fc are aligned with the u- and v-axes in the image plane, and the z-axis is in the normal direction. The distance between c and Π is equal to the focal length of the camera. Additionally, a second camera (referred to by use of a single quote '′') with center of projection c′ and frame of reference Fc′ is illustrated, producing the two-view geometry of a point p, from which the rays to c and c′ intersect the image planes at $q = ({}^p p^\top, 1)^\top$ and $q' = ({}^{p'} p^\top, 1)^\top$. The points c, c′, and p form the epipolar plane, drawn with its normal n.
IV. OBSERVABLES FROM VISION
The object’s state ξ is estimated based on measurements
extracted out of its images taken simultaneously by two
cameras. Observables are in part produced by projection from
individual cameras, and in part constructed to accommodate
the epipolar geometry of the cameras. Parameters of camera
poses can also be included as observables to help reduce
estimation errors.
A. Camera Projection Model
A single camera observing the flying object is shown at
the bottom left in Fig. 4. At the camera’s focal point wc is a
frame Fc with its z-axis perpendicular to the image plane Π.
The rotation of Fc from the world frame Fw is described by
a quaternion rc. The plane Π has a local coordinate system
with the origin at the upper left corner of the image, the u-
axis pointing rightward, and the v-axis pointing downward.
We use the projection model described in detail by Forsyth
and Ponce [77, pp. 16–17]. Let p be a point on the object,
and denote by bp, wp, and cp its coordinates in the body, world, and camera frames, respectively. They assume the following mappings:

${}^w p = o + r \otimes {}^b p \otimes r^*,$    (16)
${}^c p = r_c^* \otimes ({}^w p - {}^w c) \otimes r_c.$    (17)

With ${}^c p = ({}^c p_x, {}^c p_y, {}^c p_z)^\top$, the pinhole camera model yields the normalized image coordinates

${}^p p = ({}^p p_x, {}^p p_y)^\top = \frac{1}{{}^c p_z}\, ({}^c p_x, {}^c p_y)^\top.$    (18)
Brown's distortion model [78] is then used to map the point pp to a distorted point $\bar{p}$:

$\bar{p} = \delta_r({}^p p) + \delta_t({}^p p),$    (19)

where, denoting $\varrho = \|{}^p p\|$,

$\delta_r({}^p p) = \big(1 + k_1 \varrho^2 + k_2 \varrho^4 + k_3 \varrho^6\big)\, {}^p p,$    (20)
$\delta_t({}^p p) = \begin{pmatrix} 2 h_1\, {}^p p_x\, {}^p p_y + h_2 (\varrho^2 + 2\, {}^p p_x^2) \\ h_1 (\varrho^2 + 2\, {}^p p_y^2) + 2 h_2\, {}^p p_x\, {}^p p_y \end{pmatrix},$    (21)
for some coefficients $k_1, k_2, k_3$ of radial distortion, and $h_1, h_2$ of tangential distortion. The distorted point is finally transformed into image coordinates:

${}^i p = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \alpha & -\alpha \cot\theta \\ 0 & \beta / \sin\theta \end{pmatrix} \bar{p} + \begin{pmatrix} u_c \\ v_c \end{pmatrix},$    (22)
where α and β are scale factors, θ the skew angle, and
(uc, vc)⊤ the image center. Due to manufacturing inaccuracy
of cameras, all five parameters in (22) have to be approximated
to account for various sources of error.
Note that bp, wp, and cp are 3-vectors, with bp known beforehand (as the position of a feature on the object), while pp, $\bar{p}$, and ip are 2-vectors. The sequence of transformations is best summarized as follows:

${}^b p \xrightarrow{(16)} {}^w p \xrightarrow{(17)} {}^c p \xrightarrow{(18)} {}^p p \xrightarrow{(19)} \bar{p} \xrightarrow{(22)} {}^i p.$    (23)
Since bp is known, wp, cp, and pp are functions of the object’s
position o and orientation r, as well as the camera’s position c
and orientation rc (as a unit quaternion).
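Since the chain (23) is short, it can be sketched end to end. The function below is a minimal illustration under two assumptions: quaternions follow SciPy's scalar-last convention, and the intrinsic parameters of (20)–(22) are passed in explicitly.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def project(bp, o, r, c, rc, k, h, alpha, beta, theta, uc, vc):
    """Map a body-frame feature bp to image coordinates ip via (16)-(22).
    r and rc are unit quaternions in scipy's (x, y, z, w) order; k = (k1, k2, k3)
    and h = (h1, h2) are the distortion coefficients of (20) and (21)."""
    wp = o + Rotation.from_quat(r).apply(bp)           # (16)
    cp = Rotation.from_quat(rc).inv().apply(wp - c)    # (17)
    pp = cp[:2] / cp[2]                                # (18)
    rho2 = pp @ pp
    radial = (1 + k[0]*rho2 + k[1]*rho2**2 + k[2]*rho2**3) * pp   # (20)
    tangential = np.array([                                       # (21)
        2*h[0]*pp[0]*pp[1] + h[1]*(rho2 + 2*pp[0]**2),
        h[0]*(rho2 + 2*pp[1]**2) + 2*h[1]*pp[0]*pp[1]])
    pbar = radial + tangential                                    # (19)
    A = np.array([[alpha, -alpha/np.tan(theta)],                  # (22)
                  [0.0,    beta/np.sin(theta)]])
    return A @ pbar + np.array([uc, vc])
```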
Let us add a second camera. Shown at the bottom right in Fig. 4, this camera has focal point c′, and its body frame Fc′ has a relative orientation described by the quaternion rc′ to the world frame Fw. We denote by c′p the coordinates of p in the frame of the second camera, p′p the corresponding normalized image coordinates, $\bar{p}'$ the distorted point by this camera, and i′p the final image point. The transformation sequence on the point p due to the second camera is:

${}^w p \rightarrow {}^{c'} p \rightarrow {}^{p'} p \rightarrow \bar{p}' \rightarrow {}^{i'} p.$
Given ip, we recover $\bar{p}$ by solving (22) and then pp by solving (19). This two-step process is carried out using the Levenberg-Marquardt algorithm for nonlinear least-squares minimization [79], [80]. Similarly, the coordinates p′p can be solved for given i′p.
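As a sketch of the second step, SciPy's Levenberg-Marquardt solver can invert Brown's model (19) directly, with the distorted point itself as the initial guess (the authors' own solver may differ):

```python
import numpy as np
from scipy.optimize import least_squares

def undistort(pbar, k, h):
    """Invert Brown's model (19): find pp with delta_r(pp) + delta_t(pp) = pbar."""
    def residual(pp):
        rho2 = pp @ pp
        radial = (1 + k[0]*rho2 + k[1]*rho2**2 + k[2]*rho2**3) * pp
        tangential = np.array([
            2*h[0]*pp[0]*pp[1] + h[1]*(rho2 + 2*pp[0]**2),
            h[0]*(rho2 + 2*pp[1]**2) + 2*h[1]*pp[0]*pp[1]])
        return radial + tangential - pbar
    # The distortion is mild, so pbar is a good starting point.
    return least_squares(residual, x0=pbar, method='lm').x
```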
As illustrated in Fig. 4, two rays, originating at p on the
object and passing through the two focal points c and c′,
intersect the image planes at two points q and q′, respectively.
The five points p, c, c′, q, and q′ determine the epipolar plane.
The epipolar constraint states that, in the world frame, the five points wc, wc′, wq, wq′, and wp are co-planar, which can be compactly represented by the following equation [77, p. 200]:

$({}^p p^\top, 1)\, E\, ({}^{p'} p^\top, 1)^\top = 0,$    (24)

where E is a 3 × 3 essential matrix with rank two [81], [82]. Equation (24) is enforced over the camera parameters during calibration.
Let bp1, . . . , bpn be all the feature points on the object. Let V and V′ be the sets of point features that are currently visible in the respective images produced by the two cameras. The first type of observables are these visible features, which form the vector

$y_a = \big(\ldots, {}^i p_i^\top, \ldots, {}^{i'} p_j^\top, \ldots\big)^\top,$    (25)

for all bpi ∈ V and bpj ∈ V′.
Every feature point wpj introduces an epipolar plane with normal $n_j$. Since its two image points ipj and i′pj are subjected to noise, the corresponding epipolar constraint (24) is not exactly satisfied for bpj. Instead, we include $n_j$ to indirectly enforce the constraint. The second vector of observables $y_b$ gathers the normals over vertices visible in both images:

$y_b = (\ldots, n_j^\top, \ldots)^\top,$    (26)

for all bpj ∈ V ∩ V′. The full vector of observables is

$y(\xi) = \begin{pmatrix} y_a \\ y_b \end{pmatrix}.$    (27)
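The exact formula for $n_j$ is not written out above, but one consistent construction spans the epipolar plane by the baseline c′ − c and the world-frame ray of the first camera through the observed point; the helper below is our hypothetical realization of that choice (SciPy scalar-last quaternions).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def epipolar_normal(pp, c, rc, c_prime):
    """Unit normal of the epipolar plane spanned by the baseline c' - c and
    the world-frame ray through the normalized image point pp = (x, y);
    rc is the first camera's orientation in scipy's (x, y, z, w) order."""
    ray = Rotation.from_quat(rc).apply(np.array([pp[0], pp[1], 1.0]))
    n = np.cross(ray, c_prime - c)
    return n / np.linalg.norm(n)
```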
B. Augmenting the State with Camera Poses
The two cameras' positions c, c′ and orientations rc, rc′, known as the extrinsic camera parameters, are approximated offline via a calibration procedure. Their measurement errors may compromise estimation accuracy during the object's
flight. In particular, the epipolar geometry is sensitive to the
camera poses and their calibration errors. Motivated by SLAM
and VINS algorithms [46], [47], [65], we conduct online
estimation of parameters to find an agreement between a pose
estimate of the object and those of the cameras.
The state vector ξ in (13) is now augmented to include the
pose of each camera:
$x = \big(o^\top, r^\top, {}^w v^\top, {}^b \omega^\top, c^\top, r_c^\top, c'^\top, r_{c'}^\top\big)^\top.$    (28)

The state x has 27 variables in total. There are two reasons
for doing this: 1) to allow correction of calibration errors due
to hand measurements, and 2) to (indirectly) accommodate the
epipolar constraints. From equation (15), the system dynamics
are subsequently augmented:
$\dot{x} = b(x),$    (29)
where the derivatives of c, rc, c′ and rc′ are all zero. We also
write the vector of observables as y = y(x) even though it does not depend on the state variables wv and bω.
V. KALMAN FILTERING WITH QUADRATIC CONSTRAINTS
Kalman filter-based motion estimation relies on the dynam-
ics model (29) to propagate the state of the object with time,
and the measurement model described in Section IV-A to
correct the state. Visual inputs are acquired at discrete time
instants. This, plus the need to cope with nonlinearities present
in both models, leads to the use of the hybrid extended Kalman
filter (EKF) [83, pp. 403–407].
Denote by $x_k$ the value of the state x, and $y_k$ the value of the vector y of observables, all at the k-th time instant. The size m of $y_k$ varies with the features that are currently observable. The measurement model can be written as

$y_k = h(x_k, \nu_k),$    (30)

where the m-vector $\nu_k$ is zero-mean Gaussian white noise whose covariance is given by an m × m matrix $R_k$.
The EKF for the system of (29) and (30) estimates the distribution of the state x. It first forward integrates the linearized dynamics derived from (29) using the Jacobian of b(x), which is evaluated at the posterior estimate $x_{k-1}^+$ from time instant k − 1. The result is a prior estimate $x_k^-$ at time instant k, with a 27 × 27 error covariance matrix $P_k^-$. The estimate $x_k^-$ is then corrected to yield a posterior estimate $x_k^+$ based on the measurement residual $\epsilon_k = y_k - h(x_k^-)$ as follows:

$x_k^+ = x_k^- + K_k \epsilon_k,$    (31)

where $K_k$ is a 27 × m gain matrix to be determined next. Using (31), we write out the 27 × 27 posterior error covariance matrix $P_k^+$ for $x_k^+$ in terms of $P_k^-$:

$P_k^+ = (I_{27} - K_k H_k) P_k^- (I_{27} - K_k H_k)^\top + K_k R_k K_k^\top,$    (32)

where $I_{27}$ is the 27 × 27 identity matrix, and $H_k$ is an m × 27 matrix representing the linearized measurement model, computed as the Jacobian of h(x) at $x_k^-$. Minimization of the trace $\mathrm{tr}(P_k^+)$ of $P_k^+$ (which represents exactly the variance of error in the estimate $x_k^+$) yields the Kalman gain

$K_k = P_k^- H_k^\top \big(H_k P_k^- H_k^\top + R_k\big)^{-1}.$    (33)

With $K_k$ determined, $x_k^+$ and $P_k^+$ are then evaluated according to (31) and (32) to conclude the updates at the k-th time instant.
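For reference, the correction step (31)–(33) reduces to a few matrix operations; a generic sketch, with (32) evaluated in the Joseph form written above, follows.

```python
import numpy as np

def ekf_update(x_prior, P_prior, y, h, H, R):
    """Correction step (31)-(33). h is the measurement function, H its
    Jacobian evaluated at x_prior, and R the measurement noise covariance."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)              # Kalman gain (33)
    x_post = x_prior + K @ (y - h(x_prior))           # state correction (31)
    I_KH = np.eye(len(x_prior)) - K @ H
    P_post = I_KH @ P_prior @ I_KH.T + K @ R @ K.T    # Joseph form of (32)
    return x_post, P_post
```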
A. Constrained Kalman Filtering
When the system state is subjected to some constraints, the
unconstrained estimator (31)–(33) performs corrections that
would cause the estimate to drift away from the constraint
surface. To resolve this, the constraints need to be incorporated
into the minimization problem for determining the Kalman
gain Kk. The augmented state x in equation (28) contains
three unit quaternions r, rc, rc′ representing rotations of the
object and two cameras, respectively. These quaternions are
each seen as a unit 4-vector so they induce constraints
$\eta_i(x) \equiv x^\top A_i x - 1 = 0,$    (34)

for i = 1, 2, 3, where

$A_1 = \mathrm{diag}(0_3, 1_4, 0_3, 0_3, 0_3, 0_4, 0_3, 0_4),$
$A_2 = \mathrm{diag}(0_3, 0_4, 0_3, 0_3, 0_3, 1_4, 0_3, 0_4),$
$A_3 = \mathrm{diag}(0_3, 0_4, 0_3, 0_3, 0_3, 0_4, 0_3, 1_4).$

In the above, diag(·) specifies a diagonal matrix whose main diagonal consists of the listed arguments, with $0_d$ and $1_d$ being d-vectors of zeros and ones, respectively.
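Constructing $A_1, A_2, A_3$ is mechanical given the block layout of (28); a small sketch (names ours):

```python
import numpy as np

def quaternion_constraint_matrices():
    """Build A1, A2, A3 of (34): ones on the diagonal over the slots of
    r, rc, rc' in the 27-state x = (o, r, wv, bw, c, rc, c', rc')."""
    sizes = [3, 4, 3, 3, 3, 4, 3, 4]          # block sizes of x
    quat_blocks = [1, 5, 7]                   # block indices of r, rc, rc'
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    mats = []
    for b in quat_blocks:
        d = np.zeros(27)
        d[offsets[b]:offsets[b+1]] = 1.0
        mats.append(np.diag(d))
    return mats
```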
The aim is to modify the Kalman filter equations to include
the three quadratic constraints given in (34). The gain matrix
$K_k$ is now obtained through constrained minimization:

$\min_{K_k} \mathrm{tr}\big(P_k^+\big)$  subject to  $\eta_i(x_k^+) = 0$, i = 1, 2, 3.    (35)

Below we will derive a solution to (35) that can be easily extended to handle multiple quadratic constraints in the general form.
B. Solution to the Constrained Minimization Problem
Continue to let the time be fixed at instant k. With no ambiguity, we shall temporarily omit the subscripts in $x_k^+$, $x_k^-$, $K_k$, $\epsilon_k$, $P_k^+$, $P_k^-$, and $R_k$. Substituting equation (31) in with $x = x^+$, the constraints in (34) expand into, for i = 1, 2, 3,

$\eta_i(x^+) = (x^- + K\epsilon)^\top A_i (x^- + K\epsilon) - 1 = 0.$    (36)
We write $\eta = (\eta_1, \eta_2, \eta_3)^\top$. Under (32) and (36), the Lagrangian for the constrained minimization problem (35) is

$L = \mathrm{tr}\big((I_{27} - KH) P^- (I_{27} - KH)^\top + K R K^\top\big) + \lambda^\top \eta,$    (37)

where the vector $\lambda = (\lambda_1, \lambda_2, \lambda_3)^\top$ contains three Lagrange multipliers, one for each constraint in (36). The optimal values of K and λ are attained at a stationary point of L.
The partial derivative of L with respect to $\lambda_i$ simply recovers the constraint (34), for i = 1, 2, 3. To derive ∂L/∂K, we first obtain

$\frac{\partial}{\partial K} \mathrm{tr}\big(K R K^\top\big) = 2 K R,$    (38)

by making use of the derivative $\partial\, \mathrm{tr}(A X^\top)/\partial X = A$ for matrices A and X, as well as the symmetry of R. Next we have, by making use of the symmetry of $P^-$,

$\frac{\partial}{\partial K} \mathrm{tr}\big((I_{27} - KH) P^- (I_{27} - KH)^\top\big) = \frac{\partial}{\partial K} \mathrm{tr}\big(K H P^- (K H)^\top\big) - 2\, \frac{\partial}{\partial K} \mathrm{tr}\big(P^- H^\top K^\top\big) = 2 K H P^- H^\top - 2 P^- H^\top.$    (39)

Meanwhile, differentiation of (36) yields

$\frac{\partial}{\partial K}\, \eta_i(x^+) = 2 A_i x^- \epsilon^\top + 2 A_i K \epsilon \epsilon^\top.$    (40)

In deriving (40), we made use of $\partial(x^\top A y)/\partial A = x y^\top$ for vectors x, y, and matrix A, as well as the symmetry of $A_i$. With (38)–(40) we can write out the partial derivative ∂L/∂K:

$\frac{\partial L}{\partial K} = 2\Big(K H P^- H^\top + K R - P^- H^\top + \sum_{i=1}^{3} \lambda_i \big(A_i x^- \epsilon^\top + A_i K \epsilon \epsilon^\top\big)\Big).$    (41)
Vanishing of ∂L/∂K is expressed by the equation

$K S + N(\lambda)\, K \epsilon \epsilon^\top = M(\lambda),$    (42)

where

$S = H P^- H^\top + R,$
$N(\lambda) = \sum_{i=1}^{3} \lambda_i A_i,$
$M(\lambda) = P^- H^\top - \sum_{i=1}^{3} \lambda_i A_i x^- \epsilon^\top.$

The remaining task is to solve (36) and (42) for K and λ. First, right multiplication of equation (42) by $S^{-1}\epsilon$ yields

$K \epsilon = J(\lambda)^{-1} m(\lambda),$    (43)

where, denoting $\varepsilon = \epsilon^\top S^{-1} \epsilon$,

$J(\lambda) = I_{27} + \varepsilon N(\lambda),$
$m(\lambda) = M(\lambda) S^{-1} \epsilon.$
Substitute (43) into (36) to eliminate K and obtain

$\alpha_i(\lambda) \equiv \eta_i(x^-) + 2 (x^-)^\top A_i J(\lambda)^{-1} m(\lambda) + \big(J(\lambda)^{-1} m(\lambda)\big)^\top A_i J(\lambda)^{-1} m(\lambda) = 0,$    (44)

for i = 1, 2, 3. Rewrite the three equations (44) as

$\alpha(\lambda) = 0,$    (45)

where $\alpha = (\alpha_1, \alpha_2, \alpha_3)^\top$. The system (45) has 3 equations in 3 unknowns, and can be solved via root finding with a combination of the (local) Newton's method and the (global) homotopy continuation method [84]. These solvers make use of the Jacobian ∂α/∂λ, which requires the following partial derivatives, for j = 1, 2, 3:

$\frac{\partial m}{\partial \lambda_j} = -\varepsilon A_j x^-,$    (46)
$\frac{\partial J^{-1}}{\partial \lambda_j} = -\varepsilon J^{-1} A_j J^{-1}.$    (47)
Equation (47) was derived from differentiating $J^{-1} J = I_{27}$ while making use of the symmetry of J.
If k = 1, the Lagrange multipliers λ are first initialized
by solving (45) via homotopy continuation. If k > 1, the
multipliers’ values are expected to change slightly from those
at time instant k − 1. Hence, these old values can be used
as a starting point by Newton’s method to achieve quick
updates. In the case of divergence, we will fall back on
homotopy continuation. The gain matrix K is next solved from
equation (42) after substituting equation (43) on the left hand
side. The matrix is used in (31) and (32) to produce corrected
state estimates that satisfy the set of quadratic constraints.
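The sketch below compresses this two-stage computation into one routine, with SciPy's hybrid Powell root finder standing in for the Newton/homotopy-continuation combination; all names and the argument layout are our own.

```python
import numpy as np
from scipy.optimize import root

def constrained_gain(x_prior, eps, S, PHt, A_list, lam0):
    """Solve alpha(lambda) = 0 of (45) for the multipliers, then recover
    the gain K from (42)-(43). PHt = P- H^T, A_list = [A1, A2, A3],
    and lam0 is a warm start for the multipliers."""
    n = len(x_prior)
    Sinv_eps = np.linalg.solve(S, eps)
    eps_bar = eps @ Sinv_eps                            # scalar epsilon in (43)

    def N_and_M(lam):
        N = sum(l * A for l, A in zip(lam, A_list))
        M = PHt - sum(l * np.outer(A @ x_prior, eps) for l, A in zip(lam, A_list))
        return N, M

    def alpha(lam):
        N, M = N_and_M(lam)
        Jm = np.linalg.solve(np.eye(n) + eps_bar * N, M @ Sinv_eps)   # (43)
        return np.array([x_prior @ A @ x_prior - 1.0                  # (44)
                         + 2.0 * x_prior @ (A @ Jm) + Jm @ (A @ Jm)
                         for A in A_list])

    lam = root(alpha, lam0, method='hybr').x
    N, M = N_and_M(lam)
    Ke = np.linalg.solve(np.eye(n) + eps_bar * N, M @ Sinv_eps)
    # Back-substitute K*eps into (42): K = (M - N (K eps) eps^T) S^{-1}.
    return np.linalg.solve(S.T, (M - np.outer(N @ Ke, eps)).T).T
```

Warm-starting lam0 with the previous instant's multipliers realizes the quick Newton-style update described above.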
The optimization procedure is easily adapted to work with quadratic constraints in the general form $x^\top A_i x + b_i^\top x + c_i = 0$, $1 \le i \le m$, where $A_i$ is not necessarily symmetric. We need only make several changes, described below. In equations starting at (40), every appearance of $A_i$ is replaced with $(A_i + A_i^\top)/2$. The partial derivative in (40) includes an extra summand $b_i \epsilon^\top$, and this change propagates to (41) and (42) and, subsequently, to the expression for $M(\lambda)$.
VI. GRAPH-BASED FEATURE TRACKING
Motion estimation has so far assumed a fixed set of known
correspondences between point features in images and those
on the moving object. The system dynamics and observa-
tion function are assumed to have non-varying forms. Such
continuity allows the working of a single Kalman filter. The
property cannot be maintained, however, in the course of the
object’s flight as features emerge and disappear, which intro-
duces discontinuities in observations. This section addresses
the problem of identifying and tracking features of the object
in images with respect to those on its shape model. It is known
that any shape can be approximated arbitrarily well by some
polyhedron or polyhedral mesh (i.e., a triangular mesh). To avoid an unnecessary diversion into handling of geometric complexities, in this section we assume the object to be a polyhedron P whose vertex connectivity can be represented by a planar graph. The range of P includes all convex and some non-convex polyhedra.

Fig. 5. (a) Polyhedron P drawn with visible edges as solid lines and occluded edges as dotted lines. (b) Image graph G constructed from a (hypothetical) image of P taken at the k-th time instant by a camera. (c) Planar graph M representing the topological structure of P. The unbounded face f0 in M corresponds to the shaded face on P. A hypothesis specifies partial correspondences between the vertices of G and those of M.
A. Image Graph and Hypothesis
The polyhedron P has n vertices p0, . . . ,pn−1. Its under-
lying planar graph is referred to as the model graph M , which
is represented by a doubly-connected edge list (DCEL) [85,
pp. 29–33]. The process turns every facet of P into a bounded
region in M , except for one facet, which is “opened up” to
become the unique unbounded region. As shown in Fig. 5(a)
and (c) for an octahedron, every vertex pi of P corresponds
with the vertex vi in M .
A frame generated by either camera is a light intensity
image of P with its faces illuminated differently and showing
discontinuities at edges and vertices. These edge and vertex
features are used to construct a planar graph G, referred to as
the image graph, as illustrated in Fig. 5(b). The features are
detected from the image using a Canny edge detector [86] and
Shi-Tomasi corner detector [87], while their correlation with
vertices and edges of P is yet to be hypothesized. We write 〈ui, vj〉 if the vertex ui in G corresponds to the vertex vj in
the model graph M . The tracking algorithm hypothesizes a
set of vertex correspondences C(G,M) = {. . . , 〈ui, vj〉, . . .}.
This set represents a bijective mapping from a vertex subset
of G to one of M . The mean square error of C(G,M) at a
state x is defined as follows:
$d(C, x) = \frac{1}{|C|} \sum_{\langle u_i, v_j \rangle \in C} \|\mathbf{u}_i - \mathbf{v}_j\|^2,$    (48)

where $\mathbf{u}_i$ represents the image coordinates of $u_i$ in G, and $\mathbf{v}_j$ the image coordinates of $v_j$ when the model is projected onto the image plane under the transformation defined by x.
B. Overview of Work at One Time Instant
The two cameras are synchronized to produce images (or
“frames”) of the polyhedron at equally spaced time instants.
Let us focus on the k-th instant when two image frames
Ik and I ′k are generated simultaneously. The constructed
image graphs from the frames are Gk and G′k, which are
associated with two sets of vertex correspondences C(Gk,M)
9
and C(G′k,M), respectively. The two sets together form a
hypothesis H = (C(Gk,M), C(G′k,M)).
The two correspondence sets in H at time instant k are
first inherited from those either active at time instant k − 1,
or created at the end of processing for that time instant
(including the case k = 1 with no processing at the preceding
instant). They are then updated based on the image frames $I_k$ and $I'_k$. This update is detailed in Section VI-C. The vector
yk of observables is then updated using the correspondence
sets C(Gk,M) and C(G′k,M) from the image points of
their included vertices. In the case where either C(Gk,M) or
C(G′k,M), after its update, includes a new correspondence or
removes an existing one, the structural change is propagated to
yk as well as to the measurement model of the Kalman filter K associated with H. Next, the filter K corrects its prior state estimate $x_k^-$, which has evolved from the posterior estimate $x_{k-1}^+$ at time instant k − 1 through the system dynamics, to yield the posterior estimate $x_k^+$.
The updated estimate $x_k^+$ is then applied to the object. With the estimate we evaluate the error of the hypothesis:
$\epsilon(H) = d(C(G_k, M),\, x_k^+) + d(C(G'_k, M),\, x_k^+).$
When ǫ(H) is high, we must decide whether to retain H in
case the error has improved from time instant k− 1, or reject
it otherwise.
At the k-th time instant, a collection H of hypotheses
(including H) is active, each equipped with a Kalman filter
responsible for estimation under the hypothesis. In the case
that all active hypotheses have been eliminated at the time
instant, new ones will be generated before moving on to the
next time instant. This is described in Section VI-D.
C. Updating a Hypothesis
Let us go back to H as one of the hypotheses active at
time instant k. It was passed on from time instant k − 1, possibly as a newly generated hypothesis. In the images $I_k$ and $I'_k$ taken at the instant, vertices and edges of the polyhedron may appear or disappear as its pose has changed from time instant k − 1. Thus, there is a need for updating the correspondence sets. Below we focus on how C(Gk, M) is updated from C(Gk−1, M), since C(G′k, M) can be similarly updated.
Fig. 6. Correspondences used to update a hypothesis.
The first stage of the update begins
with computing the maximal correspon-
dence between Gk and Gk−1. This re-
quires some initial vertex correspon-
dences between the two image graphs,
which can be obtained during image
processing. Using the correspondence
sets C(Gk−1, Gk) and C(Gk−1, M) inherited from time instant k − 1, we initialize the correspondence set C(Gk, M) such that 〈ui, vj〉 ∈ C(Gk, M) whenever 〈uk, ui〉 ∈ C(Gk−1, Gk) and 〈uk, vj〉 ∈ C(Gk−1, M) for some uk. Fig. 6 illustrates the relationship between the three sets of correspondences. In the second stage, C(Gk, M) is extended to become a maximal correspondence set via matching the image graph Gk with the model graph M.
Fig. 7. Search tree explored by the maximal correspondence algorithm on the image graph Gk and model graph M drawn in the top left corner. Each non-root node represents a correspondence of two vertices from Gk and M, or 〈u, ∅〉 in the case that no correspondence is made, effectively ignoring vertex u. The pair of bolded edges in Gk and M identifies the initial edge correspondence. Each leaf is labeled with a set C, produced by collecting all vertex correspondences along the path descending from the root to the leaf. The algorithm explores a total of 20 sets of correspondences, of which 6 are shown in the expanded subtree.
Both stages involve computing the correspondence between
two planar graphs, first Gk−1 and Gk, and then Gk and M .
This correspondence problem is one of subgraph isomorphism,
which can be done in linear time [88]. However, due to the
presence of noise in images, some tolerance to erroneous
edges and vertices must be allowed. The matching problem
becomes NP-complete, as a result of deciding whether an edge
or vertex should be considered erroneous or not. Approaches to
error-tolerant graph matching include tree search algorithms in
exponential time [36], [89], [90], and sub-optimal algorithms
in polynomial time [38], [91].
We here propose a tree search algorithm to extend
C(Gk,M). The algorithm is presented in the form of a
subroutine A(Gk, M, C0), where C0 is the initial value of C(Gk, M). Once it is described, we call A(Gk−1, Gk, D0), where D0 is the initial value of C(Gk−1, Gk), to extend
the correspondence between Gk−1 and Gk. To also address
generation of a new hypothesis, we here let the algorithm
start with C(Gk,M) = ∅. A depth-first search of the decision
tree is performed by matching edges of Gk with different
edges of M . Fig. 7 demonstrates a portion of the search
performed on an example pair of Gk and M . The traversal
starts with {〈u0, v4〉, 〈u1, v1〉} derived from matching the
ordered edges (u0, u1) ∈ Gk and (v4, v1) ∈ M . To find
the correspondences for the vertex u2 adjacent to u1, it scans
the edges incident on v1 clockwise, yielding the branches at
the third level of the tree that represent the correspondences
〈u2, v3〉, 〈u2, v0〉, and 〈u2, v5〉. A fourth branch representing
〈u2, ∅〉 is added to include the possibility of u2 not being matched.
This traversal continues until all vertices have been visited.
As a result, every leaf node reached denotes a possible
correspondence of the two graphs formed by collecting vertex
correspondences along its path from the root. For example, in
Fig. 8. Overview of the steps to update a hypothesis, estimate its corresponding state, and compute its error (stages: hypothesis update, dynamics, Kalman filtering).
Fig. 7, the leaf node 〈u3, v5〉 denotes the correspondence set
C1 = {〈u0, v4〉, 〈u1, v1〉, 〈u2, v3〉, 〈u4, v0〉, 〈u3, v5〉}.
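A simplified sketch of the search (our own, heavily abbreviated): it seeds the correspondence with C0, decides each frontier vertex in turn, and branches over unmatched model neighbors plus the 〈u, ∅〉 option, but omits the clockwise edge ordering and error-tolerance bookkeeping of the full algorithm.

```python
def extend_correspondences(adj_G, adj_M, C0):
    """Depth-first enumeration of candidate vertex correspondences between
    an image graph and the model graph, seeded with C0 (a dict u -> v).
    adj_G / adj_M map each vertex to the list of its neighbors."""
    results = []

    def dfs(C, frontier, decided):
        while frontier and frontier[0] in decided:
            frontier = frontier[1:]            # skip already-decided vertices
        if not frontier:
            results.append(dict(C))
            return
        u, rest = frontier[0], frontier[1:]
        decided = decided | {u}
        taken = set(C.values())
        # Candidate model vertices: unmatched neighbors (in M) of the
        # images of u's already-matched neighbors (in G).
        candidates = {v for un in adj_G[u] if un in C
                        for v in adj_M[C[un]] if v not in taken}
        for v in candidates:
            C[u] = v
            dfs(C, rest + [n for n in adj_G[u] if n not in decided], decided)
            del C[u]
        dfs(C, rest, decided)                  # the <u, None> branch
    frontier = [n for u in C0 for n in adj_G[u] if n not in C0]
    dfs(dict(C0), frontier, set(C0))
    return results
```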
Multiple correspondence sets are often generated by the tree search algorithm. A metric is thus needed for comparing them. A correspondence set $C_1$ is larger than another correspondence set $C_2$, denoted $C_1 \succ C_2$, if

$\big(|C_1| = |C_2| \wedge d(C_1, x^-) < d(C_2, x^-)\big) \vee \big(|C_1| > |C_2| \wedge d(C_1, x^-) - d(C_2, x^-) < \delta\big),$    (49)

for some small threshold δ > 0. If neither $C_1 \succ C_2$ nor $C_1 \prec C_2$, then $C_1$ and $C_2$ are tied. The maximum correspondence set $C^*$ is chosen, i.e., we have C(Gk, M) = $C^*$. In case of a tie, only one is chosen.
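The predicate (49) and the selection of $C^*$ translate directly into code; the linear scan below resolves ties arbitrarily, as stated above (a sketch, with the mean square errors passed in).

```python
def larger(C1, C2, d1, d2, delta):
    """The partial order (49): C1 > C2 if it has equal size but lower
    error, or strictly more correspondences without much more error.
    d1, d2 are the mean square errors d(C1, x-) and d(C2, x-)."""
    return (len(C1) == len(C2) and d1 < d2) or \
           (len(C1) > len(C2) and d1 - d2 < delta)

def select_maximum(candidates, d, x_prior, delta=1.0):
    """Scan candidate correspondence sets and keep one maximum under (49)."""
    best, best_err = None, None
    for C in candidates:
        err = d(C, x_prior)
        if best is None or larger(C, best, err, best_err, delta):
            best, best_err = C, err
    return best
```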
Similarly, we update the correspondence set C(G′k,M)
based on the image graph G′k generated by the second camera.
Update of the hypothesis H is then completed for the time
instant. The Kalman filter is applied as described in Section V
to generate the posterior estimate x+k . Figure 8 gives an
overview of the steps described thus far for updating and
evaluating a hypothesis.
Images of the flying object are subject to noise that may
lead to erroneous edges and vertices appearing. To cope with
this and provide some tolerance, the hypothesis H is rejected
if ǫ(H) ≥ Γ in each of the l most recent time instants. Proper
tuning of Γ and l helps the algorithm settle on a small set of
low-error hypotheses while subject to noise that periodically
appears in the image.
D. Generating New Hypotheses
If k = 1 or all hypotheses have been rejected based on
the posterior estimate x+k , the algorithm attempts to generate
a new set of hypotheses. Start with a correspondence set
C0 = {〈u1, v1〉, 〈u2, v2〉} which involves adjacent vertices
u1 and u2 in Gk and v1 and v2 in G′k. A maximum
correspondence set C(Gk, G′k) is constructed using the tree
search subroutine A(Gk, G′k, C0). Next, by matching the edge
(u1, u2) with an edge of M, the same algorithm finds all maximal correspondence sets C1, . . . , CN between Gk and M when the edge (u1, u2) is matched with the edge (v1, v2). For each Ci, i = 1, . . . , N, we construct a correspondence
set C′i between G′k and M such that 〈u′, v〉 ∈ C′i whenever 〈u, u′〉 ∈ C(Gk, G′k) and 〈u, v〉 ∈ Ci for some vertex u. A
hypothesis Hnew = (Ci, C′i) is thus obtained. To evaluate its
error ǫ(Hnew), we need a new state estimate x+k of the polyhe-
dron at time instant k. The estimate is composed of the object’s
position o and orientation r initialized under the hypothesis
Hnew, and velocities wv and bω chosen depending on the time
instant of the hypothesis generation.2 If ǫ(Hnew) < Γ, the
hypothesis is added along with its Kalman filter initialized by
x+k . The above procedure repeats by matching (u1, u2) with
other edges of M at its start, returning any hypotheses with
low errors.
With the new hypothesis Hnew = (C(Gk,M), C(G′k,M)),
we can initialize the object’s pose which, together with ve-
locity guesses, constitute the new posterior state x+k at the
k-th time instant. Identify all pairs of corresponding vertices
〈ui, u′j〉 from Gk and G′
k under Hnew. Back-project their
image coordinates into world coordinates, which can then be
used to recover the transformation from the body frame to the
world. This is detailed in Appendix A.
VII. EXPERIMENTS
To demonstrate the accuracy and robustness of the presented
estimation scheme, experiments were conducted with three
different objects: a wooden frame, rugby ball, and foam
polyhedron. The first experiment was conceived to have a
source of ground truth available, while the next two involved
more practical objects. The rugby ball experiment addressed
motion estimation in sports where objects tend to have smooth
surfaces and experience large Magnus forces. The polyhedron
experiment aimed at a generally shaped object, and had
the objective of demonstrating estimation with non-fiducial
features.
As summarized in Table I, various compositions of the three
basic modules (aerodynamics, Kalman filtering, and graph-
based feature tracking), with increasing complexity, were
validated. Five different estimators were used: an extended
Kalman filter (EKF), unscented Kalman filter (UKF) [92],
quadratically constrained Kalman filter (QCKF) described in
Section V-B, a QCKF with aerodynamics approximated for
a sphere (QCKF-S), and a QCKF with aerodynamics fully
modeled (QCKF-A). All five estimators used the dynamics
model (15) (with the lift force fl and drag force fd set to zero
by the first three estimators). They also used the projection
model of Section IV-A while the quaternion constraints (36)
were incorporated in all but the first estimator. In the case
of the UKF, the quaternion constraints were enforced using a
projection method [62]. For QCKF-A, lift force was calculated
per the procedure in [71]. The QCKF-S instead approximates
the object by a volume-equivalent sphere, and applies em-
pirical equations of drag and lift for a spherical shape. The
polyhedron experiment made use of the graph-based feature
tracking algorithm from Section VI.
In each experiment, two Ximea MQ022CG-CM high-speed,
color cameras in a stereo vision configuration were used to simultaneously capture images of an object in flight. Both
cameras were equipped with Navitar NMV-6 lenses with an
81.9 degree field of view. The cameras were carefully cali-
brated with point data obtained by images of a 3D calibration
2At the start of estimation, velocities are unknown and a best guess canbe made. At later time instants, reliable velocity estimates from a previoushypothesis can be used as initial values.
TABLE I
SUMMARY OF EXPERIMENTS WITH THE QUATERNION CONSTRAINT (QC), EPIPOLAR CONSTRAINT (EC), AERODYNAMICS (AERO), AND GRAPH-BASED FEATURE TRACKING (GFT)

Object           | Estimator | QC | EC | Aero | GFT
-----------------|-----------|----|----|------|----
Wood Frame       | EKF       |    |    |      |
Wood Frame       | UKF       | X  | X  |      |
Wood Frame       | QCKF      | X  | X  |      |
Rugby Ball       | QCKF      | X  | X  |      |
Rugby Ball       | QCKF-A    | X  | X  | X    |
Foam Polyhedron  | QCKF-S    | X  | X  | X    | X
Foam Polyhedron  | QCKF-A    | X  | X  | X    | X
object3 via bundle adjustment [93] and individually initialized
camera parameters [94]. The two cameras’ parameters were
estimated and poses approximated together via optimization
under the epipolar constraint (24) for minimum reprojection
error. Images were captured at more than 200 fps depending
on the image size in the respective experiment.
Estimation started at time 0 s when an initial estimate was
determined, and ran until the object left both cameras’ fields of
view. The initial estimate of the object’s pose was computed
from a procedure given in Appendix A upon detecting four
(or more) pairs of image points from features/marks on the
object visible to both cameras. During the first few frames,
the object’s initial velocity was estimated roughly via finite
differencing over position. The initial angular velocity had its
estimate set close to zero since an estimate via differencing
would contain a large error. Initial error covariances $P_0^+$ of the state estimate and $R_0$ of the vector of observables were
determined from tuning of the Kalman filter over more than
100 experiment trials.4
Last to introduce is the metric of reprojection error used for
validation of pose estimates. We choose the root-mean-square error (RMSE) formulated as $\sqrt{d(C, x)}$, where d is defined in (48), C is the winning hypothesis, and x is its state estimate.
A. Wooden Frame with Accelerometers
In the first experiment, a wooden frame object made up
of three orthogonal links was thrown by hand across the two
cameras’ fields of view approximately 2.3 m away. The object
contained four vertices shown in Fig. 9(a), with one at the
center as the “origin” of the frame, and the remaining three
at the ends of the three links, colored red, green, and blue,
respectively, and on their axes of symmetry. Each link, treated
as an edge of the object, was easily detected by its color during
image processing.
Near each vertex, linear acceleration was measured by a
tri-axis ADXL335 accelerometer from Analog Devices, with
3 Composed of checkerboard patterns pasted on the faces of a cube.
4 Covariances were 0.1 for the three components of the object's position, 0.01 for the four components of rotation and three components of velocity, 1 for the three components of angular velocity, 0.005 for the six components of each camera's position, and 0.001 for the eight components of rotation. Measurement covariances were 25 for the two coordinates of image points in equation (25), and 0.003 for the three coordinates of the epipolar normals in equation (26).
Fig. 9. (a) A wooden frame object with three links and four accelerometers attached (labeled parts: battery, accelerometer, wireless transmitter). Each link is 30 cm long and 2.9 cm by 2.9 cm in the cross section. Three vertices at the ends of the links are marked by black dots, and one vertex at the frame's "origin" by a black circle. (b) The link representing the y-axis (colored green) with one accelerometer at its end and another near the intersection of the three links. The red and blue links representing the x- and z-axes have one accelerometer each at their ends.
range ±29.4 m/s². Fig. 9(b) shows the configuration of one axis of the frame. The data acquired by each accelerometer were transmitted wirelessly using separate Digi XBee Series 1 modules.5 The object's inertia tensor was also adjusted to
account for the accelerometers as point masses, yielding the
value $Q = \mathrm{diag}(5.57, 5.57, 5.60) \times 10^{-3}\ \mathrm{kg \cdot m^2}$.
With a total mass of 0.31 kg, the wooden frame is subjected
to negligible effects of aerodynamic forces. The frame's motion was estimated by the QCKF, EKF, and UKF for 0.62 s at a rate
of 202 fps, and over a distance of 2.13 m within the images
of its flight taken by the two cameras as shown in Fig. 10(a)-
(b). The poses estimated by the EKF and QCKF at a few
moments are also drawn. In images by the two cameras,
pairs of poses estimated by the EKF do not appear to align
well with the object. This is primarily due to absence of
the epipolar constraint and the unit quaternion constraints.
On the contrary, the pose estimated by the QCKF aligns
better in both images as a result of including the epipolar
normals as observables. Consequently, the EKF yielded higher
reprojection errors while the UKF (not shown) obtained very
similar errors to the QCKF due to its enforcement of the
constraints.
Finally, we compare the estimated velocities with those
measured by the accelerometers. First, the object's initial pose
is calculated using carefully extracted image points and the
procedure of Appendix A, and its initial velocity ${}^b\boldsymbol{v}$ by differencing
the object's position. The $i$-th accelerometer produces
readings $\boldsymbol{a}_i$ that satisfy the kinematics equation
$\boldsymbol{a}_i = {}^b\dot{\boldsymbol{v}} + {}^b\dot{\boldsymbol{\omega}} \times {}^b\boldsymbol{r}_i + {}^b\boldsymbol{\omega} \times ({}^b\boldsymbol{\omega} \times {}^b\boldsymbol{r}_i)$,
where ${}^b\boldsymbol{r}_i$ locates the accelerometer in
the body frame. Using four accelerometers, we obtain twelve
quadratic equations in total with nine unknowns in ${}^b\dot{\boldsymbol{v}}$, ${}^b\boldsymbol{\omega}$,
and ${}^b\dot{\boldsymbol{\omega}}$. The equations can instead be written as a linear system
with twelve (dependent) unknowns: three for ${}^b\dot{\boldsymbol{v}}$, and nine for
a $3 \times 3$ matrix $W$ composed from ${}^b\boldsymbol{\omega}$ and ${}^b\dot{\boldsymbol{\omega}}$. Formulation of
the linear system and extraction of ${}^b\boldsymbol{\omega}$ from $W$ are discussed in
[18]. At each time instant, the resulting ${}^b\boldsymbol{\omega}$ updates the object's
rotation, which is used to convert ${}^b\boldsymbol{v}$ back to the world frame
after its update from integration of ${}^b\dot{\boldsymbol{v}}$.
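The sketch below sets up this linear system numerically. It assumes the standard angular acceleration matrix form $W = [\dot{\boldsymbol{\omega}}]_\times + [\boldsymbol{\omega}]_\times^2$ along the lines of [17], [18], so that $\boldsymbol{a}_i - {}^b\dot{\boldsymbol{v}} = W\,{}^b\boldsymbol{r}_i$; gravity compensation and the extraction of $\boldsymbol{\omega}$ from $W$ (discussed in [18]) are omitted.

```python
import numpy as np

def solve_accel_system(a, r):
    """Least-squares solve of a_i = v_dot + W r_i for v_dot and W.

    a, r : (4, 3) arrays of accelerometer readings and body-frame
    locations. Assumes W = [omega_dot]x + [omega]x^2, so a_i - v_dot
    is linear in the nine entries of W; recovering omega from W
    follows [18] and is not repeated here.
    """
    A = np.zeros((12, 12))
    b = np.zeros(12)
    for i in range(4):
        rows = slice(3 * i, 3 * i + 3)
        A[rows, 0:3] = np.eye(3)          # coefficients of v_dot
        for j in range(3):                # row j involves W's j-th row times r_i
            A[rows.start + j, 3 + 3 * j: 6 + 3 * j] = r[i]
        b[rows] = a[i]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:].reshape(3, 3)     # v_dot, W (row-major)
```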
5 Before the object was thrown, the accelerometers were calibrated by repeatedly measuring voltages corresponding to the force of gravity while lying flat on a table.
Fig. 10. Images of the wooden frame in flight, cropped and superposed on a single image from the (a) first and (b) second cameras. The object's pose estimated by the QCKF and EKF is drawn by the edges that were visible to each camera (yellow and dashed white lines, respectively). Labels below each object indicate the time since the start of estimation. Plots of the estimated and observed (c) linear and (d) angular velocities. Velocity readings from accelerometers in the x, y, and z directions are plotted as solid lines in light red, green, and blue, respectively. Velocities estimated by the QCKF are plotted as dashed lines in red, green, and blue, and by the UKF as thin solid lines in the same colors.
The method described above of integrating accelerations
produces the smooth curves plotted by solid lines
in lighter colors in Fig. 10(c)–(d). For comparison, these
two diagrams also plot the values of velocity and angular
velocity estimated by the UKF and QCKF. Velocity
estimates of the two filters mostly overlap, while
small gaps can be seen in ${}^w v_x$ and ${}^w v_y$. At the
ending time 0.62 s, the estimation errors of the QCKF
and UKF were respectively (−0.056, −0.153, −0.091)⊤
and (−0.185, −0.284, −0.107)⊤ m/s for velocity, and
(−0.004, 0.032, −0.026)⊤ and (−0.023, −0.008, −0.009)⊤
rad/s for angular velocity. The difference in these velocity
errors, while seemingly small, yields larger errors (i.e., drift) in
pose estimates through integration. Moreover, in (d) higher
errors are seen to accumulate in ${}^b\omega_y$ and ${}^b\omega_z$ up to time
0.22 s. This is due to the x-axis not being detected, as shown
in Fig. 10, effectively reducing the information available for
estimating rotation about the two adjacent orthogonal axes.
Once the x-axis becomes visible again, the estimation errors in
${}^b\omega_y$ and ${}^b\omega_z$ quickly improve.
Fig. 11. (a) Predictions of the rugby ball's position trajectory starting from estimates at time 0.12 s produce the green curve for the QCKF-A and the red curve for the QCKF. Both estimates are propagated, including aerodynamics, to time 0.58 s near the end of estimation. The two resulting position trajectories are projected onto the image plane of the first camera for comparison with the estimated positions drawn by blue dots. A sequence of rugby ball snapshots is superposed onto the image plane. The object is spinning primarily about its major axis, drawn by a thin white line. (b) Estimated trajectory of the rugby ball during the segment of flight from 0.12 to 0.58 s. The object's body frame is drawn by red, green, and blue dashed arrows, along with its estimated velocity v (black), angular velocity ω (teal), and lift force F_L (magenta). The RMSE are plotted from the (c) first and (d) second cameras by two polylines, one in blue calculated from estimates of the QCKF, and one in black from estimates of the QCKF-A.
B. Rugby Ball
In the second experiment, estimation took place with aero-
dynamics on a thrown rugby ball. The hollow ball was primar-
ily composed of rubber with a mass of 0.42 kg, and took on
the shape of a prolate spheroid with semi-axis lengths 0.145,
0.095, and 0.095 m. Since such a smooth surface lacks features
to contribute measurements during estimation, five markers
were attached to the rugby ball’s surface: a green marker at
one of the two intersections with its major axis, and four
blue markers at all intersections with its two minor axes.
The ball was thrown with a spin about its major axis (i.e.
a “spin pass”), such that the green marker remained visible
and the blue markers rotated into and out of view. The
markers were detected by their color and provided to two
estimators: QCKF-A (with aerodynamics modeled) and QCKF
(without). To measure the cross sectional area needed for
13
TABLE IIRANGES OF RATIOS OF AERODYNAMIC FORCES CALCULATED BY
QCKF-A THROUGHOUT FLIGHT TO THE FORCE OF GRAVITY.
ObjectRange of ratios to gravity
fd f l
Rugby Ball [0.034, 0.063] [0.327, 0.486]
Polyhedron [0.100, 0.361] [0.142, 0.394]
approximating drag force, 400 surface points were computed
as “vertices” to be projected along the velocity direction. The
ball’s flight spanned 0.704 s, at which the object achieved
estimated velocity v = (−2.148, 1.991,−3.547)⊤ m/s and
angular velocity ω = (18.280,−0.1576, 1.772)⊤ rad/s.How does the modeling of aerodynamics improve prediction
of the object’s trajectory? Fig. 11(a) plots the position tra-
jectories starting with estimates by the QCKF-A and QCKF
at time 0.12 s and forward propagated to time 0.58 s near
the end of flight. To evaluate these two estimates fairly, both
trajectories are produced with inclusion of aerodynamics, and
then compared to the “true” trajectory over the remainder of
the flight.6 Hence, errors from the comparisons indicate the
accuracies of the starting estimates. The position trajectory
(green) starting from the estimate of the QCKF-A is seen to
yield a much closer prediction. The position trajectory (red)
of the QCKF falls faster than that of the QCKF-A due to the
absence of an upward lift force. Since the ball was thrown
with a back-spin, the Magnus effect produced an upward lift
force that prolonged the ball’s flight. Fig. 11(b) shows the
estimated trajectory of the ellipsoid by the QCKF-A with the
estimated velocities and resulting lift forces drawn by arrows.
Moreover, the row for the rugby ball in Table II gives the ratios of
the magnitudes of the drag and lift forces to that of gravity, where
$f_l$ in particular is seen to be a large contributing force.
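A schematic version of this forward propagation is sketched below. The acceleration callback stands in for the aerodynamic model (the paper's QCKF-A solves the Laplace equation for lift); both starting estimates are propagated with the same callback, as the text prescribes for a fair comparison. The function and its explicit Euler stepping are illustrative assumptions, not the paper's integrator.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity in the world frame (m/s^2)

def predict_position(p0, v0, accel_fn, dt, steps):
    """Forward-propagate a position estimate as in Fig. 11(a).

    `accel_fn(v)` returns the aerodynamic acceleration (drag + lift)
    for the current velocity; it may close over the estimated angular
    velocity to include the Magnus effect. A hypothetical helper
    using simple explicit Euler steps.
    """
    p, v = np.array(p0, float), np.array(v0, float)
    traj = [p.copy()]
    for _ in range(steps):
        v = v + (G + accel_fn(v)) * dt   # gravity plus aerodynamic terms
        p = p + v * dt
        traj.append(p.copy())
    return np.array(traj)
```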
When compared to estimates at the end of flight, pre-
diction errors of orientation as zyx Euler angles were
(0.62, 0.42, 0.24) rad for the QCKF, and (0.47, 0.35, 0.18) rad
for the QCKF-A. While the QCKF-A yields lower prediction
errors, in both cases the errors about the z-axis were slightly
larger due to the larger rate of rotation. Inclusion of aerodynamics
also yielded slightly lower reprojection errors for the
QCKF-A, shown in Fig. 11(c)–(d). As more image frames were
acquired, integration of the dynamics propagated errors into the
object's pose; modeling aerodynamics reduced this drift, giving
the QCKF-A an improvement of approximately 2 px by the end of flight.
C. Irregular Polyhedron
In the final experiment, a hollow, polyhedral foam object
was thrown by hand to demonstrate motion estimation along-
side feature tracking in the absence of fiducial markers. The
polyhedron was constructed by combining eight triangular
facets cut from foam boards.7 It had a mass of 0.069 kg and
6 This is the trajectory from estimating the object's state all the way to the end of flight.
7 The polyhedron has the following vertices in the frame F_b aligned with its principal axes: (0.051, −0.008, 0.121)⊤, (−0.04, −0.055, 0.101)⊤, (0.073, 0.103, −0.068)⊤, (−0.071, 0.156, −0.002)⊤, (−0.044, −0.095, −0.102)⊤, and (0.031, −0.101, −0.05)⊤.
Fig. 12. Images of the polyhedron in flight, cropped and superposed onto a single image, as captured by the (a) first and (b) second cameras. The pose estimated by the QCKF-A is drawn by its edges (yellow lines) that were visible to each camera. Labels below each object indicate the time since the start of estimation. (c) Timeline of hypotheses and their errors over the flight of the polyhedron with the thresholds Γ = 800 and κ = 3. Each colored line represents a hypothesis that ends when rejected or when the object's flight ends. The hypotheses are numbered 1–7. Generation of new hypotheses occurs at 0.4 s, in which case colors are reused to plot their errors.
inertia tensor of Q = diag(3.20, 1.96, 2.45) × 10⁻⁴ kg·m².
Due to its large size and low weight, the flying polyhedron was
subjected to the effects of aerodynamics, to the extent that air
flowing over its faces produced a noticeable lift force. Overall, the
aerodynamic effect was more complex than on the rugby ball
from the previous experiment, given the polyhedron's geometric
asymmetry and the singularities at its edges and vertices.
Detected corner and edge features on the polyhedron were
inputs to the feature tracking algorithm in Section VI. The
polyhedron's motion was estimated by the QCKF-A for 0.52 s at a rate
of 207 fps, over a distance of 1.73 m within the images.
Fig. 12(a)–(b) shows its images during the flight. The object’s
pose estimates at a few time instants (e.g., 0.1 s) were plotted
with only those edges visible to each camera. In several
frames, some visible edges are clearly missing after not being
detected due to poor lighting of the polyhedron’s faces. Re-
gardless, the feature tracking and estimation algorithms were
able to keep up with the flight. Magnitudes of the polyhedron’s
aerodynamic forces from QCKF-A are shown in Table II.
To demonstrate the performance, Fig. 12(c) plots a timeline
of hypothesis errors for the object’s flight. The algorithm
Fig. 13. Reprojection errors of the QCKF-A and QCKF-S are plotted respectively in blue and black for the (a) first and (b) second cameras.
initially generates two hypotheses labeled 1 and 2. Both
maintain relatively low errors as their estimates improve, until
time 0.25 s when the errors spike due to noise in the image. A few
frames later, the hypotheses are able to recover with updated
correspondences, yielding errors that reflect the object's new
pose. In particular, hypothesis 2, which is now presumably an
incorrect hypothesis due to its higher error, persists for a short
while just below the threshold for rejection. At approximately
0.35 s, the errors again increase due to noise that occurs as a face
of the object disappears. At this moment, seen in Fig. 12(a)
and (b), image processing fails in subsequent iterations to
distinguish between the vertices labeled v1 and v2. A few
iterations of the tracking algorithm pass before the two
hypotheses are rejected and hypothesis generation is triggered,
yielding five new hypotheses labeled 3–7 at 0.4 s. Of these,
hypothesis 3 is tracked with low error as it appears to
correctly identify the object's pose, while 6 and 7 are later
rejected, and 4 and 5 remain within the error threshold. The
object's state estimate at each frame is chosen from the
hypothesis with the lowest error.
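The bookkeeping behind this timeline can be sketched as below. The exact semantics of Γ and κ follow Section VI, which is not reproduced here; the sketch assumes a hypothesis is rejected once its error exceeds Γ for κ consecutive frames, and that an empty hypothesis set triggers regeneration via graph matching.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    correspondences: dict   # image features -> model vertices
    strikes: int = 0        # consecutive frames above the error threshold

def prune(hypotheses, errors, gamma=800.0, kappa=3):
    """One pruning step over the hypothesis set of Fig. 12(c).

    A plausible reading of the thresholds: reject a hypothesis whose
    error stays above gamma for kappa consecutive frames. An empty
    result signals that hypothesis generation (graph matching,
    Section VI) should be re-triggered.
    """
    survivors = []
    for h, e in zip(hypotheses, errors):
        h.strikes = h.strikes + 1 if e > gamma else 0
        if h.strikes < kappa:
            survivors.append(h)
    return survivors
```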
It is inevitable that image noise and imperfect processing
will prevent a single hypothesis from spanning the whole
flight. Regardless, the sequence of poses of the object is
successfully tracked by hypotheses 1 and 3 (both
conveniently colored blue). Due to the smooth transitioning
of the state estimate for the new set of hypotheses, minimal
progress in estimation is lost during the time when hypothesis
generation takes place.
D. Comparison with Closed-form Lift Force using Spherical
Approximation
The polyhedron was simultaneously estimated using closed-form
equations of drag and lift for a sphere centered at o and
with equal volume (i.e., 2.66 × 10⁻³ m³). More specifically,
the drag equation in (10) is now calculated with a constant
cross-sectional area A and coefficient C_d ≈ 0.4 [72, p. 261].
The lift force, no longer calculated via solving the Laplace
equation (6), has a form similar to (10):
$$\boldsymbol{f}_l = \frac{1}{2}\rho A C_l \,({}^b\boldsymbol{v} \cdot {}^b\boldsymbol{v})\, \frac{{}^b\boldsymbol{\omega} \times {}^b\boldsymbol{v}}{\|{}^b\boldsymbol{\omega} \times {}^b\boldsymbol{v}\|}, \qquad (50)$$

where $C_l \approx 0.22$ [95].
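A minimal sketch of this spherical force model follows, assuming the drag equation in (10) takes the standard quadratic form with the constant area A; the air density is an assumed sea-level value, and the coefficients are the ones quoted above.

```python
import numpy as np

RHO = 1.2           # air density (kg/m^3), an assumed sea-level value
CD, CL = 0.4, 0.22  # sphere coefficients from [72] and [95]

def sphere_drag_lift(v_b, w_b, A):
    """Closed-form drag and lift for the equal-volume sphere, with
    velocity v_b and angular velocity w_b in the body frame and
    constant cross-sectional area A (a sketch of QCKF-S's force model,
    assuming quadratic drag for (10) and using (50) for lift)."""
    speed = np.linalg.norm(v_b)
    f_d = -0.5 * RHO * A * CD * speed * v_b           # drag opposes motion
    axis = np.cross(w_b, v_b)
    n = np.linalg.norm(axis)
    f_l = np.zeros(3) if n < 1e-9 else \
        0.5 * RHO * A * CL * (v_b @ v_b) * axis / n   # lift per (50)
    return f_d, f_l
```

With the polyhedron's equal volume V = 2.66 × 10⁻³ m³, the sphere's radius is r = (3V/(4π))^{1/3} ≈ 0.086 m, giving A = πr² ≈ 0.023 m².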
Fig. 13(a) and (b) show reprojection errors from the QCKF-A
and QCKF-S for the first and second cameras, respectively.
The errors are similar until 0.3 s, when new vertices appearing
in the images cause measurement inaccuracies in rotation
(resulting in a jump of the reprojection error for the QCKF-S).
Even after correction, the QCKF-S yielded a poorer estimate
of angular velocity due to not accounting for the lift force from
air translating over the non-spherical surface. Consequently, its
estimated orientation lags behind that of the QCKF-A, resulting
in an increase of reprojection error of 6.2 and 4.6 px at the end
of estimation for the respective cameras.
E. Experiment Summary
The experiments described in this section were conducted
on a Windows PC equipped with a 4-core 3.5 GHz Intel Xeon
CPU and 8 GB of memory. Table III presents computation
times for the five estimators to update and correct the state
estimate, averaged over all experiments for which they were
used. During the update step, two rounds of the fourth order
Runge-Kutta method [96], [97] of integration were performed,
requiring eight evaluations of the dynamics equation, and in
the case of QCKF-A, each time recalculating the aerodynamic
forces. For both the rugby ball and polyhedron, non-uniform
grids consisting of roughly 3,500 grid points were used,
yielding a linear system of approximately 3,500 rows and
5,000 columns to be solved. It is for this reason that the
QCKF-A is the most time-consuming.
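For reference, one fourth-order Runge-Kutta step is sketched below; it evaluates the dynamics four times, so the two rounds per update step described above cost eight evaluations (and, for the QCKF-A, eight recomputations of the aerodynamic forces). The state vector and dynamics function are placeholders for the paper's models.

```python
def rk4_step(f, x, t, h):
    """One fourth-order Runge-Kutta step x(t) -> x(t+h) for dx/dt = f(t, x).

    `x` may stack position, rotation quaternion, and (angular) velocity,
    with `f` the object's dynamics; four evaluations of f per step.
    """
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```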
Meanwhile, the UKF, as expected, required more computation
time than the EKF due to updating and correcting
multiple sigma points. Although the UKF is known to better
approximate some nonlinear models, it had no advantage in
the estimation of the wooden frame. Moreover, were the UKF
to consider aerodynamics, repeatedly calculating the lift and drag
forces for each sigma point would render the estimator infeasible
for real-time motion estimation.
Estimation errors would be easily computed if data sets of
real images and corresponding motion data for a free-flying
object were available. To the best of our knowledge, no such
data sets have been made available for research use. We rely on
the measurement of pose error in both images (i.e. reprojection
error) to represent the pose error in the world.
Below we argue for the use of reprojection error as a metric
for 3D pose accuracy. The argument rests on the observation that no two poses of the
object could generate the same pair of images on the two
cameras, unless some degeneracy of projection takes place or
the object assumes certain symmetry. The image coordinates
of three feature points generated by a single camera might
yield six independent equations, each composing the chain
in (23), from which the pose can be solved. However, the
lack of depth information and imaging noise could result in
pose ambiguities. With another image taken simultaneously,
six more independent constraints are induced, satisfiable by only
a finite number of poses. The remaining constraints, including
the epipolar constraint discussed in Section IV, further reduce
the possible poses to almost always a unique one. This
reasoning is strengthened by considering not a single pose but
a sequence of poses estimated from pairs of simultaneously
taken image frames of the flying object. The probability of
two different initial states of the object producing the same
sequence of poses is negligible, if not zero. Generally, a unique
correspondence exists between the initial (and subsequently
the current) state and a sequence of images.
TABLE III
COMPUTATION TIME (ms) OF ESTIMATORS (MEAN ± STD.)

Estimator   Update Step       Correct Step
EKF         0.37 ± 0.11       1.11 ± 0.37
UKF         16.8 ± 5.65       40.9 ± 9.97
QCKF        0.89 ± 0.37       }
QCKF-S      1.21 ± 0.35       } 8.36 ± 2.76
QCKF-A      143.21 ± 19.8     }
VIII. DISCUSSION AND FUTURE WORK
This work presents a scheme for motion estimation under
conditions that have previously rendered acquisition of ac-
curate pose and velocity estimates infeasible. During a fast
free flight, the object is susceptible to large aerodynamic
forces (in the case of low mass density), and is observed for
only a fraction of a second. Experiments have demonstrated
reduced estimation errors with improvements over the baseline
EKF approach. Thorough nonlinear models of the object’s
dynamics and vision-based observables are presented for use
in robust Kalman filtering. Lift force is calculated using the
Laplace equation, and drag force from a closed-form approximation,
yielding improved trajectory estimates for light objects.
Estimation error is further reduced by enforcing multiple
nonlinear constraints on the state at minimal computational
cost. Finally, planar graph matching effectively tracks features
of a polyhedron with no fiducial markers, while dynamic
updating of hypotheses robustly deals with large image noise.
These contributions, though combined into one motion esti-
mation scheme, can also be employed individually to improve
accuracy, enforce multiple constraints, or deal with noise.
Our approach to motion estimation was developed in part
with robotic manipulation in mind. Unlike in VINS and
SLAM, only a small margin of error is allowed if the estimates are to yield
predicted trajectories that can lead to successful executions of
operations such as batting, catching, etc. The manipulated ob-
jects are usually light and susceptible to air effects such as lift
and drag, as observed in our experiments. Without modeling
of aerodynamic forces, large errors accumulated in the object
trajectories will significantly affect operation outcomes. That
these objects (such as balls in sports) often assume known
geometries and material properties also provides convenience
for such modeling.
While accuracy is a main focus of our approach, com-
putational cost can hinder real time estimation. Table III
presented computation times in which the QCKF operated
at 110 Hz, and the QCKF-A with repeated calculations of
aerodynamic forces operated at just below 10 Hz. An order of
magnitude improvement could be achieved by a combination
of modern computing hardware, tuning of parameters, and
utilization of GPU processing (e.g., in calculation of lift force).
Furthermore, optimizations of the QCKF can be made to
reduce the number of matrix calculations as in [46], including
reduction of quaternion noise to a 3-vector and exclusion of
velocities from equations during correction for observables.
A few potential directions are in line for future work. First,
generalized to irregular polyhedral objects, the approach can
be further extended to curved objects. These include objects
with a single curved surface or with a surface consisting
of multiple curved patches. Features such as high curvature
points, discontinuities between surface patches, textures, or
fiducial markers can provide the observables required for
estimation. Second, the work can be utilized for its originally
intended purpose of targeted batting in 3D to extend the
authors’ work in [10]. Third, robotic catching would also
benefit from the accuracy of model-based estimation. Existing
methods fail to determine a catching configuration for objects
thrown along trajectories that deviate from those used to build
machine-learned models [8]. The methods presented here can
provide accurate estimates of any visible trajectory, yielding
low prediction error in a variety of scenarios, and a higher
overall success rate.
Moreover, the use of stereo vision allows the motion esti-
mation approach to be versatile. Two cameras can be mounted
on a humanoid robot to perform motion estimation of a flying
ball in sports for example. As long as features on the ball can
be detected in its images, its trajectory can be perceived, even
in the case that it is moving directly towards the robot.8 The
cameras may be mounted on other mobile robots or vehicles
for estimation where additional parameters may need to be
estimated for localization,9 or fixed to the world to perceive
motion in a scene. The estimation scheme is applicable to tasks
for space robots, where aerodynamic forces can be ignored due
to scarcity or absence of air.
APPENDIX A
INITIALIZATION OF 6-DOF POSE
Once a new hypothesis is created, we estimate the object’s
pose, composed of position o(k) and rotation r(k) at the k-th
time instant. The estimates will be used by a Kalman filter
created under the hypothesis.
Stereo cameras viewing a point $p$ induce the epipolar
geometry described in Section IV-A, which allows for a pair
of image points ${}^i p$ and ${}^{i'} p$, respectively generated by the
first and second cameras, to be back-projected to some (but
not necessarily the same) point in the world. The chain of
equations in (23) is solved in reverse order to obtain the world
coordinates ${}^w p$ of their common source point. In particular, the
second step of the solution involves undistortion of the two
projected points, yielding ${}^p p$ and ${}^{p'} p$. To recover the point's
third coordinate from (18), we perform "rectification" of the
points ${}^p p$ and ${}^{p'} p$ to transform their image planes to be
coplanar [98, p. 159]. To that effect, the transformed points lie
on a line parallel to the line through the camera centers,
allowing for "triangulation" to determine their z-coordinate [99].
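A compact stand-in for this back-projection step is sketched below. It replaces the rectification-plus-disparity computation of [98], [99] with the midpoint method between the two undistorted viewing rays, which recovers the same world point up to noise; camera centers and ray directions are assumed already expressed in the world frame.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Back-project a matched image point pair to a world point.

    c1, c2 are camera centers and d1, d2 ray directions obtained
    after undistortion. Solves min ||(c1 + s d1) - (c2 + t d2)||
    for s, t, then returns the midpoint of the shortest segment
    between the two rays (midpoint method, a stand-in for the
    rectification and triangulation of [98], [99]).
    """
    M = np.column_stack((d1, -d2))                    # 3x2 system matrix
    st, *_ = np.linalg.lstsq(M, c2 - c1, rcond=None)  # least-squares s, t
    p1 = c1 + st[0] * d1
    p2 = c2 + st[1] * d2
    return 0.5 * (p1 + p2)
```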
With a set of back-projected points and the set of the object's
vertex coordinates in $F_b$, the correspondence between them is
known under the new hypothesis Hnew.10 We invoke Horn’s
method [100] to compute the translation o(k) and rotation r(k)
of the frame Fb from Fw.
8 With a possible loss in accuracy due to errors in the measurement of depth.
9 Refer to SLAM and VINS works for solutions to the localization problem.
10 The image points ${}^i p_j$ and ${}^{i'} p_j$ are associated with vertices that, via a hypothesis, correspond with a vertex located at ${}^b p_k$. Hence, the world coordinate ${}^w p_j$ computed from the image points shares this correspondence.
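A self-contained sketch of Horn's closed-form solution [100] follows: the rotation's unit quaternion is the dominant eigenvector of a 4 × 4 symmetric matrix built from the cross-covariance of the two corresponded point sets, and the translation aligns the centroids.

```python
import numpy as np

def horn_absolute_orientation(P, Q):
    """Horn's method [100]: rotation (as a unit quaternion) and
    translation t with Q_i ~ R P_i + t, for corresponded (n, 3)
    body-frame points P and world points Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    S = (P - cp).T @ (Q - cq)                   # 3x3 cross-covariance
    Sxx, Sxy, Sxz = S[0]; Syx, Syy, Syz = S[1]; Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz]])
    w, V = np.linalg.eigh(N)                    # symmetric eigendecomposition
    q = V[:, -1]                                # eigenvector of largest eigenvalue
    w0, x, y, z = q                             # quaternion (w, x, y, z)
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w0*z),     2*(x*z + w0*y)],
        [2*(x*y + w0*z),    1 - 2*(x*x + z*z),  2*(y*z - w0*x)],
        [2*(x*z - w0*y),    2*(y*z + w0*x),     1 - 2*(x*x + y*y)]])
    return q, R, cq - R @ cp                    # quaternion, rotation, translation
```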
REFERENCES
[1] N. Takeishi, T. Yairi, Y. Tsuda, F. Terui, N. Ogawa, and Y. Mimasu, “Simultaneous estimation of shape and motion of an asteroid for automatic navigation,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 2861–2866, 2015.
[2] T. Fujiwara, T. Nakamura, Y. Tsuda, and S. Nakasuka, “Motion estimation and capturing of tumbling object in non-gravitational field,” IFAC Proc. Vol., vol. 31, no. 21, pp. 77–82, 1998.
[3] F. Terui, H. Kamimura, and S. Nishida, “Motion estimation to a failed satellite on orbit using stereo vision and 3D model matching,” in Int. Conf. Control, Autom., Robot. and Vision, pp. 1–8, IEEE, 2006.
[4] H. Nagamatsu, T. Kubota, and I. Nakatani, “Capture strategy for retrieval of a tumbling satellite by a space robotic manipulator,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 1, pp. 70–75, IEEE, 1996.
[5] G. Zhang, H. Liu, J. Wang, and Z. Jiang, “Vision-based system for satellite on-orbit self-servicing,” in IEEE/ASME Int. Conf. Adv. Intell. Mech., pp. 296–301, 2008.
[6] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, “Augmenting inertial navigation with image-based motion estimation,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 4, pp. 4326–4333, IEEE, 2002.
[7] C.-C. Ho and N. H. McClamroch, “Autonomous spacecraft docking using a computer vision system,” in Proc. IEEE Conf. Decis. and Control, pp. 645–650, IEEE, 1992.
[8] S. Kim, A. Shukla, and A. Billard, “Catching objects in flight,” IEEE Trans. Robot., vol. 30, no. 5, pp. 1049–1065, 2014.
[9] T. Senoo, A. Namiki, and M. Ishikawa, “High-speed batting using a multi-jointed manipulator,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 1191–1196, 2004.
[10] Y.-B. Jia, M. Gardner, and X. Mu, “Batting an in-flight object to the target,” Int. J. Robot. Res., vol. 38, no. 4, pp. 451–485, 2019.
[11] H. Li, H. Wu, L. Lou, K. Kuhnlenz, and O. Ravn, “Ping-pong robotics with high-speed vision system,” in Proc. Int. Conf. Control, Autom., Robot. and Vis., pp. 106–111, 2012.
[12] J. Kim and M. Kim, “Smart vision system for soccer training,” in Int. Conf. Inf. Commun. Technol. Converg., pp. 257–262, IEEE, 2015.
[13] R. Herrejon, S. Kagami, and K. Hashimoto, “Online 3-D trajectory estimation of a flying object from a monocular image sequence,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., pp. 2496–2501, 2009.
[14] L. Acosta, J. J. Rodrigo, J. A. Mendez, G. N. Marichal, and M. Sigut, “Ping-pong player prototype,” IEEE Robot. and Autom. Mag., vol. 10, no. 4, pp. 44–52, 2003.
[15] Y. Nakabo, I. Ishii, and M. Ishikawa, “3D tracking using two high-speed vision systems,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., vol. 1, pp. 360–365, 2002.
[16] A. Padgaokar, “Measurement of angular acceleration of a rigid body using linear accelerometers,” Trans. Am. Soc. Mech. Eng., vol. 75, pp. 522–526, 1975.
[17] P. Cardou and J. Angeles, “Angular velocity estimation from the angular acceleration matrix,” ASME J. Applied Mech., vol. 75, no. 2, 2008.
[18] P. Cardou, G. Fournier, and P. Gagnon, “A nonlinear program for angular-velocity estimation from centripetal-acceleration measurements,” IEEE/ASME Trans. Mech., vol. 16, no. 5, pp. 932–944, 2010.
[19] J. L. Marins, X. Yun, E. R. Bachmann, R. B. McGhee, and M. J. Zyda, “An extended Kalman filter for quaternion-based orientation estimation using MARG sensors,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., vol. 4, pp. 2003–2011, 2001.
[20] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, “Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision,” IEEE Trans. Robot. Autom., vol. 9, no. 1, pp. 14–35, 1993.
[21] J. S. Goddard and M. A. Abidi, “Pose and motion estimation using dual quaternion-based extended Kalman filtering,” in Proc. SPIE Conf. 3-D Image Capt. Appl., vol. 3313, pp. 189–200, 1998.
[22] S. Kim and A. Billard, “Estimating the non-linear dynamics of free-flying objects,” IEEE Robot. Auton. Syst., vol. 60, no. 9, pp. 1108–1122, 2012.
[23] R. Zanetti, M. Majji, R. H. Bishop, and D. Mortari, “Norm-constrained Kalman filtering,” J. Guid., Control, and Dyn., vol. 32, no. 5, pp. 1458–1465, 2009.
[24] D. Wang, M. Li, X. Huang, and J. Li, “Kalman filtering for a quadratic form state equality constraint,” J. Guid., Control, and Dyn., vol. 37, no. 3, pp. 951–958, 2014.
[25] P. Gurfil and H. Rotstein, “Partial aircraft state estimation from visual motion using the subspace constraints approach,” J. Guid., Control, and Dyn., vol. 24, no. 5, pp. 1016–1028, 2001.
[26] A. Nakashima, T. Okamoto, and Y. Hayakawa, “An online estimation of rotational velocity of flying ball via aerodynamics,” IFAC Proc. Vols., vol. 47, no. 3, pp. 7176–7181, 2014.
[27] Y. Zhang, Y. Zhao, R. Xiong, Y. Wang, J. Wang, and J. Chu, “Spin observation and trajectory prediction of a ping-pong ball,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 4108–4114, 2014.
[28] J. D. Anderson Jr, Introduction to Flight. McGraw-Hill, fourth ed., 2000.
[29] D. B. Gennery, “Visual tracking of known three-dimensional objects,” Int. J. Comput. Vis., vol. 7, no. 3, pp. 243–270, 1992.
[30] T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 932–946, 2002.
[31] E. Marchand, P. Bouthemy, F. Chaumette, and V. Moreau, “Robust real-time visual tracking using a 2D-3D model-based approach,” in Proc. IEEE Int. Conf. Comput. Vis., vol. 1, pp. 262–268, 1999.
[32] D. G. Lowe, “Robust model-based motion tracking through the integration of search and estimation,” Int. J. Comput. Vis., vol. 8, no. 2, pp. 113–122, 1992.
[33] E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” in Proc. IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1508–1515, 2005.
[34] V. A. Prisacariu and I. D. Reid, “PWP3D: Real-time segmentation and tracking of 3D objects,” Int. J. Comput. Vis., vol. 98, no. 3, pp. 335–354, 2012.
[35] J. M. Kelsey, J. Byrne, M. Cosgrove, S. Seereeram, and R. K. Mehra, “Vision-based relative pose estimation for autonomous rendezvous and docking,” in Proc. IEEE Aerosp. Conf., 2006.
[36] E. K. Wong, “Model matching in robot vision by subgraph isomorphism,” Pattern Recognit., vol. 25, no. 3, pp. 287–303, 1992.
[37] K. Riesen and H. Bunke, “Approximate graph edit distance computation by means of bipartite graph matching,” Image and Vis. Comput., vol. 27, no. 7, pp. 950–959, 2009.
[38] S. Baloch and H. Krim, “Object recognition through topo-geometric shape models using error-tolerant subgraph isomorphisms,” IEEE Trans. Image Process., vol. 19, no. 5, pp. 1191–1200, 2009.
[39] M. Garon and J.-F. Lalonde, “Deep 6-DOF tracking,” IEEE Trans. Vis. Comput. Graphics, vol. 23, no. 11, pp. 2410–2418, 2017.
[40] B. Wen, C. Mitash, B. Ren, and K. E. Bekris, “se(3)-TrackNet: Data-driven 6D pose tracking by calibrating image residuals in synthetic domains,” 2020.
[41] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Trans. ASME, J. Basic Eng., vol. 82 (Series D), pp. 35–45, 1960.
[42] S. Soatto and P. Perona, “Recursive 3-D visual motion estimation using subspace constraints,” Int. J. Comput. Vis., vol. 22, no. 3, pp. 235–259, 1997.
[43] T. P. Webb, R. J. Prazenica, A. J. Kurdila, and R. Lind, “Vision-based state estimation for autonomous micro air vehicles,” J. Guid., Control, and Dyn., vol. 30, no. 3, pp. 816–826, 2007.
[44] D. D. Diel, P. DeBitetto, and S. Teller, “Epipolar constraints for vision-aided inertial navigation,” in IEEE Workshops Appl. Comput. Vis., vol. 2, pp. 221–228, 2005.
[45] J.-O. Nilsson, D. Zachariah, M. Jansson, and P. Handel, “Realtime implementation of visual-aided inertial navigation using epipolar constraints,” in Proc. IEEE/ION Pos., Loc. and Nav. Symp., pp. 711–718, 2012.
[46] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint Kalman filter for vision-aided inertial navigation,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 3565–3572, 2007.
[47] K. Eckenhoff, Y. Yang, P. Geneva, and G. Huang, “Tightly-coupled visual-inertial localization and 3-D rigid-body target tracking,” IEEE Robot. Autom. Letters, vol. 4, no. 2, pp. 1541–1548, 2019.
[48] E. J. Lefferts, F. L. Markley, and M. D. Shuster, “Kalman filtering for spacecraft attitude estimation,” J. Guid., Control, and Dyn., vol. 5, no. 5, pp. 417–429, 1982.
[49] F. L. Markley, “Attitude error representations for Kalman filtering,” J. Guid., Control, and Dyn., vol. 26, no. 2, pp. 311–317, 2003.
[50] K. Brown, D. Anderson, C. Watts, and B. Connor, “Fusion of diverse performance inertial sensors for improved attitude estimation within a stabilisation platform for electro-optic systems,” in Image and Signal Proc. Remote Sens. XXV, vol. 11155, p. 1115516, International Society for Optics and Photonics, 2019.
[51] J. L. Crassidis, “Sigma-point Kalman filtering for integrated GPS and inertial navigation,” in Proc. AIAA Guid., Navigation, and Control Conf., San Francisco, CA, USA, 2005, p. 6062.
[52] F. Li and L. Chang, “MEKF with navigation frame attitude error parameterization for INS/GPS,” IEEE Sens. J., vol. 20, no. 3, pp. 1536–1549, 2019.
[53] S. Bonnabel, “Left-invariant extended Kalman filter and attitude estimation,” in Proc. IEEE Conf. Decis. and Control, pp. 1027–1032, 2007.
[54] S. Bonnabel, P. Martin, and E. Salaun, “Invariant extended Kalman filter: theory and application to a velocity-aided attitude estimation problem,” in Proc. IEEE Conf. Decis. and Control, pp. 1297–1304, 2009.
[55] A. Barrau and S. Bonnabel, “The invariant extended Kalman filter as a stable observer,” IEEE Trans. Automatic Control, vol. 62, no. 4, pp. 1797–1812, 2016.
[56] N. Y. Ko, W. Youn, I. H. Choi, G. Song, and T. S. Kim, “Features of invariant extended Kalman filter applied to unmanned aerial vehicle navigation,” Sensors, vol. 18, no. 9, p. 2855, 2018.
[57] M. Brossard, S. Bonnabel, and A. Barrau, “Invariant Kalman filtering for visual inertial SLAM,” in Proc. Int. Conf. Info. Fusion, pp. 2021–2028, IEEE, 2018.
[58] A. Barrau and S. Bonnabel, “Extended Kalman filtering with nonlinear equality constraints: a geometric approach,” IEEE Trans. Automatic Control, vol. 65, no. 6, pp. 2325–2338, 2019.
[59] C. Yang and E. Blasch, “Kalman filtering with nonlinear state constraints,” IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 1, pp. 70–84, 2009.
[60] D. Simon and T. L. Chia, “Kalman filtering with state equality constraints,” IEEE Trans. Aerosp. Electron. Syst., vol. 38, no. 1, pp. 128–135, 2002.
[61] N. Gupta and R. Hauser, “Kalman filtering with equality and inequality state constraints,” Tech. Rep. NA-07/18, Oxford University Computing Laboratory, Aug 2007.
[62] B. O. S. Teixeira, J. Chandrasekar, H. J. Palanthandalam-Madapusi, L. A. B. Torres, L. A. Aguirre, and D. S. Bernstein, “Gain-constrained Kalman filtering for linear and nonlinear systems,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4113–4123, 2008.
[63] J. De Geeter, H. Van Brussel, J. De Schutter, and M. Decreton, “A smoothly constrained Kalman filter,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 10, pp. 1171–1177, 1997.
[64] S. J. Julier and J. J. LaViola, “On Kalman filtering with nonlinear equality constraints,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2774–2784, 2007.
[65] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052–1067, 2007.
[66] D. Strelow and S. Singh, “Motion estimation from image and inertial measurements,” Int. J. Robot. Res., vol. 23, no. 12, pp. 1157–1195, 2004.
[67] J. W. Langelaan, “State estimation for autonomous flight in cluttered environments,” J. Guid., Control, and Dyn., vol. 30, no. 5, pp. 1414–1426, 2007.
[68] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, “Vision-based state estimation for autonomous rotorcraft MAVs in complex environments,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 1758–1764, 2013.
[69] R. Mebarki, J. Cacace, and V. Lippiello, “Velocity estimation of a UAV using visual and IMU data in a GPS-denied environment,” in Proc. IEEE Int. Symp. Saf., Secur., and Rescue Robot., pp. 1–6, 2013.
[70] M. Dehghani, H. Kharrati, H. Seyedarabi, and M. Baradarannia, “Improvement of angular velocity and position estimation in gyro-free inertial navigation based on vision aid equipment,” IET Comput. Vis., vol. 12, no. 3, pp. 261–275, 2017.
[71] M. Gardner and Y.-B. Jia, “Supplemental Material for ‘Pose and Motion Estimation of Free-Flying Objects: Aerodynamics, Constrained Filtering, and Graph-Based Feature Tracking’.” https://faculty.sites.iastate.edu/jia/files/inline-files/laplace-equation.pdf, 2021.
[72] L. J. Clancy, Aerodynamics. Pitman Publishing Ltd., 1975.
[73] G. K. Batchelor, An Introduction to Fluid Dynamics. Cambridge University Press, 1967.
[74] G. Parker, “Projectile motion with air resistance quadratic in the speed,” Am. J. Phys., vol. 45, no. 7, pp. 606–610, 1977.
[75] C. Brueningsen, J. Marinelli, P. Pappano, and K. Wallace, “Modeling air drag,” The Physics Teacher, vol. 32, no. 7, pp. 439–441, 1994.
[76] Y.-B. Jia, “Quaternion and rotation.” http://web.cs.iastate.edu/~cs577/quaternion.pdf, 2019.
[77] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002.
[78] C. B. Duane, “Close-range camera calibration,” Photogramm. Eng., vol. 37, no. 8, pp. 855–866, 1971.
[79] K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Quarterly of App. Math., vol. 2, no. 2, pp. 164–168, 1944.
[80] D. W. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” J. Soc. Ind. and App. Math., vol. 11, no. 2, pp. 431–441, 1963.
[81] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.
[82] T. S. Huang and O. D. Faugeras, “Some properties of the E matrix in two-view motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310–1312, 1989.
[83] D. Simon, Optimal State Estimation. John Wiley & Sons, Inc., 2006.
[84] C. Garcia and W. I. Zangwill, “Finding all solutions to polynomial systems and other systems of equations,” Math. Program., vol. 16, no. 1, pp. 159–176, 1979.
[85] M. De Berg, M. Van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry. Springer, third ed., 2008.
[86] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 679–698, 1986.
[87] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 593–600, 1994.
[88] D. Eppstein, “Subgraph isomorphism in planar graphs and related problems,” in Graph Alg. Appl. I, pp. 283–309, World Scientific, 2002.
[89] L. G. Shapiro and R. M. Haralick, “Structural descriptions and inexact matching,” IEEE Trans. Pattern Anal. Mach. Intell., no. 5, pp. 504–519, 1981.
[90] J. J. McGregor, “Backtrack search algorithms and the maximal common subgraph problem,” Software: Practice and Experience, vol. 12, no. 1, pp. 23–34, 1982.
[91] M. Neuhaus and H. Bunke, “An error-tolerant approximate matching algorithm for attributed planar graphs and its application to fingerprint classification,” in Joint IAPR Int. Workshops on Stat. Techn. Pattern Recognit. and Struct. and Synt. Pattern Recognit. (SSPR), pp. 180–189, Springer, 2004.
[92] S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Sig. Process., Sens. Fusion, and Target Recognit. VI, vol. 3068, pp. 182–193, Int. Society for Optics and Photonics, 1997.
[93] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment – a modern synthesis,” in Int. Workshop on Vis. Alg., pp. 298–372, Springer, 1999.
[94] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000.
[95] A. M. Nathan, “The effect of spin on the flight of a baseball,” Am. J. Phys., vol. 76, no. 2, pp. 119–124, 2008.
[96] C. Runge, “Über die numerische Auflösung von Differentialgleichungen,” Mathematische Annalen, vol. 46, no. 2, pp. 167–178, 1895.
[97] W. Kutta, “Beitrag zur näherungsweisen Integration totaler Differentialgleichungen,” Z. Math. Phys., vol. 46, pp. 435–453, 1901.
[98] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, vol. 201. Prentice Hall, Englewood Cliffs, 1998.
[99] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[100] B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” J. Optical Soc. of Am. A, vol. 4, no. 4, pp. 629–642, 1987.