Pose and Motion Estimation of Free-Flying Objects: Aerodynamics, Constrained Filtering, and Graph-Based Feature Tracking

Matthew Gardner, Member, IEEE, and Yan-Bin Jia, Senior Member, IEEE
Abstract—We investigate the problem of dynamically estimating the instantaneous position, orientation, velocity, and angular velocity of an arbitrarily shaped object during its free flight, based on image frames taken simultaneously by two high-speed cameras. Aerodynamic effects including drag, lift, and Magnus forces are modeled to describe the object's flight. Observables are derived by combining dynamics with a camera projection model assuming two-view geometry, via the use of multiple quaternions. The state of the composed system can then be estimated via constrained Kalman filtering, for which a solution is presented for the case of multiple quadratic constraints. To keep track of appearing and disappearing visual features during flight, the estimation algorithm employs a graph matching-based technique to maintain a set of evolving hypotheses through evaluation, pruning, and addition. Experiments conducted over various objects have either provided validation against motions independently estimated using multiple accelerometers, or formed verification by matching flight images against projections based on the state estimates.
Index Terms—Estimation, aerodynamics, Kalman filtering, constrained optimization, image graph, feature tracking.
I. INTRODUCTION
Motion estimation of free-flying objects is a challenging
problem in multiple areas of robotics. Pose and motion esti-
mates are necessary for sensing obstacles, planning actions,
and navigating autonomously in dynamically changing en-
vironments. In space, tumbling objects such as debris and
asteroids are tracked using vision, with motions estimated via
Kalman filtering [1] and optimization [2]. Flight trajectory
estimation can help space robots better collect space debris [3],
[4], service orbiting satellites [5], and make maneuvers during
space exploration [6], [7]. For an object flying in the air, the
ability to obtain accurate estimates of its pose and velocity
over a brief duration allows a robot to perform skillful
maneuvers such as catching [8] and batting [9], [10], [11].
In sports, such estimation is also used to track a ball for the
purpose of human training [12]. In manufacturing, a potential
application of motion estimation could be enabling robotic
arms to pass objects down an assembly line through catching
and throwing motions. Other applications include projectile
tracking, robotic underwater exploration, and virtual reality.
Support for this research was provided in part by the US National Science Foundation under Grant IIS-1421034. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
The authors are affiliated with the Department of Computer Science, Iowa State University, Ames, IA 50010 USA (e-mails: mattga, [email protected]).
Measurement of position and linear velocity has been used
for a range of tasks. The position trajectory of a flying object
was estimated from a sequence of monocular images for object
tracking [13], by stereo cameras for a high-speed ping-pong
robot [11], [14], and by two high-speed cameras on pan-tilt
mechanisms for batting a ball [15]. As the complexity of
robot applications has grown, estimation of angular velocity
has become increasingly necessary. Measurements by linear
accelerometers placed inside a body along some orthogonal
axes were used to calculate the body’s angular velocity via
solution of a system of kinematic equations [16] or through
optimization [17], [18]. A magnetometer, accelerometer, and
angular rate sensor were employed to obtain quaternion-based
measurements, which were then supplied to a Kalman filter
for polishing [19]. These techniques, however, were invasive
as they relied on sensors mounted inside the object’s body.
In addition, none of them addressed how to estimate the
object’s linear velocity. Meanwhile, vision-based estimation
of both linear and angular velocities was investigated for
object tracking [20], [21], but without the temporal and spatial
constraints of the object being in free flight.
Vision sensors such as RGB cameras are widely accessible
and easy to work with, making them ideal for the estimation
problem. They can be made small enough to serve as eyes for
a humanoid robot. Can fast and accurate motion estimation of
a flying object be achieved using cameras alone? Here we put
an emphasis on objects in free flight with no actuation.
Several challenges immediately arise:
• images of the object lack depth information,
• the object moves a large distance over a small duration,
• images are susceptible to noise due to the fast movement.
Some vision systems have addressed these challenges. Among
the best known are motion capture systems composed of mul-
tiple high-speed cameras surrounding an object with attached
markers [22], or actuated cameras that control their pitch and
yaw for tracking [15]. Such systems are complex and often
infeasible due to limitations of cost, space, power consump-
tion, etc. Moreover, they depart from a basic preference for
estimation without attaching markers or sensors to alter the
object’s physical appearance.
A. Contributions and Outline of the Paper
In this work, we present a scheme for accurately estimating
the poses and motions of free-flying objects while remaining
robust to the effects of short flight periods, large aerodynamic
Fig. 1. Block diagram of motion estimation (components: image frames, graph-based feature tracking, hypotheses, aerodynamics, observables, constrained Kalman filtering, pose and motion estimates).
forces experienced in the case of low mass density, and image
noise due to fast movements. We use a system consisting of two cameras with wide-angle lenses operating at 120 fps or higher, which is inexpensive relative to larger
motion capture systems such as OptiTrack and Vicon. Besides
achieving estimation, we make several other contributions:
a) We introduce a graph-based feature tracking algorithm
using hypotheses to deal with uncertainties and switching
of observable features.
b) We construct for the first time a solution, via equation
derivation and numerical root finding, to Kalman filtering
under multiple quadratic constraints in the general form.
c) We model aerodynamics for free-flying objects of general shape and integrate it with vision-based estimation.
Fig. 1 illustrates the interactions among three components
of the scheme: aerodynamics, constrained Kalman filtering,
and graph-based feature tracking. Extensive experiments are
conducted with a relatively low-cost vision system, which past
efforts have not considered.
The paper is organized as follows. Section II overviews
work related to estimation that comes from multiple areas.
Section III discusses the system dynamics of a free-flying
object and computation of the aerodynamic forces affecting
its motion. Section IV introduces the camera projection model
and geometric constraints enforced by stereo cameras to pro-
duce observables from the object. Section V gives the Kalman
filter algorithm that enforces multiple quadratic constraints
on the estimated state while making use of the models from
Sections III and IV. Section VI describes a graph-based feature
tracking algorithm for identifying image-to-model correspon-
dences that are required to drive multiple Kalman filters in a
competition for the best estimate. Section VII presents results
from three experiments of increasing complexity, conducted
with a wooden frame, rugby ball, and foam polyhedron.
Section VIII concludes the paper with discussions on improvements and future directions for dynamic estimation.
B. Notation
A vector (or a point) is by default a column vector, written
by a lowercase bold letter, e.g., v. A unit vector has a hat, e.g.,
$\hat{v} = v/\|v\|$. The left superscript of a point (or a vector), if it exists, denotes the frame in which it is expressed. For example, wp gives the coordinates of a point p in the world frame (denoted by w), while bp gives its coordinates in the body
frame (denoted by b). A scalar (or vertex in Section VI) is
written as a lowercase non-bold letter, and a matrix (or graph)
as an uppercase non-bold letter. A vector function is written
in bold, while a scalar function is non-bold. The subscripts
x, y , and z of a letter (non-bold) represent the respective
x, y, and z-coordinates (or components) of a point (or a
vector) named by the same letter (bold). For example, $p_x$ denotes the x-coordinate of the point p. Multiplication of two
quaternions uses the operator ⊗. In a slight abuse of notation,
multiplication of a quaternion q and a vector v, i.e. q ⊗ v,
implies that v is represented as the pure quaternion (0,v⊤)⊤.
In addition, the superscripts ‘−’ and ‘+’ are used to refer to
the prior and posterior estimates produced by a Kalman filter.
II. RELATED WORK
Our effort draws upon works from a diverse set of related
fields: aerodynamics, feature tracking, Kalman filtering, and
simultaneous localization and mapping (SLAM).
A. Aerodynamics
Aerodynamics are known to play a key role in affecting a
flight motion. Calculation of forces such as air drag and lift
results in more accurate estimates, as done for an aircraft’s
motion using a downward-facing camera [25]. In robot table
tennis, the effects of drag and Magnus were incorporated into
estimation of a ping-pong ball’s motion with varying degrees
of spin [26], [27]. Unfortunately, closed forms for lift and
drag forces so far exist only for well studied shapes such
as a ball and airfoil [28, pp. 258–262, 349], and they do
not generalize to irregular shapes such as a polyhedron. The
estimation scheme in this paper is described for polyhedral
objects whose topological structures can be represented by
planar graphs, achieving a degree of generality as many objects
can be approximated in this way. Note that machine learning
techniques were used to produce models of arbitrarily shaped
objects in flight under the effects of aerodynamics and other
non-linear dynamics [22], based on training over numerous
flight trajectories from laborious hand throws.
B. Feature Tracking
Coupled with motion estimation is the problem of tracking
an object’s features in its changing images in order to generate
observables for setting up an estimation algorithm. Disparities
between features from observation and those from prediction
are used to estimate the object’s pose via weighted least
squares [29], [30], non-linear minimization [31], or statistical
modeling [32], [33], [34]. Model-based pose tracking was
applied to estimate the pose of a spacecraft with different
ways of initialization [35]. In general, these approaches, by
minimizing measurement errors, led to frame rates (up to 30 Hz) that were too low for high-speed estimation, and also
suffered from erratic performance in case of large disparities
between observed and predicted features.
To make use of the object's topological structure, matching was conducted between a "projection" graph constructed from images and a model graph, where edges and vertices
contained attributes pertaining to features [36]. Approximate
graph matching algorithms were developed to produce sub-
optimal solutions to object recognition, while remaining tol-
erant to measurement errors and image noise [37], [38].
Recently, deep neural networks were trained on synthetic data
to track stationary objects with occlusions [39], [40].
C. Kalman Filtering
For motion estimation, the Kalman filter [41] remains the
gold standard due to its effectiveness at smoothing out
estimates in the presence of noisy measurements, as well as
its computational efficiency. An extended Kalman filter (EKF)
using dual quaternions closely calculated the constant velocity
of an object from its images taken by a single camera [21]. A
modified EKF was able to estimate the rigid body dynamics
of a moving object by exploiting the epipolar constraint [25],
[42], [43]. For localization and navigation of air and ground
vehicles, visual inertial navigation systems (VINS) also made
use of the epipolar constraint in the measurement models
of a linear Kalman filter [44], an unscented Kalman filter
(UKF) [45], and an EKF [46]. In [47], an alternative to
the use of epipolar constraints was proposed. This approach
derived a measurement model for every observed static feature
to express geometric constraints over its observing cameras.
State correction in the EKF was then based on a range space
projection of measurement residuals over all features, which
were independent of errors in the feature coordinates.
Unit quaternions, used for representing orientation (or at-
titude), cannot be updated through vector addition by a con-
ventional Kalman filter because they are only closed under
operations (i.e., multiplications) of the Lie group SO(3) with
its elements represented by quaternions. To maintain the
invariance of the rotation representation, the multiplicative
extended Kalman filter (MEKF) from [48] first performed
an EKF style update on a state that included the position
and velocity components, as well as some unconstrained
parameters. Then, on orientation update, these parameters were
used to generate through normalization a unit error quaternion
that measured the difference between the true orientation and
the estimated one. The second-order version of MEKF was
given in [49] for three types of unconstrained orientation
parameters including Euler angles, Rodriguez parameters, and
a quaternion. The MEKF has been used widely in applications
such as attitude estimation [50], inertial integrated naviga-
tion [51], and integration with GPS [52]. Generally relying on
on-board inertial sensors, the MEKF is difficult to tune and
known to have divergence issues due to its ad hoc normalization
to maintain the unit quaternion constraint.
The invariant extended Kalman filter (IEKF), first intro-
duced in [53] for attitude estimation, leveraged the left in-
variance of the vector field describing the orientation-only
kinematics of a flying object under the multiplication action
of SO(3) with unit quaternion elements. Extended later for
velocity estimation and to right invariance on a Lie group [54],
the IEKF preserves the unit norm of the quaternion estimate, as
well as “symmetry” in the sense of not altering the dynamics of
error, whose component for orientation is in the multiplicative
form as inspired by MEKF. With a larger expected domain
of convergence than EKF and guaranteed convergence for
invariant observations [55], the IEKF has been popularized
in applications such as spacecraft attitude estimation [54] and
unmanned aerial vehicle navigation [56]. Such applications
typically employ sensors such as gyroscopes, accelerometers,
and magnetometers that output state variable values directly
(or after a Lie group action such as rotation) so the IEKF
becomes applicable. In our estimation problem, an IEKF is
inapplicable for three reasons. First, the rigid body dynamics
now includes aerodynamic effects such as drag, which is
quadratic in the velocity’s magnitude and also linear in its
normalization, and lift, which is quadratic in the angular ve-
locity. Consequently, the system dynamics no longer preserves
invariance under multiplication by members of a Lie group
such as SE(3). Second, since observations are performed by
cameras under perspective projection and radial distortion, the
equivariant property [54] for the output function does not
hold either.1 Third, the object under estimation is meant to
be natural or man-made with a small size, and it would be
unrealistic and also defeat the purpose of this estimation to
embed a gyroscope, accelerometer, or magnetometer inside
the object. In fact, the constrained EKF to be introduced in
Section IV will effectively keep the orientation estimate within
SO(3). Interestingly, a recent finding [58] has shown that,
when the state space is a Lie group, a version of the EKF
with an equality constraint is equivalent to the IEKF.
A third approach to orientation estimation is constrained
Kalman filtering which in principle can effectively deal with
a general equality constraint. In this approach, the posterior
estimate is still obtained from minimizing the variance of
the estimated error but now with the constraint incorporated
via the use of a Lagrange multiplier. This forces the state
estimate to stay on the constraint surface, e.g., SO(3) for
an orientation estimate. To our best knowledge, all existing
methods have considered either a single constraint that is linear, quadratic [59], or special quadratic (e.g., unit norm [23], gen-
eralized norm [24]), or multiple constraints that are linear [60]
or nonlinear after linearization [61]. In [23] and [24], second
order sufficient conditions were used to choose from multiple
values of the Lagrange multiplier satisfying the first order
necessary condition equation.
The gain constrained Kalman filter (GCKF) [61] was de-
rived from constrained recursive least squares minimization
of the Kalman gain matrix, which constrained weights used
to correct the state estimate. The GCKF was generalized to a
UKF with nonlinear models under linear constraints [62], and
to a norm-constrained Kalman filter (NCKF) dealing with one
nonlinear constraint on the state’s norm [23]. The NCKF was
then extended to handle a single quadratic form constraint [24].
Other approaches included a smoothly constrained Kalman
filter [63] and a two-step UKF that constrained the state’s
statistics [64].
As far as we know, none of the existing works have dealt
with multiple nonlinear constraints directly without linearizing
them. In this paper, we need to handle three unit quaternion
1 In visual-inertial SLAM where a monocular camera was used as one of the sensors, the inventor of the IEKF and his colleagues opted for a UKF [57].
constraints that correspond to the orientations of the object and
the two cameras. We will present a solution to the constrained
filtering that works for a set of general quadratic constraints.
D. Simultaneous Localization and Mapping
SLAM algorithms solve a similar problem of estimating
the pose and motion of a robot relative to its surroundings
by producing a map of features in the three-dimensional (3D)
world space. MonoSLAM used a single camera with an EKF
to estimate a robot’s pose, velocity, and angular velocity,
along with feature points in 3D [65]. Other approaches tracked
the image coordinates of features in the estimated state, and
described a projection from 3D to two dimensions (2D) in the
measurement model of an EKF [66] or UKF [67]. VINS use vision and inertial sensors for real-time estimation required for
robot navigation. A UKF was used to estimate the pose of
an autonomous rotorcraft with an inertial measurement unit
(IMU) and cameras [68], [69]. Stereo vision was employed
to improve angular velocity and position estimation over
traditional approaches via sensor fusion in a UKF [70].
III. SYSTEM DYNAMICS
Consider a flying object with mass m as shown in Fig. 2.
The object, with known geometry and physical properties, has
a body frame Fb located at its center of mass o and defined
by its principal axes. In this frame, the inertia tensor

$Q = \iiint_B \big((x \cdot x) I_3 - x x^\top\big)\, dV$

is a diagonal matrix. Here, $I_3$ is the 3 × 3 identity matrix. The rotation of Fb from some world frame Fw is described by a unit quaternion r. Let bv and bω be the object's velocity and angular velocity expressed in Fb, respectively. The object's velocity in Fw is given by a quaternion product: ${}^w v = r \otimes {}^b v \otimes r^*$, where $r^*$ is the conjugate of r.

Fig. 2. Flying object observed from a stationary world frame Fw. A body frame Fb is located at its center of mass.

Denote by wf the non-gravitational force exerted on the object in the world frame Fw. Newton's equation takes on the form

${}^w \dot{v} = (0, 0, -g)^\top + {}^w f / m,$    (1)

where $g \approx 9.80665\ \mathrm{m/s^2}$ is the gravitational acceleration constant. Euler's equation assumes the form

$Q\, {}^b \dot{\omega} + {}^b \omega \times (Q\, {}^b \omega) = {}^b \tau,$

where bτ is the external torque in the body frame. We have

${}^b \dot{\omega} = Q^{-1}\big({}^b \tau - {}^b \omega \times (Q\, {}^b \omega)\big).$    (2)
The object is subject to aerodynamic forces whose effects
are non-negligible if its mass density is low. Without account-
ing for these forces, it can be difficult to accurately track
the object’s state, or predict a future state for the purpose
of planning. To describe the aerodynamic forces, consider a
snapshot at a time instant during flight. The object, viewed
“stationary”, is experiencing a flow of air with velocity −bv
expressed in its body frame Fb. There are two primary
aerodynamic forces: lift, denoted bf l, and drag, denoted bfd.
Lift force acts on the object due to air particles traveling
smoothly over its surface ∂B and producing pressure forces
normal to the surface. Bernoulli’s equation [72, pp. 20–
21], after elimination of constant static pressure, gives the
following relationship between air pressure p and air speed u:
$p = \frac{1}{2} \rho u^2,$    (3)
where ρ is air density. The total lift in the frame Fb is then
calculated by integrating this pressure over ∂B as follows:
${}^b f_l = \frac{1}{2} \rho \iint_{\partial B} \big({}^b u(e) \cdot {}^b u(e)\big)\, n(e)\, de,$    (4)
where bu(e) is the air velocity at a surface area element e
with its exact form yet to be determined, and n(e) is the unit
normal to the surface element. Meanwhile, the air pressure
yields a torque about the center of mass o:
${}^b \tau_l = \frac{1}{2} \rho \iint_{\partial B} \big({}^b u(e) \cdot {}^b u(e)\big)\, e \times n(e)\, de.$    (5)
Consider a neighborhood N of the object in which the
air flow is affected by its presence at the same time instant.
Under the assumption that the air flow is incompressible and
irrotational [73, p. 100], the air velocity at a point e ∈ N is
∇φ(e), where φ, known as the velocity potential, satisfies the
Laplace equation below:
$\nabla^2 \phi(e) = 0.$    (6)
Here, ∇2 is the Laplacian operator. Denote by ∂N the
outer boundary surface of the neighborhood N . The Laplace
equation (6) is subjected to two boundary conditions:
$\nabla\phi(e) = -{}^b v$, for all e on ∂N,    (7)
$\nabla\phi(e) \cdot n(e) = 0$, for all e on ∂B.    (8)
Calculation of φ involves numerically solving (6) for all the
points within the neighborhood N containing the object. This
is detailed in the supplemental material [71].
Meanwhile, within a thin boundary layer next to the object’s
surface, the viscosity of air exists to produce friction with the
surface. This leads to two effects inside the thin region [72,
p. 42]: 1) air rotating with the surface and 2) drag force. In
the former, air molecules with zero velocity relative to the
body contribute a rotational component to air velocity. This
creates the Magnus effect [72, pp. 16–33]. Combining the
component with the air velocity outside of the boundary layer
from equation (6), we have
${}^b u(e) \approx \nabla\phi(e) + {}^b \omega \times {}^b e,$    (9)
for an element e within the layer with body coordinates be.
Moreover, the boundary layer contributes two types of drag
force: pressure drag from the difference in pressure between
the front and back side of the object, and skin friction from
tangential stress on the surface [28, pp. 201–202]. Due to
Fig. 3. Algorithm overview for calculating aerodynamic forces and torques.
the mathematical difficulty in modeling these effects, the
following approximation is adopted [73, pp. 211–215, 339]:
${}^b f_d = -\frac{1}{2} \rho\, C_d A\, \big({}^b v \cdot {}^b v\big)\, {}^b \hat{v},$    (10)
where Cd is the drag coefficient and A is the cross-sectional
area. The approximated drag force acts through the object’s
center of mass, yielding no torque. The coefficient Cd, de-
pendent on the object’s shape, is determined from studies on
the aerodynamics of general shapes [74], [75]. To compute
the area A, all the points on the object’s surface are projected
orthographically onto a plane with normal bv. The convex hull
of their image points in the plane is constructed with its area
assigned to A.
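To make the construction concrete, here is a minimal sketch (our illustration, not the authors' implementation) that projects sampled surface points onto the plane with normal bv and takes the area of their planar convex hull; the sampled point array is assumed to be given.

```python
import numpy as np
from scipy.spatial import ConvexHull

def cross_sectional_area(points, bv):
    """Cross-sectional area A used in the drag model (10): project surface
    points orthographically onto the plane with normal bv and return the
    area of the 2D convex hull of the projected points."""
    n = bv / np.linalg.norm(bv)
    # Build an orthonormal basis (e1, e2) spanning the projection plane.
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, a); e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    uv = points @ np.column_stack((e1, e2))   # planar coordinates of projections
    return ConvexHull(uv).volume              # for a 2D hull, .volume is the area
```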
Finally, we apply the calculated forces and torques in
Newton’s and Euler’s equations (1)–(2), where
${}^w f = r \otimes \big({}^b f_l + {}^b f_d\big) \otimes r^*,$    (11)
${}^b \tau = {}^b \tau_l.$    (12)
Fig. 3 gives a diagram of the steps to calculate aerodynamic
forces. The Laplace equation is solved given the object’s
current velocity estimate bv. The resulting potential field φ is
used along with bω to calculate the air velocity bu(e) via equa-
tion (9). Surface integration (4) and (5), based on Bernoulli’s
equation (3), are carried out to yield the lift force bf l and
torque bτ l. The drag force bfd is simultaneously computed
from the velocity estimate according to (10). Newton’s and
Euler’s equations of dynamics are then integrated to yield new
velocity estimates for the next round of calculations.
The state of the flying object is described by the 13-vector
$\xi = \big(o^\top, r^\top, {}^w v^\top, {}^b \omega^\top\big)^\top.$    (13)

The state is subject to the constraint that its quaternion component r keeps unit length, i.e., $\|r\| = 1$. Moreover, r has the following derivative, given in Appendix C of [76]:

$\dot{r} = \frac{1}{2}\, r \otimes {}^b \omega.$    (14)

This, together with $\dot{o} = {}^w v$, (1), and (2), forms the following system of nonlinear differential equations:

$\dot{\xi} \equiv a(\xi) = \begin{pmatrix} {}^w v \\ \frac{1}{2}\, r \otimes {}^b \omega \\ \mathbf{g} + {}^w f / m \\ Q^{-1}\big({}^b \tau - {}^b \omega \times (Q\, {}^b \omega)\big) \end{pmatrix},$ where $\mathbf{g} = (0, 0, -g)^\top$.    (15)
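To make the propagation of (15) concrete, the sketch below integrates one time step with a fourth-order Runge-Kutta scheme. It is illustrative only: quaternions are taken in scalar-first order, and aero_forces is a hypothetical callback standing in for the aerodynamic pipeline of Fig. 3, returning the world-frame force and body-frame torque.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def state_derivative(xi, m, Q, aero_forces):
    """Right-hand side a(xi) of (15); xi = (o, r, wv, bw) is a 13-vector."""
    r, wv, bw = xi[3:7], xi[7:10], xi[10:13]
    wf, btau = aero_forces(xi)                                # world force, body torque
    r_dot = 0.5 * quat_mul(r, np.concatenate(([0.0], bw)))    # (14)
    wv_dot = np.array([0.0, 0.0, -9.80665]) + wf / m          # (1)
    bw_dot = np.linalg.solve(Q, btau - np.cross(bw, Q @ bw))  # (2)
    return np.concatenate((wv, r_dot, wv_dot, bw_dot))

def propagate(xi, dt, m, Q, aero_forces):
    """One RK4 step of (15); r is renormalized to stay a unit quaternion."""
    f = lambda x: state_derivative(x, m, Q, aero_forces)
    k1 = f(xi); k2 = f(xi + 0.5*dt*k1); k3 = f(xi + 0.5*dt*k2); k4 = f(xi + dt*k3)
    xi = xi + (dt/6.0) * (k1 + 2*k2 + 2*k3 + k4)
    xi[3:7] /= np.linalg.norm(xi[3:7])
    return xi
```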
Fig. 4. A camera captures images of the object by projecting it onto the image plane Π about its center of projection c, where the camera's frame of reference Fc is located. The x- and y-axes of Fc are aligned with the u- and v-axes in the image plane, and the z-axis is in the normal direction. The distance between c and Π is equal to the focal length of the camera. Additionally, a second camera (referred to by use of a single quote '′') with center of projection c′ and frame of reference Fc′ is illustrated, producing the two-view geometry of a point p, from which the rays to c and c′ intersect the image planes at $q = ({}^p p^\top, 1)^\top$ and $q' = ({}^{p'} p^\top, 1)^\top$. The points c, c′, and p form the epipolar plane, drawn with its normal n.
IV. OBSERVABLES FROM VISION
The object’s state ξ is estimated based on measurements
extracted out of its images taken simultaneously by two
cameras. Observables are in part produced by projection from
individual cameras, and in part constructed to accommodate
the epipolar geometry of the cameras. Parameters of camera
poses can also be included as observables to help reduce
estimation errors.
A. Camera Projection Model
A single camera observing the flying object is shown at
the bottom left in Fig. 4. At the camera’s focal point wc is a
frame Fc with its z-axis perpendicular to the image plane Π.
The rotation of Fc from the world frame Fw is described by
a quaternion rc. The plane Π has a local coordinate system
with the origin at the upper left corner of the image, the u-
axis pointing rightward, and the v-axis pointing downward.
We use the projection model described in detail by Forsyth
and Ponce [77, pp. 16–17]. Let p be a point on the object,
and denote by bp, wp, and cp its coordinates in the body, world, and camera frames, respectively. They assume the following mappings:

${}^w p = o + r \otimes {}^b p \otimes r^*,$    (16)
${}^c p = r_c^* \otimes ({}^w p - {}^w c) \otimes r_c.$    (17)

With ${}^c p = ({}^c p_x, {}^c p_y, {}^c p_z)^\top$, the pinhole camera model yields the normalized image coordinates

${}^p p = ({}^p p_x, {}^p p_y)^\top = \frac{1}{{}^c p_z}\, ({}^c p_x, {}^c p_y)^\top.$    (18)
Brown's distortion model [78] is then used to map the point pp to a distorted point $\bar{p}$:

$\bar{p} = \delta_r({}^p p) + \delta_t({}^p p),$    (19)

where, denoting $\varrho = \|{}^p p\|$,

$\delta_r({}^p p) = \big(1 + k_1 \varrho^2 + k_2 \varrho^4 + k_3 \varrho^6\big)\, {}^p p,$    (20)
$\delta_t({}^p p) = \begin{pmatrix} 2 h_1\, {}^p p_x\, {}^p p_y + h_2 (\varrho^2 + 2\, {}^p p_x^2) \\ h_1 (\varrho^2 + 2\, {}^p p_y^2) + 2 h_2\, {}^p p_x\, {}^p p_y \end{pmatrix},$    (21)
for some coefficients $k_1, k_2, k_3$ of radial distortion, and $h_1, h_2$ of tangential distortion. The distorted point is finally transformed into image coordinates:

${}^i p = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \alpha & -\alpha \cot\theta \\ 0 & \beta / \sin\theta \end{pmatrix} \bar{p} + \begin{pmatrix} u_c \\ v_c \end{pmatrix},$    (22)
where α and β are scale factors, θ the skew angle, and
(uc, vc)⊤ the image center. Due to manufacturing inaccuracy
of cameras, all five parameters in (22) have to be approximated
to account for various sources of error.
Note that bp, wp, and cp are 3-vectors, with bp known beforehand (as the position of a feature on the object), while pp, $\bar{p}$, and ip are 2-vectors. The sequence of transformations is best summarized as follows:

${}^b p \xrightarrow{(16)} {}^w p \xrightarrow{(17)} {}^c p \xrightarrow{(18)} {}^p p \xrightarrow{(19)} \bar{p} \xrightarrow{(22)} {}^i p.$    (23)
Since bp is known, wp, cp, and pp are functions of the object’s
position o and orientation r, as well as the camera’s position c
and orientation rc (as a unit quaternion).
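Since the chain (23) is short, it can be sketched end to end. The function below is a minimal illustration under two assumptions: quaternions follow SciPy's scalar-last convention, and the intrinsic parameters of (20)–(22) are passed in explicitly.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def project(bp, o, r, c, rc, k, h, alpha, beta, theta, uc, vc):
    """Map a body-frame feature bp to image coordinates ip via (16)-(22).
    r and rc are unit quaternions in scipy's (x, y, z, w) order; k = (k1, k2, k3)
    and h = (h1, h2) are the distortion coefficients of (20) and (21)."""
    wp = o + Rotation.from_quat(r).apply(bp)           # (16)
    cp = Rotation.from_quat(rc).inv().apply(wp - c)    # (17)
    pp = cp[:2] / cp[2]                                # (18)
    rho2 = pp @ pp
    radial = (1 + k[0]*rho2 + k[1]*rho2**2 + k[2]*rho2**3) * pp   # (20)
    tangential = np.array([                                       # (21)
        2*h[0]*pp[0]*pp[1] + h[1]*(rho2 + 2*pp[0]**2),
        h[0]*(rho2 + 2*pp[1]**2) + 2*h[1]*pp[0]*pp[1]])
    pbar = radial + tangential                                    # (19)
    A = np.array([[alpha, -alpha/np.tan(theta)],                  # (22)
                  [0.0,    beta/np.sin(theta)]])
    return A @ pbar + np.array([uc, vc])
```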
Let us add a second camera. Shown at the bottom right in Fig. 4, this camera has focal point c′, and its body frame Fc′ has a relative orientation described by the quaternion rc′ to the world frame Fw. We denote by c′p the coordinates of p in the frame of the second camera, p′p the corresponding normalized image coordinates, $\bar{p}'$ the distorted point by this camera, and i′p the final image point. The transformation sequence on the point p due to the second camera is:

${}^w p \rightarrow {}^{c'} p \rightarrow {}^{p'} p \rightarrow \bar{p}' \rightarrow {}^{i'} p.$
Given ip, we recover $\bar{p}$ by solving (22) and then pp by solving (19). This two-step process is carried out using the Levenberg-Marquardt algorithm for nonlinear least-squares minimization [79], [80]. Similarly, the coordinates p′p can be solved for given i′p.
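As a sketch of the second step, SciPy's Levenberg-Marquardt solver can invert Brown's model (19) directly, with the distorted point itself as the initial guess (the authors' own solver may differ):

```python
import numpy as np
from scipy.optimize import least_squares

def undistort(pbar, k, h):
    """Invert Brown's model (19): find pp with delta_r(pp) + delta_t(pp) = pbar."""
    def residual(pp):
        rho2 = pp @ pp
        radial = (1 + k[0]*rho2 + k[1]*rho2**2 + k[2]*rho2**3) * pp
        tangential = np.array([
            2*h[0]*pp[0]*pp[1] + h[1]*(rho2 + 2*pp[0]**2),
            h[0]*(rho2 + 2*pp[1]**2) + 2*h[1]*pp[0]*pp[1]])
        return radial + tangential - pbar
    # The distortion is mild, so pbar is a good starting point.
    return least_squares(residual, x0=pbar, method='lm').x
```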
As illustrated in Fig. 4, two rays, originating at p on the
object and passing through the two focal points c and c′,
intersect the image planes at two points q and q′, respectively.
The five points p, c, c′, q, and q′ determine the epipolar plane.
The epipolar constraint states that, in the world frame, the five points wc, wc′, wq, wq′, and wp are co-planar, which can be compactly represented by the following equation [77, p. 200]:

$({}^p p^\top, 1)\, E\, ({}^{p'} p^\top, 1)^\top = 0,$    (24)

where E is a 3 × 3 essential matrix with rank two [81], [82]. Equation (24) is enforced over the camera parameters during calibration.
Let bp1, . . . , bpn be all the feature points on the object. Let V and V′ be the sets of point features that are currently visible in the respective images produced by the two cameras. The first type of observables are these visible features, which form the vector

$y_a = \big(\ldots, {}^i p_i^\top, \ldots, {}^{i'} p_j^\top, \ldots\big)^\top,$    (25)

for all bpi ∈ V and bpj ∈ V′.
Every feature point wpj introduces an epipolar plane with normal $n_j$. Since its two image points ipj and i′pj are subjected to noise, the corresponding epipolar constraint (24) is not exactly satisfied for bpj. Instead, we include $n_j$ to indirectly enforce the constraint. The second vector of observables $y_b$ gathers the normals over vertices visible in both images:

$y_b = (\ldots, n_j^\top, \ldots)^\top,$    (26)

for all bpj ∈ V ∩ V′. The full vector of observables is

$y(\xi) = \begin{pmatrix} y_a \\ y_b \end{pmatrix}.$    (27)
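The exact formula for $n_j$ is not written out above, but one consistent construction spans the epipolar plane by the baseline c′ − c and the world-frame ray of the first camera through the observed point; the helper below is our hypothetical realization of that choice (SciPy scalar-last quaternions).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def epipolar_normal(pp, c, rc, c_prime):
    """Unit normal of the epipolar plane spanned by the baseline c' - c and
    the world-frame ray through the normalized image point pp = (x, y);
    rc is the first camera's orientation in scipy's (x, y, z, w) order."""
    ray = Rotation.from_quat(rc).apply(np.array([pp[0], pp[1], 1.0]))
    n = np.cross(ray, c_prime - c)
    return n / np.linalg.norm(n)
```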
B. Augmenting the State with Camera Poses
The two cameras' positions c, c′ and orientations rc, rc′, known as the extrinsic camera parameters, are approximated offline via a calibration procedure. Their measurement errors may compromise estimation accuracy during the object's
flight. In particular, the epipolar geometry is sensitive to the
camera poses and their calibration errors. Motivated by SLAM
and VINS algorithms [46], [47], [65], we conduct online
estimation of parameters to find an agreement between a pose
estimate of the object and those of the cameras.
The state vector ξ in (13) is now augmented to include the
pose of each camera:
$x = \big(o^\top, r^\top, {}^w v^\top, {}^b \omega^\top, c^\top, r_c^\top, c'^\top, r_{c'}^\top\big)^\top.$    (28)

The state x has 27 variables in total. There are two reasons
for doing this: 1) to allow correction of calibration errors due
to hand measurements, and 2) to (indirectly) accommodate the
epipolar constraints. From equation (15), the system dynamics
are subsequently augmented:
$\dot{x} = b(x),$    (29)
where the derivatives of c, rc, c′ and rc′ are all zero. We also
write the vector of observables as y = y(x) even though it does not depend on the state variables wv and bω.
V. KALMAN FILTERING WITH QUADRATIC CONSTRAINTS
Kalman filter-based motion estimation relies on the dynam-
ics model (29) to propagate the state of the object with time,
and the measurement model described in Section IV-A to
correct the state. Visual inputs are acquired at discrete time
instants. This, plus the need to cope with nonlinearities present
in both models, leads to the use of the hybrid extended Kalman
filter (EKF) [83, pp. 403–407].
Denote by $x_k$ the value of the state x, and $y_k$ the value of the vector y of observables, all at the k-th time instant. The size m of $y_k$ varies with the features that are currently observable. The measurement model can be written as

$y_k = h(x_k, \nu_k),$    (30)

where the m-vector $\nu_k$ is zero-mean Gaussian white noise whose covariance is given by an m × m matrix $R_k$.
The EKF for the system of (29) and (30) estimates the distribution of the state x. It first forward integrates the linearized dynamics derived from (29) using the Jacobian of b(x), which is evaluated at the posterior estimate $x_{k-1}^+$ from time instant k − 1. The result is a prior estimate $x_k^-$ at time instant k, with a 27 × 27 error covariance matrix $P_k^-$. The estimate $x_k^-$ is then corrected to yield a posterior estimate $x_k^+$ based on the measurement residual $\epsilon_k = y_k - h(x_k^-)$ as follows:

$x_k^+ = x_k^- + K_k \epsilon_k,$    (31)

where $K_k$ is a 27 × m gain matrix to be determined next. Using (31), we write out the 27 × 27 posterior error covariance matrix $P_k^+$ for $x_k^+$ in terms of $P_k^-$:

$P_k^+ = (I_{27} - K_k H_k) P_k^- (I_{27} - K_k H_k)^\top + K_k R_k K_k^\top,$    (32)

where $I_{27}$ is the 27 × 27 identity matrix, and $H_k$ is an m × 27 matrix representing the linearized measurement model, computed as the Jacobian of h(x) at $x_k^-$. Minimization of the trace $\mathrm{tr}(P_k^+)$ of $P_k^+$ (which represents exactly the variance of error in the estimate $x_k^+$) yields the Kalman gain

$K_k = P_k^- H_k^\top \big(H_k P_k^- H_k^\top + R_k\big)^{-1}.$    (33)

With $K_k$ determined, $x_k^+$ and $P_k^+$ are then evaluated according to (31) and (32) to conclude the updates at the k-th time instant.
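For reference, the correction step (31)–(33) reduces to a few matrix operations; a generic sketch, with (32) evaluated in the Joseph form written above, follows.

```python
import numpy as np

def ekf_update(x_prior, P_prior, y, h, H, R):
    """Correction step (31)-(33). h is the measurement function, H its
    Jacobian evaluated at x_prior, and R the measurement noise covariance."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)              # Kalman gain (33)
    x_post = x_prior + K @ (y - h(x_prior))           # state correction (31)
    I_KH = np.eye(len(x_prior)) - K @ H
    P_post = I_KH @ P_prior @ I_KH.T + K @ R @ K.T    # Joseph form of (32)
    return x_post, P_post
```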
A. Constrained Kalman Filtering
When the system state is subjected to some constraints, the
unconstrained estimator (31)–(33) performs corrections that
would cause the estimate to drift away from the constraint
surface. To resolve this, the constraints need to be incorporated
into the minimization problem for determining the Kalman
gain Kk. The augmented state x in equation (28) contains
three unit quaternions r, rc, rc′ representing rotations of the
object and two cameras, respectively. These quaternions are
each seen as a unit 4-vector so they induce constraints
$\eta_i(x) \equiv x^\top A_i x - 1 = 0,$    (34)

for i = 1, 2, 3, where

$A_1 = \mathrm{diag}(0_3, 1_4, 0_3, 0_3, 0_3, 0_4, 0_3, 0_4),$
$A_2 = \mathrm{diag}(0_3, 0_4, 0_3, 0_3, 0_3, 1_4, 0_3, 0_4),$
$A_3 = \mathrm{diag}(0_3, 0_4, 0_3, 0_3, 0_3, 0_4, 0_3, 1_4).$

In the above, diag(·) specifies a diagonal matrix whose main diagonal consists of the listed arguments, with $0_d$ and $1_d$ being d-vectors of zeros and ones, respectively.
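Constructing $A_1, A_2, A_3$ is mechanical given the block layout of (28); a small sketch (names ours):

```python
import numpy as np

def quaternion_constraint_matrices():
    """Build A1, A2, A3 of (34): ones on the diagonal over the slots of
    r, rc, rc' in the 27-state x = (o, r, wv, bw, c, rc, c', rc')."""
    sizes = [3, 4, 3, 3, 3, 4, 3, 4]          # block sizes of x
    quat_blocks = [1, 5, 7]                   # block indices of r, rc, rc'
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    mats = []
    for b in quat_blocks:
        d = np.zeros(27)
        d[offsets[b]:offsets[b+1]] = 1.0
        mats.append(np.diag(d))
    return mats
```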
The aim is to modify the Kalman filter equations to include
the three quadratic constraints given in (34). The gain matrix
$K_k$ is now obtained through constrained minimization:

$\min_{K_k} \mathrm{tr}\big(P_k^+\big)$  subject to  $\eta_i(x_k^+) = 0$, i = 1, 2, 3.    (35)

Below we will derive a solution to (35) that can be easily extended to handle multiple quadratic constraints in the general form.
B. Solution to the Constrained Minimization Problem
Continue to let the time be fixed at instant k. With no ambiguity, we shall temporarily omit the subscripts in $x_k^+$, $x_k^-$, $K_k$, $\epsilon_k$, $P_k^+$, $P_k^-$, and $R_k$. Substituting equation (31) in with $x = x^+$, the constraints in (34) expand into, for i = 1, 2, 3,

$\eta_i(x^+) = (x^- + K\epsilon)^\top A_i (x^- + K\epsilon) - 1 = 0.$    (36)
We write $\eta = (\eta_1, \eta_2, \eta_3)^\top$. Under (32) and (36), the Lagrangian for the constrained minimization problem (35) is

$L = \mathrm{tr}\big((I_{27} - KH) P^- (I_{27} - KH)^\top + K R K^\top\big) + \lambda^\top \eta,$    (37)

where the vector $\lambda = (\lambda_1, \lambda_2, \lambda_3)^\top$ contains three Lagrange multipliers, one for each constraint in (36). The optimal values of K and λ are attained at a stationary point of L.
The partial derivative of L with respect to $\lambda_i$ simply recovers the constraint (34), for i = 1, 2, 3. To derive ∂L/∂K, we first obtain

$\frac{\partial}{\partial K} \mathrm{tr}\big(K R K^\top\big) = 2 K R,$    (38)

by making use of the derivative $\partial\, \mathrm{tr}(A X^\top)/\partial X = A$ for matrices A and X, as well as the symmetry of R. Next we have, by making use of the symmetry of $P^-$,

$\frac{\partial}{\partial K} \mathrm{tr}\big((I_{27} - KH) P^- (I_{27} - KH)^\top\big) = \frac{\partial}{\partial K} \mathrm{tr}\big(K H P^- (K H)^\top\big) - 2\, \frac{\partial}{\partial K} \mathrm{tr}\big(P^- H^\top K^\top\big) = 2 K H P^- H^\top - 2 P^- H^\top.$    (39)

Meanwhile, differentiation of (36) yields

$\frac{\partial}{\partial K}\, \eta_i(x^+) = 2 A_i x^- \epsilon^\top + 2 A_i K \epsilon \epsilon^\top.$    (40)

In deriving (40), we made use of $\partial(x^\top A y)/\partial A = x y^\top$ for vectors x, y, and matrix A, as well as the symmetry of $A_i$. With (38)–(40) we can write out the partial derivative ∂L/∂K:

$\frac{\partial L}{\partial K} = 2\Big(K H P^- H^\top + K R - P^- H^\top + \sum_{i=1}^{3} \lambda_i \big(A_i x^- \epsilon^\top + A_i K \epsilon \epsilon^\top\big)\Big).$    (41)
Vanishing of ∂L/∂K is expressed by the equation

$K S + N(\lambda)\, K \epsilon \epsilon^\top = M(\lambda),$    (42)

where

$S = H P^- H^\top + R,$
$N(\lambda) = \sum_{i=1}^{3} \lambda_i A_i,$
$M(\lambda) = P^- H^\top - \sum_{i=1}^{3} \lambda_i A_i x^- \epsilon^\top.$

The remaining task is to solve (36) and (42) for K and λ. First, right multiplication of equation (42) by $S^{-1}\epsilon$ yields

$K \epsilon = J(\lambda)^{-1} m(\lambda),$    (43)

where, denoting $\varepsilon = \epsilon^\top S^{-1} \epsilon$,

$J(\lambda) = I_{27} + \varepsilon N(\lambda),$
$m(\lambda) = M(\lambda) S^{-1} \epsilon.$
Substitute (43) into (36) to eliminate K and obtain

$\alpha_i(\lambda) \equiv \eta_i(x^-) + 2 (x^-)^\top A_i J(\lambda)^{-1} m(\lambda) + \big(J(\lambda)^{-1} m(\lambda)\big)^\top A_i J(\lambda)^{-1} m(\lambda) = 0,$    (44)

for i = 1, 2, 3. Rewrite the three equations (44) as

$\alpha(\lambda) = 0,$    (45)

where $\alpha = (\alpha_1, \alpha_2, \alpha_3)^\top$. The system (45) has 3 equations in 3 unknowns, and can be solved via root finding with a combination of the (local) Newton's method and the (global) homotopy continuation method [84]. These solvers make use of the Jacobian ∂α/∂λ, which requires the following partial derivatives, for j = 1, 2, 3:

$\frac{\partial m}{\partial \lambda_j} = -\varepsilon A_j x^-,$    (46)
$\frac{\partial J^{-1}}{\partial \lambda_j} = -\varepsilon J^{-1} A_j J^{-1}.$    (47)
Equation (47) was derived from differentiating $J^{-1} J = I_{27}$ while making use of the symmetry of J.
If k = 1, the Lagrange multipliers λ are first initialized
by solving (45) via homotopy continuation. If k > 1, the
multipliers’ values are expected to change slightly from those
at time instant k − 1. Hence, these old values can be used
as a starting point by Newton’s method to achieve quick
updates. In the case of divergence, we will fall back on
homotopy continuation. The gain matrix K is next solved from
equation (42) after substituting equation (43) on the left hand
side. The matrix is used in (31) and (32) to produce corrected
state estimates that satisfy the set of quadratic constraints.
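The sketch below compresses this two-stage computation into one routine, with SciPy's hybrid Powell root finder standing in for the Newton/homotopy-continuation combination; all names and the argument layout are our own.

```python
import numpy as np
from scipy.optimize import root

def constrained_gain(x_prior, eps, S, PHt, A_list, lam0):
    """Solve alpha(lambda) = 0 of (45) for the multipliers, then recover
    the gain K from (42)-(43). PHt = P- H^T, A_list = [A1, A2, A3],
    and lam0 is a warm start for the multipliers."""
    n = len(x_prior)
    Sinv_eps = np.linalg.solve(S, eps)
    eps_bar = eps @ Sinv_eps                            # scalar epsilon in (43)

    def N_and_M(lam):
        N = sum(l * A for l, A in zip(lam, A_list))
        M = PHt - sum(l * np.outer(A @ x_prior, eps) for l, A in zip(lam, A_list))
        return N, M

    def alpha(lam):
        N, M = N_and_M(lam)
        Jm = np.linalg.solve(np.eye(n) + eps_bar * N, M @ Sinv_eps)   # (43)
        return np.array([x_prior @ A @ x_prior - 1.0                  # (44)
                         + 2.0 * x_prior @ (A @ Jm) + Jm @ (A @ Jm)
                         for A in A_list])

    lam = root(alpha, lam0, method='hybr').x
    N, M = N_and_M(lam)
    Ke = np.linalg.solve(np.eye(n) + eps_bar * N, M @ Sinv_eps)
    # Back-substitute K*eps into (42): K = (M - N (K eps) eps^T) S^{-1}.
    return np.linalg.solve(S.T, (M - np.outer(N @ Ke, eps)).T).T
```

Warm-starting lam0 with the previous instant's multipliers realizes the quick Newton-style update described above.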
The optimization procedure is easily adapted to work with quadratic constraints in the general form $x^\top A_i x + b_i^\top x + c_i = 0$, $1 \le i \le m$, where $A_i$ is not necessarily symmetric. We need only make several changes, described below. In equations starting at (40), every appearance of $A_i$ is replaced with $(A_i + A_i^\top)/2$. The partial derivative in (40) includes an extra summand $b_i \epsilon^\top$, and this change propagates to (41) and (42) and, subsequently, to the expression for $M(\lambda)$.
VI. GRAPH-BASED FEATURE TRACKING
Motion estimation has so far assumed a fixed set of known
correspondences between point features in images and those
on the moving object. The system dynamics and observa-
tion function are assumed to have non-varying forms. Such
continuity allows the working of a single Kalman filter. The
property cannot be maintained, however, in the course of the
object’s flight as features emerge and disappear, which intro-
duces discontinuities in observations. This section addresses
the problem of identifying and tracking features of the object
in images with respect to those on its shape model. It is known
that any shape can be approximated arbitrarily well by some
polyhedron or polyhedral mesh (i.e., a triangular mesh). To avoid an unnecessary diversion into handling of geometric complexities, in this section we assume the object to be a polyhedron P whose vertex connectivity can be represented by a planar graph. The range of P includes all convex and some non-convex polyhedra.

Fig. 5. (a) Polyhedron P drawn with visible edges as solid lines and occluded edges as dotted lines. (b) Image graph G constructed from a (hypothetical) image of P taken at the k-th time instant by a camera. (c) Planar graph M representing the topological structure of P. The unbounded face f0 in M corresponds to the shaded face on P. A hypothesis specifies partial correspondences between the vertices of G and those of M.
A. Image Graph and Hypothesis
The polyhedron P has n vertices p0, . . . ,pn−1. Its under-
lying planar graph is referred to as the model graph M , which
is represented by a doubly-connected edge list (DCEL) [85,
pp. 29–33]. The process turns every facet of P into a bounded
region in M , except for one facet, which is “opened up” to
become the unique unbounded region. As shown in Fig. 5(a)
and (c) for an octahedron, every vertex pi of P corresponds
with the vertex vi in M .
A frame generated by either camera is a light intensity
image of P with its faces illuminated differently and showing
discontinuities at edges and vertices. These edge and vertex
features are used to construct a planar graph G, referred to as
the image graph, as illustrated in Fig. 5(b). The features are
detected from the image using a Canny edge detector [86] and
Shi-Tomasi corner detector [87], while their correlation with
vertices and edges of P is yet to be hypothesized. We write 〈ui, vj〉 if the vertex ui in G corresponds to the vertex vj in
the model graph M . The tracking algorithm hypothesizes a
set of vertex correspondences C(G,M) = {. . . , 〈ui, vj〉, . . .}.
This set represents a bijective mapping from a vertex subset
of G to one of M . The mean square error of C(G,M) at a
state x is defined as follows:
$d(C, x) = \frac{1}{|C|} \sum_{\langle u_i, v_j \rangle \in C} \|\mathbf{u}_i - \mathbf{v}_j\|^2,$    (48)

where $\mathbf{u}_i$ represents the image coordinates of $u_i$ in G, and $\mathbf{v}_j$ the image coordinates of $v_j$ when the model is projected onto the image plane under the transformation defined by x.
B. Overview of Work at One Time Instant
The two cameras are synchronized to produce images (or
“frames”) of the polyhedron at equally spaced time instants.
Let us focus on the k-th instant when two image frames
Ik and I ′k are generated simultaneously. The constructed
image graphs from the frames are Gk and G′k, which are
associated with two sets of vertex correspondences C(Gk,M)
9
and C(G′k,M), respectively. The two sets together form a
hypothesis H = (C(Gk,M), C(G′k,M)).
The two correspondence sets in H at time instant k are
first inherited from those either active at time instant k − 1,
or created at the end of processing for that time instant
(including the case k = 1 with no processing at the preceding
instant). They are then updated based on the image frames $I_k$ and $I'_k$. This update is detailed in Section VI-C. The vector
yk of observables is then updated using the correspondence
sets C(Gk,M) and C(G′k,M) from the image points of
their included vertices. In the case where either C(Gk,M) or
C(G′k,M), after its update, includes a new correspondence or
removes an existing one, the structural change is propagated to
yk as well as to the measurement model of the Kalman filter K associated with H. Next, the filter K corrects its prior state estimate $x_k^-$, which has evolved from the posterior estimate $x_{k-1}^+$ at time instant k − 1 through the system dynamics, to yield the posterior estimate $x_k^+$.
The updated estimate $x_k^+$ is then applied to the object. With the estimate we evaluate the error of the hypothesis:
$\epsilon(H) = d(C(G_k, M),\, x_k^+) + d(C(G'_k, M),\, x_k^+).$
When ǫ(H) is high, we must decide whether to retain H in
case the error has improved from time instant k− 1, or reject
it otherwise.
At the k-th time instant, a collection H of hypotheses
(including H) is active, each equipped with a Kalman filter
responsible for estimation under the hypothesis. In the case
that all active hypotheses have been eliminated at the time
instant, new ones will be generated before moving on to the
next time instant. This is described in Section VI-D.
C. Updating a Hypothesis
Let us go back to H as one of the hypotheses active at
time instant k. It was passed on from time instant k − 1, possibly as a newly generated hypothesis. In the images $I_k$ and $I'_k$ taken at the instant, vertices and edges of the polyhedron may appear or disappear as its pose has changed from time instant k − 1. Thus, there is a need for updating the correspondence sets. Below we focus on how C(Gk, M) is updated from C(Gk−1, M), since C(G′k, M) can be similarly updated.
Fig. 6. Correspondences used to update a hypothesis.
The first stage of the update begins
with computing the maximal correspon-
dence between Gk and Gk−1. This re-
quires some initial vertex correspon-
dences between the two image graphs,
which can be obtained during image
processing. Using the correspondence
sets C(Gk−1, Gk) and C(Gk−1, M) inherited from time instant k − 1, we initialize the correspondence set C(Gk, M) such that 〈ui, vj〉 ∈ C(Gk, M) whenever 〈uk, ui〉 ∈ C(Gk−1, Gk) and 〈uk, vj〉 ∈ C(Gk−1, M) for some uk. Fig. 6 illustrates the relationship between the three sets of correspondences. In the second stage, C(Gk, M) is extended to become a maximal correspondence set via matching the image graph Gk with the model graph M.
Fig. 7. Search tree explored by the maximal correspondence algorithm on the image graph Gk and model graph M drawn in the top left corner. Each non-root node represents a correspondence of two vertices from Gk and M, or 〈u, ∅〉 in the case that no correspondence is made, effectively ignoring vertex u. The pair of bolded edges in Gk and M identifies the initial edge correspondence. Each leaf is labeled with a set C, produced by collecting all vertex correspondences along the path descending from the root to the leaf. The algorithm explores a total of 20 sets of correspondences, of which 6 are shown in the expanded subtree.
Both stages involve computing the correspondence between
two planar graphs, first Gk−1 and Gk, and then Gk and M .
This correspondence problem is one of subgraph isomorphism,
which can be done in linear time [88]. However, due to the
presence of noise in images, some tolerance to erroneous
edges and vertices must be allowed. The matching problem
becomes NP-complete, as a result of deciding whether an edge
or vertex should be considered erroneous or not. Approaches to
error-tolerant graph matching include tree search algorithms in
exponential time [36], [89], [90], and sub-optimal algorithms
in polynomial time [38], [91].
We here propose a tree search algorithm to extend
C(Gk,M). The algorithm is presented in the form of a
subroutine A(Gk, M, C0), where C0 is the initial value of C(Gk, M). Once it is described, we call A(Gk−1, Gk, D0), where D0 is the initial value of C(Gk−1, Gk), to extend
the correspondence between Gk−1 and Gk. To also address
generation of a new hypothesis, we here let the algorithm
start with C(Gk,M) = ∅. A depth-first search of the decision
tree is performed by matching edges of Gk with different
edges of M . Fig. 7 demonstrates a portion of the search
performed on an example pair of Gk and M . The traversal
starts with {〈u0, v4〉, 〈u1, v1〉} derived from matching the
ordered edges (u0, u1) ∈ Gk and (v4, v1) ∈ M . To find
the correspondences for the vertex u2 adjacent to u1, it scans
the edges incident on v1 clockwise, yielding the branches at
the third level of the tree that represent the correspondences
〈u2, v3〉, 〈u2, v0〉, and 〈u2, v5〉. A fourth branch representing
〈u2, ∅〉 is added to include the possibility of u2 not being matched.
This traversal continues until all vertices have been visited.
As a result, every leaf node reached denotes a possible
correspondence of the two graphs formed by collecting vertex
correspondences along its path from the root. For example, in
Fig. 8. Overview of the steps to update a hypothesis, estimate its corresponding state, and compute its error (stages: hypothesis update, dynamics, Kalman filtering).
Fig. 7, the leaf node 〈u3, v5〉 denotes the correspondence set
C1 = {〈u0, v4〉, 〈u1, v1〉, 〈u2, v3〉, 〈u4, v0〉, 〈u3, v5〉}.
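A simplified sketch of the search (our own, heavily abbreviated): it seeds the correspondence with C0, decides each frontier vertex in turn, and branches over unmatched model neighbors plus the 〈u, ∅〉 option, but omits the clockwise edge ordering and error-tolerance bookkeeping of the full algorithm.

```python
def extend_correspondences(adj_G, adj_M, C0):
    """Depth-first enumeration of candidate vertex correspondences between
    an image graph and the model graph, seeded with C0 (a dict u -> v).
    adj_G / adj_M map each vertex to the list of its neighbors."""
    results = []

    def dfs(C, frontier, decided):
        while frontier and frontier[0] in decided:
            frontier = frontier[1:]            # skip already-decided vertices
        if not frontier:
            results.append(dict(C))
            return
        u, rest = frontier[0], frontier[1:]
        decided = decided | {u}
        taken = set(C.values())
        # Candidate model vertices: unmatched neighbors (in M) of the
        # images of u's already-matched neighbors (in G).
        candidates = {v for un in adj_G[u] if un in C
                        for v in adj_M[C[un]] if v not in taken}
        for v in candidates:
            C[u] = v
            dfs(C, rest + [n for n in adj_G[u] if n not in decided], decided)
            del C[u]
        dfs(C, rest, decided)                  # the <u, None> branch
    frontier = [n for u in C0 for n in adj_G[u] if n not in C0]
    dfs(dict(C0), frontier, set(C0))
    return results
```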
Multiple correspondence sets are often generated by the tree search algorithm. A metric is thus needed for comparing them. A correspondence set $C_1$ is larger than another correspondence set $C_2$, denoted $C_1 \succ C_2$, if

$\big(|C_1| = |C_2| \wedge d(C_1, x^-) < d(C_2, x^-)\big) \vee \big(|C_1| > |C_2| \wedge d(C_1, x^-) - d(C_2, x^-) < \delta\big),$    (49)

for some small threshold δ > 0. If neither $C_1 \succ C_2$ nor $C_1 \prec C_2$, then $C_1$ and $C_2$ are tied. The maximum correspondence set $C^*$ is chosen, i.e., we have C(Gk, M) = $C^*$. In case of a tie, only one is chosen.
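The predicate (49) and the selection of $C^*$ translate directly into code; the linear scan below resolves ties arbitrarily, as stated above (a sketch, with the mean square errors passed in).

```python
def larger(C1, C2, d1, d2, delta):
    """The partial order (49): C1 > C2 if it has equal size but lower
    error, or strictly more correspondences without much more error.
    d1, d2 are the mean square errors d(C1, x-) and d(C2, x-)."""
    return (len(C1) == len(C2) and d1 < d2) or \
           (len(C1) > len(C2) and d1 - d2 < delta)

def select_maximum(candidates, d, x_prior, delta=1.0):
    """Scan candidate correspondence sets and keep one maximum under (49)."""
    best, best_err = None, None
    for C in candidates:
        err = d(C, x_prior)
        if best is None or larger(C, best, err, best_err, delta):
            best, best_err = C, err
    return best
```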
Similarly, we update the correspondence set C(G′k,M)
based on the image graph G′k generated by the second camera.
Update of the hypothesis H is then completed for the time
instant. The Kalman filter is applied as described in Section V
to generate the posterior estimate x+k . Figure 8 gives an
overview of the steps described thus far for updating and
evaluating a hypothesis.
Images of the flying object are subject to noise that may
lead to erroneous edges and vertices appearing. To cope with
this and provide some tolerance, the hypothesis H is rejected
if ǫ(H) ≥ Γ in each of the l most recent time instants. Proper
tuning of Γ and l helps the algorithm settle on a small set of
low-error hypotheses while subject to noise that periodically
appears in the image.
D. Generating New Hypotheses
If k = 1 or all hypotheses have been rejected based on
the posterior estimate x+k , the algorithm attempts to generate
a new set of hypotheses. Start with a correspondence set
C0 = {〈u1, v1〉, 〈u2, v2〉} which involves adjacent vertices
u1 and u2 in Gk and v1 and v2 in G′k. A maximum
correspondence set C(Gk, G′k) is constructed using the tree
search subroutine A(Gk, G′k, C0). Next, by matching the edge
(u1, u2) with an edge of M, the same algorithm finds all maximal correspondence sets C1, . . . , CN between Gk and M when the edge (u1, u2) is matched with the edge (v1, v2). For each Ci, i = 1, . . . , N, we construct a correspondence
set C′i between G′k and M such that 〈u′, v〉 ∈ C′i whenever 〈u, u′〉 ∈ C(Gk, G′k) and 〈u, v〉 ∈ Ci for some vertex u. A
hypothesis Hnew = (Ci, C′i) is thus obtained. To evaluate its
error ǫ(Hnew), we need a new state estimate x+k of the polyhe-
dron at time instant k. The estimate is composed of the object’s
position o and orientation r initialized under the hypothesis
Hnew, and velocities wv and bω chosen depending on the time
instant of the hypothesis generation.2 If ǫ(Hnew) < Γ, the
hypothesis is added along with its Kalman filter initialized by
x+k . The above procedure repeats by matching (u1, u2) with
other edges of M at its start, returning any hypotheses with
low errors.
With the new hypothesis Hnew = (C(Gk,M), C(G′k,M)),
we can initialize the object’s pose which, together with ve-
locity guesses, constitute the new posterior state x+k at the
k-th time instant. Identify all pairs of corresponding vertices
〈ui, u′j〉 from Gk and G′
k under Hnew. Back-project their
image coordinates into world coordinates, which can then be
used to recover the transformation from the body frame to the
world. This is detailed in Appendix A.
VII. EXPERIMENTS
To demonstrate the accuracy and robustness of the presented
estimation scheme, experiments were conducted with three
different objects: a wooden frame, rugby ball, and foam
polyhedron. The first experiment was conceived to have a
source of ground truth available, while the next two involved
more practical objects. The rugby ball experiment addressed
motion estimation in sports where objects tend to have smooth
surfaces and experience large Magnus forces. The polyhedron
experiment aimed at a generally shaped object, and had
the objective of demonstrating estimation with non-fiducial
features.
As summarized in Table I, various compositions of the three
basic modules (aerodynamics, Kalman filtering, and graph-
based feature tracking), with increasing complexity, were
validated. Five different estimators were used: an extended
Kalman filter (EKF), unscented Kalman filter (UKF) [92],
quadratically constrained Kalman filter (QCKF) described in
Section V-B, a QCKF with aerodynamics approximated for
a sphere (QCKF-S), and a QCKF with aerodynamics fully
modeled (QCKF-A). All five estimators used the dynamics
model (15) (with the lift force fl and drag force fd set to zero
by the first three estimators). They also used the projection
model of Section IV-A while the quaternion constraints (36)
were incorporated in all but the first estimator. In the case
of the UKF, the quaternion constraints were enforced using a
projection method [62]. For QCKF-A, lift force was calculated
per the procedure in [71]. The QCKF-S instead approximates
the object by a volume-equivalent sphere, and applies em-
pirical equations of drag and lift for a spherical shape. The
polyhedron experiment made use of the graph-based feature
tracking algorithm from Section VI.
In each experiment, two Ximea MQ022CG-CM high-speed,
color cameras in a stereo vision configuration were used to simultaneously capture images of an object in flight. Both
cameras were equipped with Navitar NMV-6 lenses with an
81.9 degree field of view. The cameras were carefully cali-
brated with point data obtained by images of a 3D calibration
2At the start of estimation, velocities are unknown and a best guess canbe made. At later time instants, reliable velocity estimates from a previoushypothesis can be used as initial values.
TABLE I
SUMMARY OF EXPERIMENTS WITH THE QUATERNION CONSTRAINT (QC), EPIPOLAR CONSTRAINT (EC), AERODYNAMICS (AERO), AND GRAPH-BASED FEATURE TRACKING (GFT)

Object           | Estimator | QC | EC | Aero | GFT
-----------------|-----------|----|----|------|----
Wood Frame       | EKF       |    |    |      |
Wood Frame       | UKF       | X  | X  |      |
Wood Frame       | QCKF      | X  | X  |      |
Rugby Ball       | QCKF      | X  | X  |      |
Rugby Ball       | QCKF-A    | X  | X  | X    |
Foam Polyhedron  | QCKF-S    | X  | X  | X    | X
Foam Polyhedron  | QCKF-A    | X  | X  | X    | X
object3 via bundle adjustment [93] and individually initialized
camera parameters [94]. The two cameras’ parameters were
estimated and poses approximated together via optimization
under the epipolar constraint (24) for minimum reprojection
error. Images were captured at more than 200 fps depending
on the image size in the respective experiment.
Estimation started at time 0 s when an initial estimate was
determined, and ran until the object left both cameras’ fields of
view. The initial estimate of the object’s pose was computed
from a procedure given in Appendix A upon detecting four
(or more) pairs of image points from features/marks on the
object visible to both cameras. During the first few frames,
the object’s initial velocity was estimated roughly via finite
differencing over position. The initial angular velocity had its
estimate set close to zero since an estimate via differencing
would contain a large error. Initial error covariances $P_0^+$ of the state estimate and $R_0$ of the vector of observables were
determined from tuning of the Kalman filter over more than
100 experiment trials.4
Last to introduce is the metric of reprojection error used for
validation of pose estimates. We choose the root-mean-square error (RMSE) formulated as $\sqrt{d(C, x)}$, where d is defined in (48), C is the winning hypothesis, and x is its state estimate.
A. Wooden Frame with Accelerometers
In the first experiment, a wooden frame object made up
of three orthogonal links was thrown by hand across the two
cameras’ fields of view approximately 2.3 m away. The object
contained four vertices shown in Fig. 9(a), with one at the
center as the “origin” of the frame, and the remaining three
at the ends of the three links, colored red, green, and blue,
respectively, and on their axes of symmetry. Each link, treated
as an edge of the object, was easily detected by its color during
image processing.
Near each vertex, linear acceleration was measured by a
tri-axis ADXL335 accelerometer from Analog Devices, with
3 Composed of checkerboard patterns pasted on the faces of a cube.
4 Covariances were 0.1 for the three components of the object's position, 0.01 for the four components of rotation and three components of velocity, 1 for the three components of angular velocity, 0.005 for the six components of each camera's position, and 0.001 for the eight components of rotation. Measurement covariances were 25 for the two coordinates of image points in equation (25), and 0.003 for the three coordinates of the epipolar normals in equation (26).
Fig. 9. (a) A wooden frame object with three links and four accelerometers attached (labeled parts: battery, accelerometer, wireless transmitter). Each link is 30 cm long and 2.9 cm by 2.9 cm in the cross section. Three vertices at the ends of the links are marked by black dots, and one vertex at the frame's "origin" by a black circle. (b) The link representing the y-axis (colored green) with one accelerometer at its end and another near the intersection of the three links. The red and blue links representing the x- and z-axes have one accelerometer each at their ends.
range ±29.4 m/s². Fig. 9(b) shows the configuration of one axis of the frame. The data acquired by each accelerometer were transmitted wirelessly using separate Digi XBee Series 1 modules.5 The object's inertia tensor was also adjusted to
account for the accelerometers as point masses, yielding the
value $Q = \mathrm{diag}(5.57, 5.57, 5.60) \times 10^{-3}\ \mathrm{kg \cdot m^2}$.
With a total mass of 0.31 kg, the wooden frame is subjected
to negligible effects of aerodynamic forces. The frame's motion was estimated by the QCKF, EKF, and UKF for 0.62 s at a rate
of 202 fps, and over a distance of 2.13 m within the images
of its flight taken by the two cameras as shown in Fig. 10(a)-
(b). The poses estimated by the EKF and QCKF at a few
moments are also drawn. In images by the two cameras,
pairs of poses estimated by the EKF do not appear to align
well with the object. This is primarily due to absence of
the epipolar constraint and the unit quaternion constraints.
On the contrary, the pose estimated by the QCKF aligns
better in both images as a result of including the epipolar
normals as observables. Consequently, the EKF yielded higher
reprojection errors while the UKF (not shown) obtained very
similar errors to the QCKF due to its enforcement of the
constraints.
Finally, we compare the estimated velocities with those
measured by the accelerometers. First, the object's initial pose
is calculated using carefully extracted image points and the
procedure of Appendix A, and its initial velocity ${}^b\boldsymbol{v}$ by differencing
the object's position. The $i$-th accelerometer produces
readings $\boldsymbol{a}_i$ that satisfy the kinematics equation
$\boldsymbol{a}_i = {}^b\dot{\boldsymbol{v}} + {}^b\dot{\boldsymbol{\omega}} \times {}^b\boldsymbol{r}_i + {}^b\boldsymbol{\omega} \times ({}^b\boldsymbol{\omega} \times {}^b\boldsymbol{r}_i)$,
where ${}^b\boldsymbol{r}_i$ locates the accelerometer in
the body frame. Using four accelerometers, we obtain twelve
quadratic equations in total with nine unknowns in ${}^b\dot{\boldsymbol{v}}$, ${}^b\boldsymbol{\omega}$,
and ${}^b\dot{\boldsymbol{\omega}}$. The equations can instead be written as a linear system
with twelve (dependent) unknowns: three for ${}^b\dot{\boldsymbol{v}}$, and nine for
a $3 \times 3$ matrix $W$ composed from ${}^b\boldsymbol{\omega}$ and ${}^b\dot{\boldsymbol{\omega}}$. Formulation of
the linear system and extraction of ${}^b\boldsymbol{\omega}$ from $W$ are discussed in
[18]. At each time instant, the resulting ${}^b\boldsymbol{\omega}$ updates the object's
rotation, which is used to convert ${}^b\boldsymbol{v}$ back to the world frame
after its update from integration of ${}^b\dot{\boldsymbol{v}}$.
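The sketch below sets up this linear system numerically. It assumes the standard angular acceleration matrix form $W = [\dot{\boldsymbol{\omega}}]_\times + [\boldsymbol{\omega}]_\times^2$ along the lines of [17], [18], so that $\boldsymbol{a}_i - {}^b\dot{\boldsymbol{v}} = W\,{}^b\boldsymbol{r}_i$; gravity compensation and the extraction of $\boldsymbol{\omega}$ from $W$ (discussed in [18]) are omitted.

```python
import numpy as np

def solve_accel_system(a, r):
    """Least-squares solve of a_i = v_dot + W r_i for v_dot and W.

    a, r : (4, 3) arrays of accelerometer readings and body-frame
    locations. Assumes W = [omega_dot]x + [omega]x^2, so a_i - v_dot
    is linear in the nine entries of W; recovering omega from W
    follows [18] and is not repeated here.
    """
    A = np.zeros((12, 12))
    b = np.zeros(12)
    for i in range(4):
        rows = slice(3 * i, 3 * i + 3)
        A[rows, 0:3] = np.eye(3)          # coefficients of v_dot
        for j in range(3):                # row j involves W's j-th row times r_i
            A[rows.start + j, 3 + 3 * j: 6 + 3 * j] = r[i]
        b[rows] = a[i]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:].reshape(3, 3)     # v_dot, W (row-major)
```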
5 Before the object was thrown, the accelerometers were calibrated by repeatedly measuring voltages corresponding to the force of gravity while lying flat on a table.
Fig. 10. Images of the wooden frame in flight, cropped and superposed on a single image from the (a) first and (b) second cameras. The object's pose estimated by the QCKF and EKF is drawn by the edges that were visible to each camera (yellow and dashed white lines, respectively). Labels below each object indicate the time since the start of estimation. Plots of the estimated and observed (c) linear and (d) angular velocities. Velocity readings from accelerometers in the x, y, and z directions are plotted as solid lines in light red, green, and blue, respectively. Velocities estimated by the QCKF are plotted as dashed lines in red, green, and blue, and by the UKF as thin solid lines in the same colors.
The method described above of integrating accelerations
produces the smooth curves plotted by solid lines
in lighter colors in Fig. 10(c)–(d). For comparison, these
two diagrams also plot the values of velocity and angular
velocity estimated by the UKF and QCKF. Velocity
estimates of the two filters mostly overlap, while
small gaps can be seen in ${}^w v_x$ and ${}^w v_y$. At the
ending time 0.62 s, the estimation errors of the QCKF
and UKF were respectively (−0.056, −0.153, −0.091)⊤
and (−0.185, −0.284, −0.107)⊤ m/s for velocity, and
(−0.004, 0.032, −0.026)⊤ and (−0.023, −0.008, −0.009)⊤
rad/s for angular velocity. The difference in these velocity
errors, while seemingly small, yields larger errors (i.e., drift) in
pose estimates through integration. Moreover, in (d) higher
errors are seen to accumulate in ${}^b\omega_y$ and ${}^b\omega_z$ up to time
0.22 s. This is due to the x-axis not being detected, as shown
in Fig. 10, effectively reducing the information available for
estimating rotation about the two adjacent orthogonal axes.
Once the x-axis becomes visible again, the estimation errors in
${}^b\omega_y$ and ${}^b\omega_z$ quickly improve.
Fig. 11. (a) Predictions of the rugby ball's position trajectory starting from estimates at time 0.12 s produce the green curve for the QCKF-A and the red curve for the QCKF. Both estimates are propagated, including aerodynamics, to time 0.58 s near the end of estimation. The two resulting position trajectories are projected onto the image plane of the first camera for comparison with the estimated positions drawn by blue dots. A sequence of rugby ball snapshots is superposed onto the image plane. The object is spinning primarily about its major axis, drawn by a thin white line. (b) Estimated trajectory of the rugby ball during the segment of flight from 0.12 to 0.58 s. The object's body frame is drawn by red, green, and blue dashed arrows, along with its estimated velocity v (black), angular velocity ω (teal), and lift force F_L (magenta). The RMSE are plotted from the (c) first and (d) second cameras by two polylines, one in blue calculated from estimates of the QCKF, and one in black from estimates of the QCKF-A.
B. Rugby Ball
In the second experiment, estimation took place with aero-
dynamics on a thrown rugby ball. The hollow ball was primar-
ily composed of rubber with a mass of 0.42 kg, and took on
the shape of a prolate spheroid with semi-axis lengths 0.145,
0.095, and 0.095 m. Since such a smooth surface lacks features
to contribute measurements during estimation, five markers
were attached to the rugby ball’s surface: a green marker at
one of the two intersections with its major axis, and four
blue markers at all intersections with its two minor axes.
The ball was thrown with a spin about its major axis (i.e.
a “spin pass”), such that the green marker remained visible
and the blue markers rotated into and out of view. The
markers were detected by their color and provided to two
estimators: QCKF-A (with aerodynamics modeled) and QCKF
(without). To measure the cross sectional area needed for
13
TABLE IIRANGES OF RATIOS OF AERODYNAMIC FORCES CALCULATED BY
QCKF-A THROUGHOUT FLIGHT TO THE FORCE OF GRAVITY.
ObjectRange of ratios to gravity
fd f l
Rugby Ball [0.034, 0.063] [0.327, 0.486]
Polyhedron [0.100, 0.361] [0.142, 0.394]
approximating drag force, 400 surface points were computed
as “vertices” to be projected along the velocity direction. The
ball’s flight spanned 0.704 s, at which the object achieved
estimated velocity v = (−2.148, 1.991,−3.547)⊤ m/s and
angular velocity ω = (18.280,−0.1576, 1.772)⊤ rad/s.How does the modeling of aerodynamics improve prediction
of the object’s trajectory? Fig. 11(a) plots the position tra-
jectories starting with estimates by the QCKF-A and QCKF
at time 0.12 s and forward propagated to time 0.58 s near
the end of flight. To evaluate these two estimates fairly, both
trajectories are produced with inclusion of aerodynamics, and
then compared to the “true” trajectory over the remainder of
the flight.6 Hence, errors from the comparisons indicate the
accuracies of the starting estimates. The position trajectory
(green) starting from the estimate of the QCKF-A is seen to
yield a much closer prediction. The position trajectory (red)
of the QCKF falls faster than that of the QCKF-A due to the
absence of an upward lift force. Since the ball was thrown
with a back-spin, the Magnus effect produced an upward lift
force that prolonged the ball’s flight. Fig. 11(b) shows the
estimated trajectory of the ellipsoid by the QCKF-A with the
estimated velocities and resulting lift forces drawn by arrows.
Moreover, the row for the rugby ball in Table II gives the ratios of
the magnitudes of the drag and lift forces to that of gravity, where
$f_l$ in particular is seen to be a large contributing force.
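A schematic version of this forward propagation is sketched below. The acceleration callback stands in for the aerodynamic model (the paper's QCKF-A solves the Laplace equation for lift); both starting estimates are propagated with the same callback, as the text prescribes for a fair comparison. The function and its explicit Euler stepping are illustrative assumptions, not the paper's integrator.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity in the world frame (m/s^2)

def predict_position(p0, v0, accel_fn, dt, steps):
    """Forward-propagate a position estimate as in Fig. 11(a).

    `accel_fn(v)` returns the aerodynamic acceleration (drag + lift)
    for the current velocity; it may close over the estimated angular
    velocity to include the Magnus effect. A hypothetical helper
    using simple explicit Euler steps.
    """
    p, v = np.array(p0, float), np.array(v0, float)
    traj = [p.copy()]
    for _ in range(steps):
        v = v + (G + accel_fn(v)) * dt   # gravity plus aerodynamic terms
        p = p + v * dt
        traj.append(p.copy())
    return np.array(traj)
```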
When compared to estimates at the end of flight, pre-
diction errors of orientation as zyx Euler angles were
(0.62, 0.42, 0.24) rad for the QCKF, and (0.47, 0.35, 0.18) rad
for the QCKF-A. While the QCKF-A yields lower prediction
errors, in both cases the errors about the z-axis were slightly
larger due to the larger rate of rotation. Inclusion of aerodynamics
also yielded slightly lower reprojection errors for the
QCKF-A, shown in Fig. 11(c)–(d). As more image frames were
acquired, integration of the dynamics propagated errors into the
object's pose; modeling aerodynamics reduced this drift, giving
the QCKF-A an improvement of approximately 2 px by the end of flight.
C. Irregular Polyhedron
In the final experiment, a hollow, polyhedral foam object
was thrown by hand to demonstrate motion estimation along-
side feature tracking in the absence of fiducial markers. The
polyhedron was constructed by combining eight triangular
facets cut from foam boards.7 It had a mass of 0.069 kg and
6 This is the trajectory from estimating the object's state all the way to the end of flight.
7 The polyhedron has the following vertices in the frame F_b aligned with its principal axes: (0.051, −0.008, 0.121)⊤, (−0.04, −0.055, 0.101)⊤, (0.073, 0.103, −0.068)⊤, (−0.071, 0.156, −0.002)⊤, (−0.044, −0.095, −0.102)⊤, and (0.031, −0.101, −0.05)⊤.
Fig. 12. Images of the polyhedron in flight, cropped and superposed onto a single image, as captured by the (a) first and (b) second cameras. The pose estimated by the QCKF-A is drawn by its edges (yellow lines) that were visible to each camera. Labels below each object indicate the time since the start of estimation. (c) Timeline of hypotheses and their errors over the flight of the polyhedron with the thresholds Γ = 800 and κ = 3. Each colored line represents a hypothesis that ends when rejected or when the object's flight ends. The hypotheses are numbered 1–7. Generation of new hypotheses occurs at 0.4 s, in which case colors are reused to plot their errors.
inertia tensor of Q = diag(3.20, 1.96, 2.45) × 10⁻⁴ kg·m².
Due to its large size and low weight, the flying polyhedron was
subjected to the effects of aerodynamics, to the extent that air
flowing over its faces produced a noticeable lift force. Overall, the
aerodynamic effect was more complex than on the rugby ball
from the previous experiment, given the polyhedron's geometric
asymmetry and the singularities at its edges and vertices.
Detected corner and edge features on the polyhedron were
inputs to the feature tracking algorithm in Section VI. The
polyhedron's motion was estimated by the QCKF-A for 0.52 s at a rate
of 207 fps, over a distance of 1.73 m within the images.
Fig. 12(a)–(b) shows its images during the flight. The object’s
pose estimates at a few time instants (e.g., 0.1 s) were plotted
with only those edges visible to each camera. In several
frames, some visible edges are clearly missing after not being
detected due to poor lighting of the polyhedron’s faces. Re-
gardless, the feature tracking and estimation algorithms were
able to keep up with the flight. Magnitudes of the polyhedron’s
aerodynamic forces from QCKF-A are shown in Table II.
To demonstrate the performance, Fig. 12(c) plots a timeline
of hypothesis errors for the object’s flight. The algorithm
Fig. 13. Reprojection errors of the QCKF-A and QCKF-S are plotted respectively in blue and black for the (a) first and (b) second cameras.
initially generates two hypotheses labeled 1 and 2. Both
maintain relatively low errors as their estimates improve, until
time 0.25 s when the errors spike due to noise in the image. A few
frames later, the hypotheses are able to recover with updated
correspondences, yielding errors that reflect the object's new
pose. In particular, hypothesis 2, which is now presumably an
incorrect hypothesis due to its higher error, persists for a short
while just below the threshold for rejection. At approximately
0.35 s, the errors again increase due to noise that occurs as a face
of the object disappears. At this moment, seen in Fig. 12(a)
and (b), image processing fails in subsequent iterations to
distinguish between the vertices labeled v1 and v2. A few
iterations of the tracking algorithm pass before the two
hypotheses are rejected and hypothesis generation is triggered,
yielding five new hypotheses labeled 3–7 at 0.4 s. Of these,
hypothesis 3 is tracked with low error as it appears to
correctly identify the object's pose, while 6 and 7 are later
rejected, and 4 and 5 remain within the error threshold. The
object's state estimate at each frame is chosen from the
hypothesis with the lowest error.
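The bookkeeping behind this timeline can be sketched as below. The exact semantics of Γ and κ follow Section VI, which is not reproduced here; the sketch assumes a hypothesis is rejected once its error exceeds Γ for κ consecutive frames, and that an empty hypothesis set triggers regeneration via graph matching.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    correspondences: dict   # image features -> model vertices
    strikes: int = 0        # consecutive frames above the error threshold

def prune(hypotheses, errors, gamma=800.0, kappa=3):
    """One pruning step over the hypothesis set of Fig. 12(c).

    A plausible reading of the thresholds: reject a hypothesis whose
    error stays above gamma for kappa consecutive frames. An empty
    result signals that hypothesis generation (graph matching,
    Section VI) should be re-triggered.
    """
    survivors = []
    for h, e in zip(hypotheses, errors):
        h.strikes = h.strikes + 1 if e > gamma else 0
        if h.strikes < kappa:
            survivors.append(h)
    return survivors
```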
It is inevitable that image noise and imperfect processing
will prevent a single hypothesis from spanning the whole
flight. Regardless, the sequence of poses of the object is
successfully tracked by hypotheses 1 and 3 (both
conveniently colored blue). Due to the smooth transitioning
of the state estimate for the new set of hypotheses, minimal
progress in estimation is lost during the time when hypothesis
generation takes place.
D. Comparison with Closed-form Lift Force using Spherical
Approximation
The polyhedron was simultaneously estimated using closed-form
equations of drag and lift for a sphere centered at o and
with equal volume (i.e., 2.66 × 10⁻³ m³). More specifically,
the drag equation in (10) is now calculated with a constant
cross-sectional area A and coefficient C_d ≈ 0.4 [72, p. 261].
The lift force, no longer calculated via solving the Laplace
equation (6), has a form similar to (10):
$$\boldsymbol{f}_l = \frac{1}{2}\rho A C_l \,({}^b\boldsymbol{v} \cdot {}^b\boldsymbol{v})\, \frac{{}^b\boldsymbol{\omega} \times {}^b\boldsymbol{v}}{\|{}^b\boldsymbol{\omega} \times {}^b\boldsymbol{v}\|}, \qquad (50)$$

where $C_l \approx 0.22$ [95].
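A minimal sketch of this spherical force model follows, assuming the drag equation in (10) takes the standard quadratic form with the constant area A; the air density is an assumed sea-level value, and the coefficients are the ones quoted above.

```python
import numpy as np

RHO = 1.2           # air density (kg/m^3), an assumed sea-level value
CD, CL = 0.4, 0.22  # sphere coefficients from [72] and [95]

def sphere_drag_lift(v_b, w_b, A):
    """Closed-form drag and lift for the equal-volume sphere, with
    velocity v_b and angular velocity w_b in the body frame and
    constant cross-sectional area A (a sketch of QCKF-S's force model,
    assuming quadratic drag for (10) and using (50) for lift)."""
    speed = np.linalg.norm(v_b)
    f_d = -0.5 * RHO * A * CD * speed * v_b           # drag opposes motion
    axis = np.cross(w_b, v_b)
    n = np.linalg.norm(axis)
    f_l = np.zeros(3) if n < 1e-9 else \
        0.5 * RHO * A * CL * (v_b @ v_b) * axis / n   # lift per (50)
    return f_d, f_l
```

With the polyhedron's equal volume V = 2.66 × 10⁻³ m³, the sphere's radius is r = (3V/(4π))^{1/3} ≈ 0.086 m, giving A = πr² ≈ 0.023 m².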
Fig. 13(a) and (b) show reprojection errors from the QCKF-A
and QCKF-S for the first and second cameras, respectively.
The errors are similar until 0.3 s, when new vertices appearing
in the images cause measurement inaccuracies in rotation
(resulting in a jump of the reprojection error for the QCKF-S).
Even after correction, the QCKF-S yielded a poorer estimate
of angular velocity due to not accounting for the lift force from
air translating over the non-spherical surface. Consequently, its
estimated orientation lags behind that of the QCKF-A, resulting
in an increase of reprojection error of 6.2 and 4.6 px at the end
of estimation for the respective cameras.
E. Experiment Summary
The experiments described in this section were conducted
on a Windows PC equipped with a 4-core 3.5 GHz Intel Xeon
CPU and 8 GB of memory. Table III presents computation
times for the five estimators to update and correct the state
estimate, averaged over all experiments for which they were
used. During the update step, two rounds of the fourth order
Runge-Kutta method [96], [97] of integration were performed,
requiring eight evaluations of the dynamics equation, and in
the case of QCKF-A, each time recalculating the aerodynamic
forces. For both the rugby ball and polyhedron, non-uniform
grids consisting of roughly 3,500 grid points were used,
yielding a linear system of approximately 3,500 rows and
5,000 columns to be solved. It is for this reason that the
QCKF-A is the most time-consuming.
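For reference, one fourth-order Runge-Kutta step is sketched below; it evaluates the dynamics four times, so the two rounds per update step described above cost eight evaluations (and, for the QCKF-A, eight recomputations of the aerodynamic forces). The state vector and dynamics function are placeholders for the paper's models.

```python
def rk4_step(f, x, t, h):
    """One fourth-order Runge-Kutta step x(t) -> x(t+h) for dx/dt = f(t, x).

    `x` may stack position, rotation quaternion, and (angular) velocity,
    with `f` the object's dynamics; four evaluations of f per step.
    """
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```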
Meanwhile, the UKF, as expected, required more computation
time than the EKF due to updating and correcting
multiple sigma points. Although the UKF is known to better
approximate some nonlinear models, it had no advantage in
the estimation of the wooden frame. Moreover, were the UKF
to consider aerodynamics, repeatedly calculating the lift and drag
forces for each sigma point would render the estimator infeasible
for real-time motion estimation.
Estimation errors would be easily computed if data sets of
real images and corresponding motion data for a free-flying
object were available. To the best of our knowledge, no such
data sets have been made available for research use. We rely on
the measurement of pose error in both images (i.e. reprojection
error) to represent the pose error in the world.
Below we argue for the use of reprojection error as a metric
for 3D pose accuracy. The argument rests on the observation that no two poses of the
object could generate the same pair of images on the two
cameras, unless some degeneracy of projection takes place or
the object assumes certain symmetry. The image coordinates
of three feature points generated by a single camera might
yield six independent equations, each composing the chain
in (23), from which the pose can be solved. However, the
lack of depth information and imaging noise could result in
pose ambiguities. With another image taken simultaneously,
six more independent constraints are induced, satisfiable by only
a finite number of poses. The remaining constraints, including
the epipolar constraint discussed in Section IV, further reduce
the possible poses to almost always a unique one. This
reasoning is strengthened by considering not a single pose but
a sequence of poses estimated from pairs of simultaneously
taken image frames of the flying object. The probability of
two different initial states of the object producing the same
sequence of poses is negligible, if not zero. Generally, a unique
correspondence exists between the initial (and subsequently
the current) state and a sequence of images.
TABLE III
COMPUTATION TIME (ms) OF ESTIMATORS (MEAN ± STD.)

Estimator   Update Step       Correct Step
EKF         0.37 ± 0.11       1.11 ± 0.37
UKF         16.8 ± 5.65       40.9 ± 9.97
QCKF        0.89 ± 0.37       }
QCKF-S      1.21 ± 0.35       } 8.36 ± 2.76
QCKF-A      143.21 ± 19.8     }
VIII. DISCUSSION AND FUTURE WORK
This work presents a scheme for motion estimation under
conditions that have previously rendered acquisition of ac-
curate pose and velocity estimates infeasible. During a fast
free flight, the object is susceptible to large aerodynamic
forces (in the case of low mass density), and is observed for
only a fraction of a second. Experiments have demonstrated
reduced estimation errors with improvements over the baseline
EKF approach. Thorough nonlinear models of the object’s
dynamics and vision-based observables are presented for use
in robust Kalman filtering. Lift force is calculated using the
Laplace equation, and drag force from a closed-form approximation,
yielding improved trajectory estimates for light objects.
Estimation error is further reduced by enforcing multiple
nonlinear constraints on the state at minimal computational
cost. Finally, planar graph matching effectively tracks features
of a polyhedron with no fiducial markers, while dynamic
updating of hypotheses robustly deals with large image noise.
These contributions, though combined into one motion esti-
mation scheme, can also be employed individually to improve
accuracy, enforce multiple constraints, or deal with noise.
Our approach to motion estimation was developed in part
with robotic manipulation in mind. Unlike in VINS and
SLAM, only a small margin of error is allowed if the estimates are to yield
predicted trajectories that can lead to successful executions of
operations such as batting, catching, etc. The manipulated ob-
jects are usually light and susceptible to air effects such as lift
and drag, as observed in our experiments. Without modeling
of aerodynamic forces, large errors accumulated in the object
trajectories will significantly affect operation outcomes. That
these objects (such as balls in sports) often assume known
geometries and material properties also provides convenience
for such modeling.
While accuracy is a main focus of our approach, com-
putational cost can hinder real time estimation. Table III
presented computation times in which the QCKF operated
at 110 Hz, and the QCKF-A with repeated calculations of
aerodynamic forces operated at just below 10 Hz. An order of
magnitude improvement could be achieved by a combination
of modern computing hardware, tuning of parameters, and
utilization of GPU processing (e.g., in calculation of lift force).
Furthermore, optimizations of the QCKF can be made to
reduce the number of matrix calculations as in [46], including
reduction of quaternion noise to a 3-vector and exclusion of
velocities from equations during correction for observables.
A few potential directions are in line for future work. First,
generalized to irregular polyhedral objects, the approach can
be further extended to curved objects. These include objects
with a single curved surface or with a surface consisting
of multiple curved patches. Features such as high curvature
points, discontinuities between surface patches, textures, or
fiducial markers can provide the observables required for
estimation. Second, the work can be utilized for its originally
intended purpose of targeted batting in 3D to extend the
authors’ work in [10]. Third, robotic catching would also
benefit from the accuracy of model-based estimation. Existing
methods fail to determine a catching configuration for objects
thrown along trajectories that deviate from those used to build
machine-learned models [8]. The methods presented here can
provide accurate estimates of any visible trajectory, yielding
low prediction error in a variety of scenarios, and a higher
overall success rate.
Moreover, the use of stereo vision allows the motion esti-
mation approach to be versatile. Two cameras can be mounted
on a humanoid robot to perform motion estimation of a flying
ball in sports for example. As long as features on the ball can
be detected in its images, its trajectory can be perceived, even
in the case that it is moving directly towards the robot.8 The
cameras may be mounted on other mobile robots or vehicles
for estimation where additional parameters may need to be
estimated for localization,9 or fixed to the world to perceive
motion in a scene. The estimation scheme is applicable to tasks
for space robots, where aerodynamic forces can be ignored due
to scarcity or absence of air.
APPENDIX A
INITIALIZATION OF 6-DOF POSE
Once a new hypothesis is created, we estimate the object’s
pose, composed of position o(k) and rotation r(k) at the k-th
time instant. The estimates will be used by a Kalman filter
created under the hypothesis.
Stereo cameras viewing a point $p$ induce the epipolar
geometry described in Section IV-A, which allows for a pair
of image points ${}^i p$ and ${}^{i'} p$, respectively generated by the
first and second cameras, to be back-projected to some (but
not necessarily the same) point in the world. The chain of
equations in (23) is solved in reverse order to obtain the world
coordinates ${}^w p$ of their common source point. In particular, the
second step of the solution involves undistortion of the two
projected points, yielding ${}^p p$ and ${}^{p'} p$. To recover the point's
third coordinate from (18), we perform "rectification" of the
points ${}^p p$ and ${}^{p'} p$ to transform their image planes to be
coplanar [98, p. 159]. To that effect, the transformed points lie
on a line parallel to the line through the camera centers,
allowing for "triangulation" to determine their z-coordinate [99].
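A compact stand-in for this back-projection step is sketched below. It replaces the rectification-plus-disparity computation of [98], [99] with the midpoint method between the two undistorted viewing rays, which recovers the same world point up to noise; camera centers and ray directions are assumed already expressed in the world frame.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Back-project a matched image point pair to a world point.

    c1, c2 are camera centers and d1, d2 ray directions obtained
    after undistortion. Solves min ||(c1 + s d1) - (c2 + t d2)||
    for s, t, then returns the midpoint of the shortest segment
    between the two rays (midpoint method, a stand-in for the
    rectification and triangulation of [98], [99]).
    """
    M = np.column_stack((d1, -d2))                    # 3x2 system matrix
    st, *_ = np.linalg.lstsq(M, c2 - c1, rcond=None)  # least-squares s, t
    p1 = c1 + st[0] * d1
    p2 = c2 + st[1] * d2
    return 0.5 * (p1 + p2)
```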
With a set of back-projected points and the set of the object's
vertex coordinates in $F_b$, the correspondence between them is
known under the new hypothesis Hnew.10 We invoke Horn’s
method [100] to compute the translation o(k) and rotation r(k)
of the frame Fb from Fw.
8 With a possible loss in accuracy due to errors in the measurement of depth.
9 Refer to SLAM and VINS works for solutions to the localization problem.
10 The image points ${}^i p_j$ and ${}^{i'} p_j$ are associated with vertices that, via a hypothesis, correspond with a vertex located at ${}^b p_k$. Hence, the world coordinate ${}^w p_j$ computed from the image points shares this correspondence.
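A self-contained sketch of Horn's closed-form solution [100] follows: the rotation's unit quaternion is the dominant eigenvector of a 4 × 4 symmetric matrix built from the cross-covariance of the two corresponded point sets, and the translation aligns the centroids.

```python
import numpy as np

def horn_absolute_orientation(P, Q):
    """Horn's method [100]: rotation (as a unit quaternion) and
    translation t with Q_i ~ R P_i + t, for corresponded (n, 3)
    body-frame points P and world points Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    S = (P - cp).T @ (Q - cq)                   # 3x3 cross-covariance
    Sxx, Sxy, Sxz = S[0]; Syx, Syy, Syz = S[1]; Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz]])
    w, V = np.linalg.eigh(N)                    # symmetric eigendecomposition
    q = V[:, -1]                                # eigenvector of largest eigenvalue
    w0, x, y, z = q                             # quaternion (w, x, y, z)
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w0*z),     2*(x*z + w0*y)],
        [2*(x*y + w0*z),    1 - 2*(x*x + z*z),  2*(y*z - w0*x)],
        [2*(x*z - w0*y),    2*(y*z + w0*x),     1 - 2*(x*x + y*y)]])
    return q, R, cq - R @ cp                    # quaternion, rotation, translation
```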
REFERENCES
[1] N. Takeishi, T. Yairi, Y. Tsuda, F. Terui, N. Ogawa, and Y. Mimasu, “Simultaneous estimation of shape and motion of an asteroid for automatic navigation,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 2861–2866, 2015.
[2] T. Fujiwara, T. Nakamura, Y. Tsuda, and S. Nakasuka, “Motion estimation and capturing of tumbling object in non-gravitational field,” IFAC Proc. Vol., vol. 31, no. 21, pp. 77–82, 1998.
[3] F. Terui, H. Kamimura, and S. Nishida, “Motion estimation to a failed satellite on orbit using stereo vision and 3D model matching,” in Int. Conf. Control, Autom., Robot. and Vision, pp. 1–8, IEEE, 2006.
[4] H. Nagamatsu, T. Kubota, and I. Nakatani, “Capture strategy for retrieval of a tumbling satellite by a space robotic manipulator,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 1, pp. 70–75, IEEE, 1996.
[5] G. Zhang, H. Liu, J. Wang, and Z. Jiang, “Vision-based system for satellite on-orbit self-servicing,” in IEEE/ASME Int. Conf. Adv. Intell. Mech., pp. 296–301, 2008.
[6] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, “Augmenting inertial navigation with image-based motion estimation,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 4, pp. 4326–4333, IEEE, 2002.
[7] C.-C. Ho and N. H. McClamroch, “Autonomous spacecraft docking using a computer vision system,” in Proc. IEEE Conf. Decis. and Control, pp. 645–650, IEEE, 1992.
[8] S. Kim, A. Shukla, and A. Billard, “Catching objects in flight,” IEEE Trans. Robot., vol. 30, no. 5, pp. 1049–1065, 2014.
[9] T. Senoo, A. Namiki, and M. Ishikawa, “High-speed batting using a multi-jointed manipulator,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 1191–1196, 2004.
[10] Y.-B. Jia, M. Gardner, and X. Mu, “Batting an in-flight object to the target,” Int. J. Robot. Res., vol. 38, no. 4, pp. 451–485, 2019.
[11] H. Li, H. Wu, L. Lou, K. Kuhnlenz, and O. Ravn, “Ping-pong robotics with high-speed vision system,” in Proc. Int. Conf. Control, Autom., Robot. and Vis., pp. 106–111, 2012.
[12] J. Kim and M. Kim, “Smart vision system for soccer training,” in Int. Conf. Inf. Commun. Technol. Converg., pp. 257–262, IEEE, 2015.
[13] R. Herrejon, S. Kagami, and K. Hashimoto, “Online 3-D trajectory estimation of a flying object from a monocular image sequence,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., pp. 2496–2501, 2009.
[14] L. Acosta, J. J. Rodrigo, J. A. Mendez, G. N. Marichal, and M. Sigut, “Ping-pong player prototype,” IEEE Robot. and Autom. Mag., vol. 10, no. 4, pp. 44–52, 2003.
[15] Y. Nakabo, I. Ishii, and M. Ishikawa, “3D tracking using two high-speed vision systems,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., vol. 1, pp. 360–365, 2002.
[16] A. Padgaokar, “Measurement of angular acceleration of a rigid body using linear accelerometers,” Trans. Am. Soc. Mech. Eng., vol. 75, pp. 522–526, 1975.
[17] P. Cardou and J. Angeles, “Angular velocity estimation from the angular acceleration matrix,” ASME J. Applied Mech., vol. 75, no. 2, 2008.
[18] P. Cardou, G. Fournier, and P. Gagnon, “A nonlinear program for angular-velocity estimation from centripetal-acceleration measurements,” IEEE/ASME Trans. Mech., vol. 16, no. 5, pp. 932–944, 2010.
[19] J. L. Marins, X. Yun, E. R. Bachmann, R. B. McGhee, and M. J. Zyda, “An extended Kalman filter for quaternion-based orientation estimation using MARG sensors,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., vol. 4, pp. 2003–2011, 2001.
[20] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, “Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision,” IEEE Trans. Robot. Autom., vol. 9, no. 1, pp. 14–35, 1993.
[21] J. S. Goddard and M. A. Abidi, “Pose and motion estimation using dual quaternion-based extended Kalman filtering,” in Proc. SPIE Conf. 3-D Image Capt. Appl., vol. 3313, pp. 189–200, 1998.
[22] S. Kim and A. Billard, “Estimating the non-linear dynamics of free-flying objects,” IEEE Robot. Auton. Syst., vol. 60, no. 9, pp. 1108–1122, 2012.
[23] R. Zanetti, M. Majji, R. H. Bishop, and D. Mortari, “Norm-constrained Kalman filtering,” J. Guid., Control, and Dyn., vol. 32, no. 5, pp. 1458–1465, 2009.
[24] D. Wang, M. Li, X. Huang, and J. Li, “Kalman filtering for a quadratic form state equality constraint,” J. Guid., Control, and Dyn., vol. 37, no. 3, pp. 951–958, 2014.
[25] P. Gurfil and H. Rotstein, “Partial aircraft state estimation from visual motion using the subspace constraints approach,” J. Guid., Control, and Dyn., vol. 24, no. 5, pp. 1016–1028, 2001.
[26] A. Nakashima, T. Okamoto, and Y. Hayakawa, “An online estimation of rotational velocity of flying ball via aerodynamics,” IFAC Proc. Vols., vol. 47, no. 3, pp. 7176–7181, 2014.
[27] Y. Zhang, Y. Zhao, R. Xiong, Y. Wang, J. Wang, and J. Chu, “Spin observation and trajectory prediction of a ping-pong ball,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 4108–4114, 2014.
[28] J. D. Anderson Jr, Introduction to Flight. McGraw-Hill, fourth ed., 2000.
[29] D. B. Gennery, “Visual tracking of known three-dimensional objects,” Int. J. Comput. Vis., vol. 7, no. 3, pp. 243–270, 1992.
[30] T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 932–946, 2002.
[31] E. Marchand, P. Bouthemy, F. Chaumette, and V. Moreau, “Robust real-time visual tracking using a 2D-3D model-based approach,” in Proc. IEEE Int. Conf. Comput. Vis., vol. 1, pp. 262–268, 1999.
[32] D. G. Lowe, “Robust model-based motion tracking through the integration of search and estimation,” Int. J. Comput. Vis., vol. 8, no. 2, pp. 113–122, 1992.
[33] E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” in Proc. IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1508–1515, 2005.
[34] V. A. Prisacariu and I. D. Reid, “PWP3D: Real-time segmentation and tracking of 3D objects,” Int. J. Comput. Vis., vol. 98, no. 3, pp. 335–354, 2012.
[35] J. M. Kelsey, J. Byrne, M. Cosgrove, S. Seereeram, and R. K. Mehra, “Vision-based relative pose estimation for autonomous rendezvous and docking,” in Proc. IEEE Aerosp. Conf., 2006.
[36] E. K. Wong, “Model matching in robot vision by subgraph isomorphism,” Pattern Recognit., vol. 25, no. 3, pp. 287–303, 1992.
[37] K. Riesen and H. Bunke, “Approximate graph edit distance computation by means of bipartite graph matching,” Image and Vis. Comput., vol. 27, no. 7, pp. 950–959, 2009.
[38] S. Baloch and H. Krim, “Object recognition through topo-geometric shape models using error-tolerant subgraph isomorphisms,” IEEE Trans. Image Process., vol. 19, no. 5, pp. 1191–1200, 2009.
[39] M. Garon and J.-F. Lalonde, “Deep 6-DOF tracking,” IEEE Trans. Vis. Comput. Graphics, vol. 23, no. 11, pp. 2410–2418, 2017.
[40] B. Wen, C. Mitash, B. Ren, and K. E. Bekris, “se(3)-TrackNet: Data-driven 6D pose tracking by calibrating image residuals in synthetic domains,” 2020.
[41] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Trans. ASME, J. Basic Eng., vol. 82 (Series D), pp. 35–45, 1960.
[42] S. Soatto and P. Perona, “Recursive 3-D visual motion estimation using subspace constraints,” Int. J. Comput. Vis., vol. 22, no. 3, pp. 235–259, 1997.
[43] T. P. Webb, R. J. Prazenica, A. J. Kurdila, and R. Lind, “Vision-based state estimation for autonomous micro air vehicles,” J. Guid., Control, and Dyn., vol. 30, no. 3, pp. 816–826, 2007.
[44] D. D. Diel, P. DeBitetto, and S. Teller, “Epipolar constraints for vision-aided inertial navigation,” in IEEE Workshops Appl. Comput. Vis., vol. 2, pp. 221–228, 2005.
[45] J.-O. Nilsson, D. Zachariah, M. Jansson, and P. Handel, “Realtime implementation of visual-aided inertial navigation using epipolar constraints,” in Proc. IEEE/ION Pos., Loc. and Nav. Symp., pp. 711–718, 2012.
[46] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint Kalman filter for vision-aided inertial navigation,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 3565–3572, 2007.
[47] K. Eckenhoff, Y. Yang, P. Geneva, and G. Huang, “Tightly-coupled visual-inertial localization and 3-D rigid-body target tracking,” IEEE Robot. Autom. Letters, vol. 4, no. 2, pp. 1541–1548, 2019.
[48] E. J. Lefferts, F. L. Markley, and M. D. Shuster, “Kalman filtering for spacecraft attitude estimation,” J. Guid., Control, and Dyn., vol. 5, no. 5, pp. 417–429, 1982.
[49] F. L. Markley, “Attitude error representations for Kalman filtering,” J. Guid., Control, and Dyn., vol. 26, no. 2, pp. 311–317, 2003.
[50] K. Brown, D. Anderson, C. Watts, and B. Connor, “Fusion of diverse performance inertial sensors for improved attitude estimation within a stabilisation platform for electro-optic systems,” in Image and Signal Proc. Remote Sens. XXV, vol. 11155, p. 1115516, International Society for Optics and Photonics, 2019.
[51] J. L. Crassidis, “Sigma-point Kalman filtering for integrated GPS and inertial navigation,” in Proc. AIAA Guid., Navigation, and Control Conf., San Francisco, CA, USA, 2005, p. 6062.
[52] F. Li and L. Chang, “MEKF with navigation frame attitude error parameterization for INS/GPS,” IEEE Sens. J., vol. 20, no. 3, pp. 1536–1549, 2019.
[53] S. Bonnabel, “Left-invariant extended Kalman filter and attitude estimation,” in Proc. IEEE Conf. Decis. and Control, pp. 1027–1032, 2007.
[54] S. Bonnabel, P. Martin, and E. Salaun, “Invariant extended Kalman filter: theory and application to a velocity-aided attitude estimation problem,” in Proc. IEEE Conf. Decis. and Control, pp. 1297–1304, 2009.
[55] A. Barrau and S. Bonnabel, “The invariant extended Kalman filter as a stable observer,” IEEE Trans. Automatic Control, vol. 62, no. 4, pp. 1797–1812, 2016.
[56] N. Y. Ko, W. Youn, I. H. Choi, G. Song, and T. S. Kim, “Features of invariant extended Kalman filter applied to unmanned aerial vehicle navigation,” Sensors, vol. 18, no. 9, p. 2855, 2018.
[57] M. Brossard, S. Bonnabel, and A. Barrau, “Invariant Kalman filtering for visual inertial SLAM,” in Proc. Int. Conf. Info. Fusion, pp. 2021–2028, IEEE, 2018.
[58] A. Barrau and S. Bonnabel, “Extended Kalman filtering with nonlinear equality constraints: a geometric approach,” IEEE Trans. Automatic Control, vol. 65, no. 6, pp. 2325–2338, 2019.
[59] C. Yang and E. Blasch, “Kalman filtering with nonlinear state constraints,” IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 1, pp. 70–84, 2009.
[60] D. Simon and T. L. Chia, “Kalman filtering with state equality constraints,” IEEE Trans. Aerosp. Electron. Syst., vol. 38, no. 1, pp. 128–135, 2002.
[61] N. Gupta and R. Hauser, “Kalman filtering with equality and inequality state constraints,” Tech. Rep. NA-07/18, Oxford University Computing Laboratory, Aug 2007.
[62] B. O. S. Teixeira, J. Chandrasekar, H. J. Palanthandalam-Madapusi, L. A. B. Torres, L. A. Aguirre, and D. S. Bernstein, “Gain-constrained Kalman filtering for linear and nonlinear systems,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4113–4123, 2008.
[63] J. De Geeter, H. Van Brussel, J. De Schutter, and M. Decreton, “A smoothly constrained Kalman filter,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 10, pp. 1171–1177, 1997.
[64] S. J. Julier and J. J. LaViola, “On Kalman filtering with nonlinear equality constraints,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2774–2784, 2007.
[65] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052–1067, 2007.
[66] D. Strelow and S. Singh, “Motion estimation from image and inertial measurements,” Int. J. Robot. Res., vol. 23, no. 12, pp. 1157–1195, 2004.
[67] J. W. Langelaan, “State estimation for autonomous flight in cluttered environments,” J. Guid., Control, and Dyn., vol. 30, no. 5, pp. 1414–1426, 2007.
[68] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, “Vision-based state estimation for autonomous rotorcraft MAVs in complex environments,” in Proc. IEEE Int. Conf. Robot. Autom., pp. 1758–1764, 2013.
[69] R. Mebarki, J. Cacace, and V. Lippiello, “Velocity estimation of a UAV using visual and IMU data in a GPS-denied environment,” in Proc. IEEE Int. Symp. Saf., Secur., and Rescue Robot., pp. 1–6, 2013.
[70] M. Dehghani, H. Kharrati, H. Seyedarabi, and M. Baradarannia, “Improvement of angular velocity and position estimation in gyro-free inertial navigation based on vision aid equipment,” IET Comput. Vis., vol. 12, no. 3, pp. 261–275, 2017.
[71] M. Gardner and Y.-B. Jia, “Supplemental Material for ‘Pose and Motion Estimation of Free-Flying Objects: Aerodynamics, Constrained Filtering, and Graph-Based Feature Tracking’.” https://faculty.sites.iastate.edu/jia/files/inline-files/laplace-equation.pdf, 2021.
[72] L. J. Clancy, Aerodynamics. Pitman Publishing Ltd., 1975.
[73] G. K. Batchelor, An Introduction to Fluid Dynamics. Cambridge University Press, 1967.
[74] G. Parker, “Projectile motion with air resistance quadratic in the speed,” Am. J. Phys., vol. 45, no. 7, pp. 606–610, 1977.
[75] C. Brueningsen, J. Marinelli, P. Pappano, and K. Wallace, “Modeling air drag,” The Physics Teacher, vol. 32, no. 7, pp. 439–441, 1994.
[76] Y.-B. Jia, “Quaternion and rotation.” http://web.cs.iastate.edu/~cs577/quaternion.pdf, 2019.
[77] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002.
[78] C. B. Duane, “Close-range camera calibration,” Photogramm. Eng., vol. 37, no. 8, pp. 855–866, 1971.
[79] K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Quarterly of App. Math., vol. 2, no. 2, pp. 164–168, 1944.
[80] D. W. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” J. Soc. Ind. and App. Math., vol. 11, no. 2, pp. 431–441, 1963.
[81] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.
[82] T. S. Huang and O. D. Faugeras, “Some properties of the E matrix in two-view motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310–1312, 1989.
[83] D. Simon, Optimal State Estimation. John Wiley & Sons, Inc., 2006.
[84] C. Garcia and W. I. Zangwill, “Finding all solutions to polynomial systems and other systems of equations,” Math. Program., vol. 16, no. 1, pp. 159–176, 1979.
[85] M. De Berg, M. Van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry. Springer, third ed., 2008.
[86] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 679–698, 1986.
[87] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 593–600, 1994.
[88] D. Eppstein, “Subgraph isomorphism in planar graphs and related problems,” in Graph Alg. Appl. I, pp. 283–309, World Scientific, 2002.
[89] L. G. Shapiro and R. M. Haralick, “Structural descriptions and inexact matching,” IEEE Trans. Pattern Anal. Mach. Intell., no. 5, pp. 504–519, 1981.
[90] J. J. McGregor, “Backtrack search algorithms and the maximal common subgraph problem,” Software: Practice and Experience, vol. 12, no. 1, pp. 23–34, 1982.
[91] M. Neuhaus and H. Bunke, “An error-tolerant approximate matching algorithm for attributed planar graphs and its application to fingerprint classification,” in Joint IAPR Int. Workshops on Stat. Techn. Pattern Recognit. and Struct. and Synt. Pattern Recognit. (SSPR), pp. 180–189, Springer, 2004.
[92] S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Sig. Process., Sens. Fusion, and Target Recognit. VI, vol. 3068, pp. 182–193, Int. Society for Optics and Photonics, 1997.
[93] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment – a modern synthesis,” in Int. Workshop on Vis. Alg., pp. 298–372, Springer, 1999.
[94] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000.
[95] A. M. Nathan, “The effect of spin on the flight of a baseball,” Am. J. Phys., vol. 76, no. 2, pp. 119–124, 2008.
[96] C. Runge, “Über die numerische Auflösung von Differentialgleichungen,” Mathematische Annalen, vol. 46, no. 2, pp. 167–178, 1895.
[97] W. Kutta, “Beitrag zur näherungsweisen Integration totaler Differentialgleichungen,” Z. Math. Phys., vol. 46, pp. 435–453, 1901.
[98] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, vol. 201. Prentice Hall, Englewood Cliffs, 1998.
[99] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[100] B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” J. Optical Soc. of Am. A, vol. 4, no. 4, pp. 629–642, 1987.