



Acta Astronautica 123 (2016) 435–445


Comparative assessment of techniques for initial pose estimation using monocular vision

Sumant Sharma, Simone D'Amico
Department of Aeronautics & Astronautics, Stanford University, Stanford, California, USA

Article info

Article history: Received 22 July 2015; Received in revised form 12 December 2015; Accepted 23 December 2015; Available online 20 January 2016

Keywords: Computer vision; Pose estimation; Monocular vision; Uncooperative spacecraft

http://dx.doi.org/10.1016/j.actaastro.2015.12.032
0094-5765/© 2015 IAA. Published by Elsevier Ltd. All rights reserved.

Corresponding author. Tel.: +1 404 952 8102. E-mail address: [email protected] (S. Sharma).

Abstract

This work addresses the comparative assessment of initial pose estimation techniques for monocular navigation to enable formation-flying and on-orbit servicing missions. Monocular navigation relies on finding an initial pose, i.e., a coarse estimate of the attitude and position of the space resident object with respect to the camera, based on a minimum number of features from a three-dimensional computer model and a single two-dimensional image. The initial pose is estimated without the use of fiducial markers, without any range measurements, and without any a priori relative motion information. Prior work has been done to compare different pose estimators for terrestrial applications, but there is a lack of functional and performance characterization of such algorithms in the context of missions involving rendezvous operations in the space environment. Use of state-of-the-art pose estimation algorithms designed for terrestrial applications is challenging in space due to factors such as limited on-board processing power, low carrier-to-noise ratio, and high image contrasts. This paper focuses on performance characterization of three initial pose estimation algorithms in the context of such missions and suggests improvements.

© 2015 IAA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Recent advancements have been made to utilize monocular vision navigation as an enabling technology for formation-flying and on-orbit servicing missions (e.g., PROBA-3 by ESA [1], ANGELS by US Air Force [2], PRISMA by OHB Sweden [3]). These missions require approaching a passive space resident object from large distances (e.g., > 30 km) in a fuel-efficient, safe, and accurate manner. Simple modification of low-cost instruments (e.g., star trackers) for high dynamic range can enable accurate navigation relative to the space resident object. Monocular navigation on such missions relies on finding an estimate of the initial pose, i.e., the attitude and position of the space resident object with respect to the camera, based on a minimum number of features from a three-dimensional computer model and a single two-dimensional image. For on-orbit servicing missions, this represents the scenario where the servicing spacecraft is "lost-in-space". Estimating the initial pose is especially critical as well as challenging in the design of a pose estimation system, as there is no a priori information about the attitude and position of the target. Aside from a 3D wire-frame model of the space resident object, no assumption on the relative translational or rotational information is made.

Use of state-of-the-art computer vision techniques designed for terrestrial applications is challenging in space. For example, use of feature descriptors such as the Scale Invariant Feature Transform (SIFT) [4] in pose estimation for space imagery is too computationally expensive and yields poor results (see Fig. 1). We plot SIFT feature matches (in green) between two images taken during the PRISMA mission. To obtain correct feature matches, SIFT relies on high image acquisition rates and low image noise, both of which are unavailable in these images. Additionally, spacecraft geometry is often highly symmetrical, resulting in ambiguity in attitude determination (see Fig. 1). Prior work has been done to compare different initial pose estimators [5,6], but there is a lack of functional and performance characterization of such algorithms in the context of missions involving rendezvous operations in the space environment.

Fig. 1. Challenges in initial pose estimation using state-of-the-art techniques: pose ambiguity due to geometry (left) and poor feature matching (right).

For the purpose of this comparative assessment, we focus on a pose estimation architecture (see Fig. 2) based on points extracted from edge features due to their simplicity and accuracy [7]. As opposed to features based on color gradients, textures, and optical flow, edges are less sensitive to illumination changes and can easily distinguish boundaries of a spacecraft geometry from the background in an image.

The source of the 3D model features is a simplified wire-frame model of the space resident object, assumed to be either stored on-board the servicing spacecraft or formed as a structure-from-motion problem which is solved alongside pose estimation [8]. The source of the 2D features is an object detection subsystem which processes, extracts, and describes features from a single image captured by the on-board navigation camera. A typical model reduction procedure and object detection subsystem are described and illustrated later in the paper.

The 3D model features and 2D features are passed into the initial pose estimation subsystem, which generates a pose estimate without knowing the correspondence of these features. This is a challenging task without a priori estimates of the relative motion due to a large search space for the correct feature correspondence. However, the search space can be reduced using a method such as perceptual organization [9,10], which detects viewpoint-invariant feature groupings from the 2D image. These are then matched to corresponding structures of the 3D model in a probabilistic manner to create multiple correspondence hypotheses. These correspondence hypotheses need to be validated to find the correct one. Hence, for each correspondence hypothesis, n 2D and 3D features are used to calculate a relative pose by solving the Perspective-n-Point (PnP) problem [11]. The resulting pose estimate is used to create virtual 2D image features by reprojecting the input 3D model features using true perspective projection. A measure of the reprojection error between the virtual 2D image features and the input 2D image features is used in validation. This process is repeated for all hypotheses in order to identify the correct feature correspondence and, subsequently, a correct pose estimate. Hence, our interest is not in the PnP solvers' statistical use of a large number of measurements to solve an overdetermined system. Rather, it is in their ingenuity in using a minimal number of points to compute a coarse initial pose estimate with minimal computational effort.

The performance of initial pose estimation hinges on the solution of the PnP problem. In the above architecture, a PnP solver could be called multiple times and will be subject to a wide variety of input from the object detection subsystem and 3D model reduction. Hence, a PnP solver should not only be fast and efficient but, more importantly, be reliable and robust to overcome challenges unique to monocular vision-based navigation in space. In the remainder of the paper, we first present a formal statement of the PnP problem and then review state-of-the-art PnP solvers. We then introduce our framework of assessment of these solvers, where a discussion of simulation input generation, performance criteria, and test cases is presented. Finally, we present relevant results from our assessment and conclude with a discussion on the applicability of these solvers in a monocular vision-based navigation system for on-orbit servicing and formation-flying missions.

2. Review of solution methods

Let $q_i = [x_i \; y_i \; z_i]^T$, where $i = 1, 2, \ldots, n$, be $n$ 3D model points in the object reference frame $B$. Let $p_i = [u_i \; v_i \; 1]^T$, where $i = 1, 2, \ldots, n$, be the corresponding $n$ image points in the image reference frame $P$. For a known camera focal length, $f$, the PnP problem aims to retrieve the rotation matrix from frame $B$ to the camera reference frame $C$, $R_{BC}$, and the translation vector from the origin of frame $B$ to the origin of frame $C$, $T_{BC}$:

$$p_i = [u_i \; v_i \; 1]^T = \left[\frac{\alpha_i}{\gamma_i} f \;\;\; \frac{\beta_i}{\gamma_i} f \;\;\; 1\right]^T, \qquad r_i = [\alpha_i \; \beta_i \; \gamma_i]^T = R_{BC}\left(q_i + T_{BC}\right) \tag{1}$$
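The projection of Eq. (1) is straightforward to sketch in code. The following is a minimal illustration, not the paper's implementation; the rotation, point, and range values are invented, and only the focal length (in pixels) is borrowed from Eq. (14) for concreteness.

```python
import math

def project(q, R, T, f):
    """True perspective projection of one 3D model point, per Eq. (1):
    r = R_BC (q + T_BC), then p = [f*alpha/gamma, f*beta/gamma, 1]."""
    s = [q[k] + T[k] for k in range(3)]                              # q_i + T_BC
    r = [sum(R[row][k] * s[k] for k in range(3)) for row in range(3)]
    alpha, beta, gamma = r
    return [f * alpha / gamma, f * beta / gamma, 1.0]

def rot_z(a):
    """Rotation matrix for an angle a (rad) about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# A point 0.1 m off-axis, 10 m in front of the camera, identity attitude.
p = project([0.1, 0.0, 0.0], rot_z(0.0), [0.0, 0.0, 10.0], 2347.0)
print(p)  # u = f * 0.1 / 10 ≈ 23.47 px, v = 0
```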

Re-writing Eq. (1) using homogeneous coordinates and representing the camera parameters as a matrix $K$, we would like to estimate the $3 \times 4$ pose matrix, $P$, whose first three columns represent $R_{BC}$ and whose fourth column represents $T_{BC}$.

Fig. 2. Typical pose estimation architecture with inputs of a spacecraft 3D model and a 2D image and an output of a pose estimate.

$$\begin{bmatrix} w_i u_i \\ w_i v_i \\ w_i \end{bmatrix} = K \, P \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix} \tag{2}$$

Without loss of generality, we can expand Eq. (2) to obtain Eq. (3), assuming square pixels in the image sensor, zero skewness, and zero distortion:

$$\begin{bmatrix} w_i u_i \\ w_i v_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_{BC}(1,1) & R_{BC}(1,2) & R_{BC}(1,3) & T_{BC}(1) \\ R_{BC}(2,1) & R_{BC}(2,2) & R_{BC}(2,3) & T_{BC}(2) \\ R_{BC}(3,1) & R_{BC}(3,2) & R_{BC}(3,3) & T_{BC}(3) \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix} \tag{3}$$

Eq. (3) has six unknown coefficients, as there are six degrees of freedom in $P$. Three degrees are required to describe the relative attitude and three are required to describe the relative position. Hence, three image points, i.e., $n = 3$, will provide six measurements which can be used to solve six equations in six unknowns. However, there will be four possible solutions, as the conversion to non-homogeneous coordinates is non-linear. In fact, it has been shown that a unique solution with a general configuration of points is only possible with $n = 6$ [11].

The PnP problem has been systematically investigated in the literature, and some of the most commonly used PnP solvers are PosIt and Coplanar PosIt [12], EPnP [13], OPnP [14], and the Lu-Hager-Mjolsness method [15]. These solvers assume that the correspondence between model points and observed image points has already been established. This is not true in our case, which makes the problem of initial pose estimation especially challenging. Note that even when a unique solution is theoretically available for the PnP problem, there is no guarantee that a unique solution actually exists in the case of symmetrical spacecraft geometries. For example, a PnP solver can provide six different solutions for a cube with six identical sides. Hence, we need to carry along multiple solutions of PnP even when we have more than six measurements due to the symmetry. State-of-the-art PnP solvers usually take either an iterative minimization approach or a multi-stage analytical approach. The multi-stage analytical approaches estimate coordinates of some or all points in the camera frame and then solve the problem in 3D space using linear forms of the perspective Eq. (1). This is done to make computation fast for a large number of measurements at the expense of accuracy for a small number of measurements. Other solvers that utilize the non-linear constraints posed by Eq. (1) tend to be computationally expensive. The iterative minimization approaches minimize an error function defined in the image or object space. They typically take the non-linear constraints of Eq. (1) into account but tend to get trapped in local minima. Four of these state-of-the-art solvers, namely PosIt, Coplanar PosIt, EPnP, and the Newton–Raphson Method [16,17], are discussed in more detail in the following sections to provide analytical reasoning behind their performance in initial pose estimation. PosIt, Coplanar PosIt, and the Newton–Raphson Method take an iterative minimization approach, while EPnP takes a multi-stage analytical approach. Note that we exclude the Lu-Hager-Mjolsness method [15] from our assessment, as it is also an iterative minimization approach which is shown to be approximately four times slower than the Newton–Raphson Method [18].

2.1. PosIt and Coplanar PosIt

PosIt [12] requires a minimum of four non-coplanar 3D model points and corresponding 2D image points for pose calculation. It approximates true perspective projection with a Scaled Orthographic Projection (SOP) to form a coarse estimate of the pose, which is then iteratively improved until convergence to a solution. The algorithm for PosIt scales both sides of Eq. (3) by $1/T_{BC}(3)$. Then, expanding the first two rows yields the following relationships, where $s = f / T_{BC}(3)$:

$$x_i \, s R_{BC}(1,1) + y_i \, s R_{BC}(1,2) + z_i \, s R_{BC}(1,3) + s T_{BC}(1) = u_i \frac{w_i}{T_{BC}(3)} \tag{4}$$

$$x_i \, s R_{BC}(2,1) + y_i \, s R_{BC}(2,2) + z_i \, s R_{BC}(2,3) + s T_{BC}(2) = v_i \frac{w_i}{T_{BC}(3)} \tag{5}$$

PosIt initializes with the SOP assumption, i.e., $w_i / T_{BC}(3) = 1$, so that Eqs. (4) and (5) can be simplified to linear equations with only eight unknowns, i.e., $sR_{BC}(1,1)$, $sR_{BC}(1,2)$, $sR_{BC}(1,3)$, $sT_{BC}(1)$, $sR_{BC}(2,1)$, $sR_{BC}(2,2)$, $sR_{BC}(2,3)$, and $sT_{BC}(2)$. With $n = 4$, these eight unknowns can be solved using the eight linear equations formed by writing Eqs. (4) and (5) for $i = 1, 2, 3,$ and $4$. Then, using the resulting estimate of the pose matrix, the value of $w_i / T_{BC}(3)$ can be updated at the end of each iteration. For each iteration, the reprojected model points are calculated by using the current estimate of the pose matrix in Eq. (3). At the end of each iteration, an improvement score, $\Delta E$, is calculated by measuring the Euclidean distance between the reprojected model points of the current and previous iterations.

$$\Delta E = \sum_{i=1}^{n} \sqrt{\Delta u_i^2 + \Delta v_i^2} \tag{6}$$

Iterations are stopped if either $\Delta E$ falls below 0.1 or 1000 iterations are reached. At low focal lengths or when the space resident object is close to the camera, SOP is not a valid approximation of true perspective projection and leads to inaccurate solutions. However, we can expect to see a trend of improvement in PosIt's "performance" (relative to other solvers) as SOP begins to closely approximate true perspective projection when the space resident object is far from the camera.
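The trend noted above is easy to reproduce numerically. In this hedged sketch, the point coordinates and ranges are invented for illustration: SOP assigns every point the common depth $T_{BC}(3)$, so its error relative to true perspective shrinks as the object recedes.

```python
def perspective_u(x, z, Tz, f):
    """True perspective image coordinate: u = f * x / (z + Tz)."""
    return f * x / (z + Tz)

def sop_u(x, z, Tz, f):
    """Scaled Orthographic Projection: every point is assumed at the
    common depth Tz, i.e. the w_i / T_BC(3) = 1 assumption of PosIt."""
    return (f / Tz) * x

f = 2347.0        # focal length in pixels, borrowed from Eq. (14)
x, z = 0.5, 0.5   # illustrative point: 0.5 m off-axis, 0.5 m of depth relief
for Tz in (5.0, 30.0, 300.0):  # near, medium, far ranges in metres
    err = abs(perspective_u(x, z, Tz, f) - sop_u(x, z, Tz, f))
    print(f"range {Tz:6.1f} m: |SOP - perspective| = {err:.3f} px")
```

The error drops from roughly 21 px at 5 m to well under 0.01 px at 300 m, matching the expected improvement in PosIt's relative performance at long range.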

Coplanar PosIt [19] is a variation of PosIt which exclusively addresses the case of coplanar 3D model points and corresponding 2D image points. It acknowledges that an SOP approximation leads to two possible solutions of $R_{BC}$ due to an additional degree of freedom. This is addressed by keeping track of the solution with the lower reprojection error for a given iteration. The reprojection error is the average Euclidean distance between the measured image points, $p_i$, and the reprojected model points obtained by using the estimated pose matrix in Eq. (2). Similar to PosIt, the solutions are iteratively refined until $\Delta E$ of one of the branches falls below 0.1. A branch of the solutions is discarded if it estimates any of the 3D model points to be behind the camera. For a given iteration, if the two possible solutions lie close to each other in the solution space, only refining the solution with the lower reprojection error does not guarantee convergence to the solution with the eventual lower reprojection error. This is a major shortcoming of Coplanar PosIt and can only be remedied by sacrificing computational efficiency and tracking all possible solution branches.

2.2. EPnP

Applicable to both coplanar and non-coplanar 3D model points, EPnP [20] attempts to find a closed-form pose solution and requires a minimum of four corresponding model and image points. The main idea is to express the 3D model points as a weighted sum of four non-coplanar virtual control points, $c_j$, where $j = 1, 2, 3,$ and $4$, in the object frame $B$ (see Fig. 3). Using a matrix inversion, the homogeneous barycentric coordinates, $\alpha_{ij}$, can be computed from the $n$ 3D model points, $q_i$, assuming arbitrary coordinates of the control points.

$$q_i = \sum_{j=1}^{4} \alpha_{ij} \, c_j \tag{7}$$
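A hedged sketch of Eq. (7): with a deliberately simple, hypothetical choice of control points, the origin and the three unit axes of frame B, the homogeneous barycentric coordinates have a closed form and no matrix inversion is needed; a general choice would require the 4×4 inversion mentioned above.

```python
# Hypothetical control points: origin and unit axes of the object frame B.
C = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def barycentric(q):
    """Homogeneous barycentric coordinates alpha_ij of Eq. (7).
    For these control points: alpha = (1 - x - y - z, x, y, z)."""
    x, y, z = q
    return [1.0 - x - y - z, x, y, z]

q = (0.3, -0.2, 0.7)
a = barycentric(q)
recon = tuple(sum(a[j] * C[j][k] for j in range(4)) for k in range(3))
print(sum(a), recon)  # the coordinates sum to 1 and reconstruct q
```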

Using Eq. (7), Eq. (3) can be re-written in terms of the 12 unknown control point coordinates in the camera frame, $[\hat{\alpha}_j \; \hat{\beta}_j \; \hat{\gamma}_j]^T$, where $j = 1, 2, 3,$ and $4$:

$$\begin{bmatrix} w_i u_i \\ w_i v_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} \alpha_{ij} \begin{bmatrix} \hat{\alpha}_j \\ \hat{\beta}_j \\ \hat{\gamma}_j \end{bmatrix} \tag{8}$$

By substituting the values of $w_i$ from the third row into the first two rows of Eq. (8), two linear equations are formed for each corresponding pair of a 3D model point and an image point:

$$\sum_{j=1}^{4} \left( \alpha_{ij} f \hat{\alpha}_j - \alpha_{ij} u_i \hat{\gamma}_j \right) = 0 \tag{9}$$

$$\sum_{j=1}^{4} \left( f \alpha_{ij} \hat{\beta}_j - \alpha_{ij} v_i \hat{\gamma}_j \right) = 0 \tag{10}$$

Hence, the $2n$ linear equations which result from writing Eqs. (9) and (10) for $i = 1, 2, \ldots, n$ can be used to estimate the 12 unknowns in the form of a $12 \times 1$ vector, $\hat{x}$, by solving a linear system of the form $M\hat{x} = 0$. Note that $M$ is a $2n \times 12$ matrix formed by arranging the coefficients of the $2n$ linear equations. When the image points are perfect data from a true perspective projection of the model points, the solution is calculated as a $12 \times 1$ vector, $\hat{x}$, in the null-space of the $12 \times 12$ matrix, $M^T M$. However, the matrix may have as many as four linearly dependent columns due to measurement noise and/or the focal length of the camera. There could be four possible solutions expressed as linear combinations of the eigenvectors of this matrix. Out of the four possible solutions, the one with the lowest reprojection error is selected. Typically, this is inaccurate when the focal length is large and the matrix is poorly conditioned. In this scenario, even a small amount of measurement noise could lead to highly inaccurate estimates of the solution.

2.3. Newton–Raphson Method

The Newton–Raphson Method (NRM) [16,17] iteratively solves the true perspective equations, given by Eq. (1), for an estimate of the pose. It requires an input guess of the pose and a minimum of three corresponding 3D model and 2D image points. For each iteration, the reprojection error is linearized about the current pose estimate, $\vec{x} = [T_{BC} \; \varphi_{BC}]^T$. Note that $\vec{x}$ is a $6 \times 1$ vector containing the three components of the translation vector, $T_{BC}$ (see Fig. 3), and the three Euler angles, $\varphi_{BC}$, representing the rotation from frame $B$ to frame $C$:

$$E_{RE,i} = \frac{\partial p_i}{\partial r_i} \frac{\partial r_i}{\partial T_{BC}} \Delta T_{BC} + \frac{\partial p_i}{\partial r_i} \frac{\partial r_i}{\partial \varphi_{BC}} \Delta \varphi_{BC} \tag{11}$$

The derivatives of the perspective equations with respect to $T_{BC}$ and $\varphi_{BC}$ are evaluated in the form of a $2n \times 6$ Jacobian matrix, $J = [J_1 \; J_2 \; \cdots \; J_n]^T$:

$$J_i = \left[ \frac{\partial p_i}{\partial r_i} \frac{\partial r_i}{\partial T_{BC}} \;\;\; \frac{\partial p_i}{\partial r_i} \frac{\partial r_i}{\partial \varphi_{BC}} \right] \tag{12}$$

For each iteration, the least-squares solution using the reprojection error from the previous iteration, $E_{RE}$, and the Jacobian matrix provides an update to the pose estimate, $\Delta \vec{x}$. Note that $E_{RE}$ is a $2n \times 1$ vector formed by writing Eq. (11) for $i = 1, 2, \ldots, n$:

$$\Delta \vec{x} = (J^T J)^{-1} J^T E_{RE} \tag{13}$$

The pose estimate is refined until either $|\Delta \vec{x}|$ falls below $10^{-10}$ or 50 iterations are reached. To examine robustness to initialization with coarse pose estimates, we conducted a simulation with synthetic 3D model and 2D image points. For a fixed set of six 3D model points, we varied the initial guess for $R_{BC}$ $10^4$ times and generated $10^4$ corresponding sets of six image points. Each time, the initial guess for $R_{BC}$ was generated from random values of the three Euler angles drawn from the uniform distributions of $[-180°, 180°]$, $[-180°, 180°]$, and $[0°, 180°]$. After computing a pose estimate for each corresponding set of six 3D model points and six image points, it was observed that, given the true value of $T_{BC}$, NRM typically converged to the correct solution when the initial guess for $R_{BC}$ was within $60°$ of the true value of $R_{BC}$.
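The Gauss–Newton update of Eq. (13) can be illustrated on a reduced problem. The sketch below, with invented point coordinates, fixes the rotation to identity and refines only the translation, so the Jacobian entries are analytic and the 3×3 normal equations can be solved directly; the full six-parameter NRM adds the Euler-angle columns of Eq. (12).

```python
f = 2347.0  # focal length in pixels, borrowed from Eq. (14)

def project(q, T):
    """u = f*(x+Tx)/(z+Tz), v = f*(y+Ty)/(z+Tz); rotation fixed to identity."""
    x, y, z = q[0] + T[0], q[1] + T[1], q[2] + T[2]
    return f * x / z, f * y / z

def solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule (enough for this sketch)."""
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det(A)
    return [det([[b[r] if k == c else A[r][k] for k in range(3)]
                 for r in range(3)]) / d for c in range(3)]

def refine(points, pixels, T, iters=50):
    """Translation-only analogue of Eqs. (11)-(13):
    delta = (J^T J)^-1 J^T E_RE, applied until the update is tiny."""
    for _ in range(iters):
        JtJ = [[0.0] * 3 for _ in range(3)]
        JtE = [0.0] * 3
        for q, (u_m, v_m) in zip(points, pixels):
            x, y, z = q[0] + T[0], q[1] + T[1], q[2] + T[2]
            u, v = f * x / z, f * y / z
            # Analytic Jacobian rows for u and v w.r.t. (Tx, Ty, Tz)
            Ju = [f / z, 0.0, -f * x / z**2]
            Jv = [0.0, f / z, -f * y / z**2]
            for r in range(3):
                JtE[r] += Ju[r] * (u_m - u) + Jv[r] * (v_m - v)
                for c in range(3):
                    JtJ[r][c] += Ju[r] * Ju[c] + Jv[r] * Jv[c]
        d = solve3(JtJ, JtE)
        T = [T[k] + d[k] for k in range(3)]
        if sum(abs(x) for x in d) < 1e-10:
            break
    return T

points = [(0.4, 0.1, 0.0), (-0.3, 0.5, 0.2), (0.0, -0.4, 0.1), (0.2, 0.3, -0.2)]
T_true = [0.1, -0.2, 10.0]
pixels = [project(q, T_true) for q in points]
T_est = refine(points, pixels, [0.0, 0.0, 12.0])  # coarse initial guess
print(T_est)
```

From a deliberately coarse initial guess, 20% off in range, the loop converges to the true translation within the 50-iteration budget used in the text.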

3. Framework for comparative assessment

For initial pose estimation, PnP solvers will be called multiple times to find the correct correspondence hypothesis. Since there exist multiple ways to generate correspondence hypotheses, we decouple their performance from that of the PnP solvers by using a minimal number of matches between the 3D model and the 2D image as inputs in these simulations. Monte-Carlo simulations are used to rigorously inspect the performance of the PnP solvers in the context of initial pose estimation. It is imperative to define performance criteria and test cases, and to generate simulation input data that exhaustively represent scenarios of spaceborne applications. The simulations made use of vectorized implementations of all PnP solvers and were carried out on a 2.4 GHz Intel Core i5 processor. In order to assess potential implementation of these solvers on a state-of-the-art spaceborne microprocessor running at clock-rates of 30–300 MHz, the profiling results have been scaled appropriately. Since Coplanar PosIt only accepts coplanar 3D model points while PosIt only accepts non-coplanar 3D model points, for each simulation we check coplanarity of the input and switch to the more suitable of the two solvers. Results for this combination of solvers are labeled as "PosIt+".

Fig. 3. Geometric representation of the Perspective-n-Point problem.

Fig. 4. Typical model reduction steps to obtain a simplified wire-frame model from the CAD model of the Tango spacecraft (top). Output of the same steps for the Rosetta spacecraft (bottom).

3.1. Simulation input generation

The three inputs of the PnP solvers are $n$ 3D model points extracted from a wire-frame model, the camera intrinsic parameters, and $n$ 2D image points extracted from a virtual image of the wire-frame model. For NRM, an input guess of the pose is also required. We select a random attitude guess within ±60° of the true attitude and a random position guess with a magnitude within ±30% of the magnitude of the true position.

Object 3D Model: The 3D model needs to carry a high number of features to reduce ambiguities associated with the symmetry of spacecraft, but also be simplified enough to boost the efficiency of the search algorithms during feature correspondence and pose estimation. With this consideration, a wire-frame model derived from a high-fidelity CAD model of the Tango spacecraft from the PRISMA mission is used in this paper (see Fig. 4). The model is input in MATLAB as a stereolithographic file, from which information about surfaces and edges is generated. Duplicate edges as well as edges on flat surfaces are removed, and the remaining edges are discretized as 3D points. Based on an empirically determined probability distribution, $n$ 3D points are selected at random for each simulation and used as an input to the PnP solvers (see Fig. 5). Although not used in the comparative assessment of the PnP solvers in our work, we present the results of our model reduction procedure on a publicly available CAD model of the Rosetta spacecraft [21] to show the wide applicability of our method (see Fig. 4). The model reduction procedure can potentially be used for pre-launch validation of pose estimation techniques when a CAD model of the target spacecraft is available, or during orbit when the 3D model of the target space resident object can be generated using Structure-from-Motion (SfM) techniques.

Fig. 5. Sample simulation input of 3D model points (shown in red). The discretized 3D model (shown in blue) has been plotted as a reference. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Camera Model: A virtual pinhole camera model is adopted to model the close-range vision camera embarked on the servicer spacecraft of the PRISMA mission [16]. The effective focal length is 20,187 × 10⁻⁶ m for both axes of the image sensor, which produces images 752 px × 580 px in size. It is assumed that the images are produced with zero skewness and zero distortion. The origin of the camera frame C coincides with the origin of the image frame P, with the optical axis aligned with the z-axis. The internal calibration matrix of the camera is as follows:

$$K = \begin{bmatrix} 2347 & 0 & 0 \\ 0 & 2432 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{14}$$

Typically, the camera will be calibrated on-ground, but it can also be calibrated on-board by treating the intrinsic parameters as five additional unknowns of pose estimation.

Object 2D Image: Simulation input for the $n$ image points is obtained through true perspective projection (Eq. (1)) of the selected $n$ 3D model points using the virtual camera (see Fig. 6).

3.2. Performance criteria

Four criteria are used to give a quantitative definition to the accuracy, speed, and robustness of the PnP solvers. These criteria are the computational runtime of the solver and the image plane error, translation error, and rotation error of the pose estimate output by the solver in comparison to the ground truth used to generate the simulation input.

Computational runtime: The MATLAB command tic is used to start a stopwatch timer when a PnP solver is called, whereas the command toc is used to stop the timer. The time elapsed on the timer is reported as the computational runtime to estimate the efficiency of the solvers relative to each other. This runtime is then used to provide a coarse first-order approximation of the runtime on a spaceborne microprocessor with a clock-rate of 30 MHz. Since we are interested in implementing a real-time pose estimation architecture on-board a spacecraft microprocessor, it is essential for PnP solvers to consume minimal computational resources.
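A rough analogue of this profiling procedure, with Python's perf_counter standing in for MATLAB's tic/toc; the workload and the linear 80× clock-rate extrapolation (2.4 GHz to 30 MHz) are illustrative only and ignore architectural differences between the two processors.

```python
import time

CLOCK_RATIO = 2.4e9 / 30e6   # desktop 2.4 GHz vs. 30 MHz flight processor

def timed(fn, *args):
    """tic/toc analogue: run fn and return its result together with the
    measured runtime and a naive clock-rate-scaled runtime."""
    t0 = time.perf_counter()            # tic
    out = fn(*args)
    elapsed = time.perf_counter() - t0  # toc
    return out, elapsed, elapsed * CLOCK_RATIO

_, measured, scaled = timed(sum, range(100000))
print(f"desktop: {measured * 1e3:.3f} ms -> ~{scaled * 1e3:.1f} ms at 30 MHz")
```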

Image plane error: Also referred to as the reprojection error, it is defined as the mean Euclidean distance in pixels between the input 2D image points $p_i$ and the corresponding virtual 2D image points $p'_i$ constructed from the 3D model points and the pose estimate from the PnP solver:

$$E_{2D} = \frac{1}{n} \sum_{i=1}^{n} \left\| p_i - p'_i \right\| \tag{15}$$
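Eq. (15) in code, with made-up pixel coordinates:

```python
import math

def image_plane_error(pts, pts_virtual):
    """Mean Euclidean pixel distance between measured and reprojected
    image points, per Eq. (15)."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(pts, pts_virtual)) / len(pts)

obs = [(100.0, 50.0), (-20.0, 30.0)]
rep = [(103.0, 54.0), (-20.0, 30.0)]  # first point reprojects 5 px away
print(image_plane_error(obs, rep))  # 2.5
```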

Translation error: As a measure of accuracy in the estimation of relative position, we define the translation error as the percentage difference between the magnitudes of the true translation vector and the estimated translation vector:

$$E_T = \frac{|\vec{T}_{true}| - |\vec{T}_{est}|}{|\vec{T}_{true}|} \times 100 \tag{16}$$

Rotation error: The unit quaternion representations of the true rotation matrix, $q_{true}$, and the estimated rotation matrix, $q_{est}$, are used to compute the rotation error. We use quaternion algebra to compute a unit quaternion, $q_{diff}$, which represents the relative rotation between $q_{true}$ and $q_{est}$. The rotation error is expressed as an equivalent angle and reported in degrees. Note that $q_{diff}(4)$ represents the scalar component of the unit quaternion:

$$E_R = 2 \cos^{-1}\left(q_{diff}(4)\right) \tag{17}$$
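Eq. (17) can be evaluated with a few lines of quaternion algebra. The sketch below uses a scalar-last [x, y, z, w] convention so that q(4) is the scalar part, as in the text; taking the absolute value of the scalar part, an addition to the printed formula, guards against the quaternion double cover. The test rotations are invented.

```python
import math

def q_from_axis_angle(axis, angle):
    """Unit quaternion [x, y, z, w], scalar last, matching q(4) in Eq. (17)."""
    n = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2.0) / n
    return [axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle / 2.0)]

def q_mul(a, b):
    """Hamilton product, scalar-last convention."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return [aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz]

def rotation_error_deg(q_true, q_est):
    """E_R = 2 cos^-1(q_diff(4)) of Eq. (17), with q_diff = q_true * conj(q_est);
    abs() selects the shorter of the two equivalent rotations."""
    q_conj = [-q_est[0], -q_est[1], -q_est[2], q_est[3]]
    q_diff = q_mul(q_true, q_conj)
    return math.degrees(2.0 * math.acos(min(1.0, abs(q_diff[3]))))

q_true = q_from_axis_angle((0.0, 0.0, 1.0), math.radians(40.0))
q_est = q_from_axis_angle((0.0, 0.0, 1.0), math.radians(30.0))
print(rotation_error_deg(q_true, q_est))  # ~10 degrees
```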

3.3. Test cases

We develop four different test cases to represent the range of possible inputs to the PnP solvers. Our main idea is to decouple the effects of the Earth's shadow, inter-spacecraft distance, background interaction, and measurement noise on the performance of the PnP solvers. Different levels of the Earth's shadow on the target spacecraft, as well as the spacecraft orientation relative to the sun, vary the number of features detected through image processing. We simulate this effect by varying the number of feature correspondences, n, passed as input to the PnP solvers. Measurement noise due to the image sensor characteristics is simulated by adding random Gaussian noise with zero mean and standard deviation σ_P to the simulated image points being passed into the PnP solvers. Background interaction may lead to detected features that do not lie on the target spacecraft. This effect is simulated by varying the percentage of outlier image points, p̃. An outlier image point in the simulations is defined as an image point with a measurement noise greater than 5σ_P.

S. Sharma, S. D'Amico / Acta Astronautica 123 (2016) 435–445
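The noise and outlier models above can be sketched as follows (NumPy; the uniform-direction offset used to create outliers is an assumption for illustration, chosen so that each outlier's error exceeds 5σ_P):

```python
import numpy as np

def corrupt_image_points(p_true, sigma_p, outlier_frac, rng):
    """Add zero-mean Gaussian pixel noise to all Nx2 image points and
    turn a fraction of them into outliers (error > 5*sigma_p),
    mimicking background features, per the test-case setup."""
    n = len(p_true)
    noisy = p_true + rng.normal(0.0, sigma_p, size=p_true.shape)
    n_out = int(round(outlier_frac * n))
    idx = rng.choice(n, size=n_out, replace=False)
    # Offset the chosen points by 10*sigma_p in a random direction,
    # comfortably past the 5*sigma_p outlier threshold.
    ang = rng.uniform(0.0, 2.0 * np.pi, size=n_out)
    mag = 10.0 * sigma_p
    noisy[idx] += mag * np.column_stack([np.cos(ang), np.sin(ang)])
    return noisy, idx

rng = np.random.default_rng(0)
p = np.zeros((20, 2))
noisy, out_idx = corrupt_image_points(p, sigma_p=2.0, outlier_frac=0.25, rng=rng)
```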

To simulate these effects, we require values of n, σ_P, and p̃ that are representative of images taken during spaceborne applications of monocular vision. To make these values as representative as possible, we refer to the series of images captured by the visual navigation cameras of the Orbital Express [22] and PRISMA [23] missions. We performed edge detection followed by a Hough transform [24] to simulate the output of a state-of-the-art object detection subsystem. Typical results are shown in Fig. 8, with intermediate steps shown in Fig. 7.

Fig. 6. Sample simulation input of 2D image points (shown in red). The 3D model (shown in blue) has been plotted as a reference. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 7. Intermediate steps of the object detection subsystem based on edges: preprocessing (left), feature extraction (middle), feature description (right).

These results are used to empirically determine the ranges of n, σ_P, and p̃ used in the following test cases.

Test Case 1: number of feature correspondences. We are interested in gauging the performance of the PnP solvers on a minimal number of feature correspondences, since these solvers will be applied to small input sets to statistically identify outliers and the correct feature correspondence. As our first test case, we perform simulations with the number of feature correspondences varying from a minimum of three to a maximum of twelve. For each value of n, 10^4 simulations are run, each with a different input set of 3D model points. The true pose is kept constant for each simulation in this test case to negate the effect of the geometry of the space resident object. Also note that the feature correspondences are perfect, i.e., the input set of 3D model points is correctly mapped to the corresponding input set of 2D image points.
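A Monte Carlo harness for this test case could be sketched as below. The solver passed in is a stub standing in for NRM, PosIt, or EPnP, and the 1/n error model is purely illustrative; only the sampling structure (random point subsets, fixed true pose, perfect correspondences) follows the text:

```python
import numpy as np

def run_test_case_1(solve_pnp, model_points, trials=10000,
                    n_range=range(3, 13), seed=0):
    """For each number of correspondences n, draw `trials` random
    subsets of 3D model points and collect the solver's error.
    Returns {n: (mean error, variance)}."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in n_range:
        errs = []
        for _ in range(trials):
            idx = rng.choice(len(model_points), size=n, replace=False)
            errs.append(solve_pnp(model_points[idx]))
        results[n] = (float(np.mean(errs)), float(np.var(errs)))
    return results

# Stub solver (hypothetical): error shrinking as 1/n stands in for a
# real PnP call; a real study would plug in the actual solvers here.
model = np.random.default_rng(1).normal(size=(30, 3))
res = run_test_case_1(lambda pts: 1.0 / len(pts), model, trials=10)
```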

Test Case 2: pixel location noise. The 2D image points can be modeled as the sum of the true pixel locations and pixel location noise. The pixel location noise can be modeled as a two-dimensional Gaussian distribution with standard deviation σ_P, which has been shown to be proportional to the image intensity noise σ_I for state-of-the-art object detection algorithms [25]. Since we can estimate σ_I through a principal component analysis of homogeneous grayscale image patches [26], we can estimate the expected values of σ_P in a spaceborne application. Hence, as our second test case, we perform 10^4 simulations with varying levels of pixel location noise. The noise is characterized as a Gaussian distribution with zero mean and standard deviation σ_P, and is added to the 2D image points generated from the true perspective projection of six 3D model points. The true pose is kept constant and the feature correspondences are perfect for all simulations in this test case. For a fair comparison, we limit the number of iterations for PosIt and NRM so that their computational runtime is comparable to that of the closed-form solver EPnP.

Test Case 3: outliers. The feature correspondence between the input 3D model points and 2D image points is prone to contain outliers due to errors in the object detection subsystem or simply due to the geometry of the space resident object. An object detection subsystem based on edges can output partial edges due to illumination conditions and/or false edges due to background interaction (see Fig. 8). Such an output from the object detection subsystem can result in an incorrect feature correspondence. Moreover, a probabilistic approach



Fig. 8. Typical output of an object detection subsystem overlaid on actual space imagery from the PRISMA [23] mission (left) and the Orbital Express [22] mission (right).

Fig. 9. Results for translation error and rotation error from simulations of all test cases: number of feature correspondences (top left), pixel location noise (top right), outliers (bottom left), and distance along optical axis (bottom right).




to feature correspondence is prone to errors even when object detection is perfect, for example, when the geometry of the space resident object contains indistinguishable or symmetric surfaces. A robust PnP solver should negate the effects of incorrect feature correspondences (outliers) present in its input and still produce an accurate pose estimate. Hence, as our third test case, we perform 10^4 simulations with a varying number of outliers. Outliers are 2D image points with a pixel location error of 10 px, in addition to the pixel location noise characterized by a Gaussian distribution with zero mean and a standard deviation of 2 px. This noise is added to the 2D image points generated from the true perspective projection of twelve 3D model points. As in Test Case 2, the true pose is kept constant for all simulations and the iterations for NRM and PosIt are limited.

Test Case 4: distance along optical axis. With fixed camera characteristics, increasing the separation decreases the spread of the distribution of the 2D features on the image plane, making it difficult for the object detection subsystem to resolve features. Hence, as our fourth test case, we perform 10^4 simulations with varying relative separation between the space resident object and the camera along the direction of the camera's optical axis. Pixel location noise is added to the 2D image points generated from the true perspective projection of six 3D model points. The noise is characterized by a Gaussian distribution with zero mean and a standard deviation of 2 px. The relative attitude is kept constant and the feature correspondences are perfect for all simulations in this test case. As in Test Case 2, the number of iterations for NRM and PosIt is limited.

4. Results & Discussion

For each test case, the mean and variance of the performance criteria over all simulations are represented as markers/lines and bars/shaded regions, respectively (see Figs. 9 and 10). The variance is reported as the interquartile range, i.e., the difference between the upper and lower quartile values of the performance criteria.
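The interquartile range used as the spread measure is a one-liner in NumPy (a minimal sketch, using NumPy's default linear quantile interpolation):

```python
import numpy as np

def iqr(samples):
    """Interquartile range: upper minus lower quartile, the 'variance'
    measure reported for each performance criterion."""
    q1, q3 = np.percentile(samples, [25, 75])
    return float(q3 - q1)

spread = iqr([1.0, 2.0, 3.0, 4.0, 5.0])  # 2.0
```

Unlike the sample variance, the IQR is insensitive to the heavy-tailed error distributions produced by the outlier test case, which is presumably why it is preferred here.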

4.1. Test Case 1: number of feature correspondences

NRM is the most computationally expensive solver due to its use of least squares to invert the Jacobian matrix at each iteration. Typically, the least squares step in NRM is an operation of O(c²n) complexity, where c is a constant equal to 6, the number of unknowns of the PnP problem. This is highly expensive in comparison to the O(c³n) overall complexity of EPnP and the O(24n) overall complexity of PosIt. Note that for 3 ≤ n ≤ 6, NRM requires fewer and fewer iterations to converge as n is increased. This leads to an overall decrease in computational runtime even though the complexity per iteration increases. For this test case, PosIt has the highest errors in comparison to the other solvers, but its accuracy improves for increasing values of n. Even though EPnP provides a pose estimate for n = 4, it has the largest variance in the translation error and the rotation error at this value of n.

4.2. Test Case 2: pixel location noise

With an increase in the pixel location noise of the input 2D image points, all solvers exhibit a decrease in accuracy. NRM has the best performance due to its use of least squares, which is inherently well suited to handle Gaussian noise. PosIt and EPnP have approximately the same variance in rotation error as well as translation error, and exhibit a linear increase in errors with increasing noise. However, unlike EPnP, which provides a closed-form solution, the errors of PosIt can be reduced at the expense of computational runtime if more iterations are allowed. As in the first test case, PosIt's runtime is the lowest and NRM's runtime is the highest.

4.3. Test Case 3: outliers

Recall that an outlier image point was earlier defined as an image point with a random pixel location error greater than 5σ_P, where σ_P is fixed at 2 px to characterize the measurement noise. All solvers exhibit a linear increase in errors when outliers make up less than 40% of the input 2D image points. For higher percentages, the errors hold approximately constant values as the performance degrades to levels where additional outliers have no significant effect on accuracy. The slope of the increase in errors is proportional to the pixel location error used to generate the outliers. In relative terms, NRM has the lowest errors, while PosIt and EPnP have similar levels of error. In absolute terms, all solvers have sub-optimal performance in the context of initial pose estimation, where feature correspondences are unknown and the input 2D image points can contain large pixel location errors.

4.4. Test Case 4: distance along optical axis

As the distance along the optical axis is increased, EPnP is the worst affected. One of the intermediate steps in EPnP is the calculation of a basis of the null space of a matrix containing all 2D image points. With increasing distance along the optical axis and the apparent shrinkage of the image, this matrix tends to become sparse and null space estimation becomes increasingly challenging. However, as the distance is increased, the scaled orthographic projection (SOP) becomes a better approximation of the true perspective projection. Hence, PosIt, which is based on the SOP approximation, has lower rotation and translation errors than EPnP. NRM has the lowest translation and rotation errors since, even at large separations, least squares is well suited to handle a noisy input of 2D image points.

4.5. Decision matrix

To conclude the comparative assessment, we construct a qualitative decision matrix (see Fig. 11), with the PnP solvers indicated on the rows and the test cases indicated on the columns. Each cell is color coded to represent the solver's weighted sum of relative performance as measured by the four performance criteria. Recall that these performance criteria were earlier defined as the computational runtime, the image plane error, the translation error, and the


Fig. 10. Results for computational runtime and image plane error from simulations of Test Case 1.

Fig. 11. Comparative assessment results for simulations from all test cases as a qualitative decision matrix.


rotation error. We give equal weights to the difference of each performance criterion from the mean value of the respective criterion, over all test cases and all PnP solvers. We then use the terms "superior", "par", and "inferior" to represent this difference in performance.
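The equally weighted scoring behind the decision matrix can be sketched as follows. The label thresholds are illustrative assumptions (the paper's color coding is qualitative), and the criterion values below are made-up; all criteria are taken as lower-is-better:

```python
import numpy as np

def decision_matrix(criteria):
    """Map each solver's equally weighted deviation from the
    per-criterion mean to 'superior' / 'par' / 'inferior' labels.

    `criteria`: dict of solver -> list of four criterion values
    (runtime, image plane error, translation error, rotation error)."""
    names = list(criteria)
    vals = np.array([criteria[s] for s in names], dtype=float)
    # Deviation from the per-criterion mean, averaged with equal weights.
    scores = (vals - vals.mean(axis=0)).mean(axis=1)
    labels = {}
    for name, s in zip(names, scores):
        if s < -0.1:                 # assumed threshold
            labels[name] = "superior"
        elif s > 0.1:                # assumed threshold
            labels[name] = "inferior"
        else:
            labels[name] = "par"
    return labels

labels = decision_matrix({
    "NRM":   [0.5, 0.5, 0.5, 0.5],
    "PosIt": [1.0, 1.0, 1.0, 1.0],
    "EPnP":  [1.5, 1.5, 1.5, 1.5],
})
```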

We see that the use of PosIt+ makes the PosIt solver applicable to both coplanar and non-coplanar points without a significant impact on computational runtime. EPnP has the best performance when the pixel locations are free from noise and outliers. NRM has the lowest errors across all test cases; however, it requires an initial guess of the pose and incurs an order of magnitude higher computational time across all test cases. In the presence of feature outliers, all solvers exhibit sub-optimal levels of rotation and translation error.

5. Conclusions & way forward

Preliminary results indicate that the runtime of each call of a PnP solver is on the order of 10 ms when embedded in a spaceborne microprocessor (clock rate of 30 MHz). In a typical scenario, we can expect about 10^3 calls of a PnP solver, which makes the current performance unsatisfactory for a spaceborne application of real-time pose estimation. The accuracy of the PnP solvers is acceptable only when the feature correspondence is perfect, and is sub-optimal in the presence of feature outliers. Hence, an iterative statistical approach will be necessary to achieve pose convergence in real-world applications where multiple feature correspondence hypotheses need to be validated. The strength of each PnP solver lies in a different regime, which calls for a strategy to exploit the identified synergies. Future work will implement a PnP mode switcher based on the obtained decision matrix to yield a fast and robust solution to the perspective equations under diverse operational scenarios. For example, since EPnP has the lowest runtime, it can be used during the first few iterations of pose estimation, when a large number of correspondence hypotheses need to be validated. In later iterations, when the search space for the correct feature correspondence has been reduced to a few ambiguous hypotheses, NRM can be used due to its better accuracy in the presence of outliers. If the pose estimate is still ambiguous, the architecture could acquire and process subsequent images and validate each ambiguous pose estimate by exploiting principles of relative orbit dynamics and kinematics. The interplay between the initial pose estimation and the object detection subsystem needs to be explored further, as do the gradual inclusion of more features (in number and in type) and the data editing process. Since we now have a measure of the process errors in a variety of test cases, it remains to be understood how such errors can be properly incorporated in a filtering scheme. This comparative assessment not only provides a framework to review initial pose estimators, but its conclusions will also serve as a cornerstone for the design of pose estimators for future missions involving rendezvous and proximity operations.
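The proposed mode switcher could be sketched as a simple rule set (a hypothetical policy; the iteration and hypothesis-count thresholds are illustrative assumptions, not values from the paper):

```python
def choose_solver(iteration, hypotheses_remaining, outliers_suspected):
    """Hypothetical PnP mode switcher following the decision matrix:
    fast screening first, accurate refinement once the search narrows."""
    if outliers_suspected or hypotheses_remaining <= 5:
        return "NRM"    # most accurate; affordable on a small hypothesis set
    if iteration < 3:
        return "EPnP"   # lowest runtime for bulk hypothesis screening
    return "PosIt"      # middle ground once the pool has shrunk
```

Such a policy keeps the expensive NRM calls bounded: the bulk of the ~10^3 hypothesis validations run through the cheap closed-form solver, and only the final ambiguous few pay the iterative cost.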

References

[1] L.T. Castellani, et al., PROBA-3 mission, Int. J. Space Sci. Eng. 1 (4) (2013) 349–366.

[2] Air Force Research Laboratory, Fact sheet: automated navigation and guidance experiment for local space (ANGELS), ⟨http://www.kirtland.af.mil/shared/media/document/AFD-131204-039.pdf⟩, July 2014 (accessed 30.09.2014).

[3] S. D'Amico, et al., PRISMA, in: M. D'Errico (Ed.), Distributed Space Missions for Earth System Monitoring, Springer New York, New York, NY, 2013, pp. 599–637.

[4] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.

[5] D. Grest, T. Petersen, V. Krüger, A comparison of iterative 2D-3D pose estimation methods for real-time applications, in: A.-B. Salberg, J. Hardeberg, R. Jenssen (Eds.), Image Analysis, Lecture Notes in Computer Science, vol. 5575, Springer Berlin Heidelberg, 2009, pp. 706–715.

[6] D.W. Eggert, A. Lorusso, R.B. Fisher, Estimating 3-D rigid body transformations: a comparison of four major algorithms, Mach. Vis. Appl. 9 (5–6) (1997) 272–290.

[7] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Comput. Surv. 38 (4) (2006), Article 13.

[8] D. Nistér, Preemptive RANSAC for live structure and motion estimation, Mach. Vis. Appl. 16 (5) (2005) 321–329.

[9] J. Feldman, Perceptual grouping by selection of a logically minimal model, Int. J. Comput. Vis. 55 (1) (2003) 5–25.

[10] S. D'Amico, et al., Noncooperative rendezvous using angles-only optical navigation: system design and flight results, J. Guid. Control Dyn. 36 (6) (2013) 1576–1595.

[11] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395.

[12] D.F. Dementhon, L.S. Davis, Model-based object pose in 25 lines of code, Int. J. Comput. Vis. 15 (1–2) (1995) 123–141.

[13] V. Lepetit, P. Fua, Monocular model-based 3D tracking of rigid objects: a survey, Found. Trends Comput. Graph. Vis. 1 (1) (2005) 1–89.

[14] Y. Zheng, et al., Revisiting the PnP problem: a fast, general and optimal solution, in: IEEE International Conference on Computer Vision, December 2013, pp. 2344–2351.

[15] C.-P. Lu, G.D. Hager, E. Mjolsness, Fast and globally convergent pose estimation from video images, IEEE Trans. Pattern Anal. Mach. Intell. 22 (6) (2000) 610–622.

[16] S. D'Amico, M. Benn, J.L. Jørgensen, Pose estimation of an uncooperative spacecraft from actual space imagery, Int. J. Space Sci. Eng. 2 (2) (2014) 171–189.

[17] A. Cropp, P. Palmer, C.I. Underwood, Pose estimation and relative orbit determination of a nearby target microsatellite using passive imagery (Ph.D. thesis), University of Surrey, 2001.

[18] G. Campa, et al., A comparison of pose estimation algorithms for machine vision based aerial refueling for UAVs, in: 14th Mediterranean Conference on Control and Automation (MED'06), IEEE, Ancona, Italy, 2006, pp. 1–6.

[19] D. Oberkampf, D.F. Dementhon, L.S. Davis, Iterative pose estimation using coplanar feature points, Comput. Vis. Image Underst. 63 (3) (1996) 495–511.

[20] V. Lepetit, F. Moreno-Noguer, P. Fua, EPnP: an accurate O(n) solution to the PnP problem, Int. J. Comput. Vis. 81 (2) (2009) 155–166.

[21] Eyes on the Solar System, NASA/JPL-Caltech, Rosetta 3D Model, July 2012.

[22] D.A. Whelan, et al., DARPA Orbital Express program: effecting a revolution in space-based systems, in: International Symposium on Optical Science and Technology, International Society for Optics and Photonics, 2000, pp. 48–56.

[23] T. Karlsson, et al., PRISMA mission control: transferring satellite control between organisations, in: SpaceOps 2012, 2012.

[24] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Commun. ACM 15 (1) (1972) 11–15.

[25] P. Tissainayagam, D. Suter, Assessing the performance of corner detectors for point feature tracking applications, Image Vis. Comput. 22 (8) (2004) 663–679.

[26] X. Liu, M. Tanaka, M. Okutomi, Single-image noise level estimation for blind denoising, IEEE Trans. Image Process. 22 (12) (2013) 5226–5237.