
Joint Estimation of Epipolar Geometry and Rectification Parameters using Point Correspondences for Stereoscopic TV Sequences

Frederik Zilly, Marcus Müller, Peter Eisert, Peter Kauff

Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany

{frederik.zilly,marcus.mueller,peter.eisert,peter.kauff}@hhi.fraunhofer.de

Abstract

A stereo sequence needs to be rectified in order to avoid vertical disparities and similar image distortions. With imperfect stereo rigs, vertical disparities occur mainly because of a mechanical misalignment of the cameras. Several rectification methods are known, and most of them are based on a strong calibration. However, calibration data is often not provided, so that the rectification needs to be done based on point correspondences. In this paper, we propose a rectification technique which estimates the fundamental matrix jointly with the appropriate rectification parameters. The algorithm is designed for narrow-baseline stereo rigs whose optical axes are almost parallel (apart from a possible convergence angle). The rectification parameters allow a pose estimation of one camera relative to the other, so that the mechanical alignment of the stereo rig can be improved.

1. Introduction

When producing content for 3D cinema or 3DTV, one goal is a perfectly aligned pair of stereo sequences. Any misalignment of the cameras leads to vertical disparities, and vertical disparities in stereo pairs lead to eye strain and visual fatigue [12]. Every stereo rig contains parts of finite mechanical accuracy. Moreover, thermal dilation changes the extrinsic parameters. Changing the focus of a lens affects the internal parameters, possibly including the focal length. In addition, lenses are changed during shootings, and the setup time is limited. With zoom lenses, the principal point shifts and the focal length changes over a wide range of values. The motors for zoom level and focus do not, in general, synchronize exactly, so that slightly different focal lengths will occur. Finally, these motors suffer from backlash, which can be thought of as a hysteresis curve that affects the zoom level.

As a consequence, a rectification algorithm is needed which performs reliably and which uses only point correspondences. The resulting image pair should be suitable for watching. Therefore, the rectification method needs to minimize the distortion it introduces. In addition, the convergence plane should not be changed because it is a critical stereo parameter for 3D sensation. As a result, the rectification is not done with respect to the plane at infinity [3] but with respect to a scene-dependent plane. The proposed method establishes a relationship between the components of the fundamental matrix and a physical model of the camera positions. This allows rectifying homographies with a very small distortion impact to be calculated. The model assumes a geometry which is near the rectified state, such that the fundamental matrix can be expressed as a linearized Taylor expansion around the rectified state.

2. Rectification

Image rectification is well known in the literature. Faugeras describes rectification as a reprojection of the left and the right image onto a common image plane R [2]. By rectification he aims to ensure a simple epipolar geometry, i.e. the epipoles are at infinity and the epipolar lines coincide with the image scan lines, which facilitates dense stereo matching. The new image plane R needs to be parallel to the baseline. However, there are two degrees of freedom in choosing such a plane. While one degree of freedom only affects a possible scaling of the images, the other is responsible for image distortion effects. Faugeras proposes to choose the plane R such that it is parallel to the line of intersection of the two original image planes. Zhang et al. propose an algorithm which combines the estimation of the epipolar geometry with a guided point matching [14]. Papadimitriou and Dennis describe a vertical registration algorithm [11]. Hartley proposes a rectification algorithm which is suitable for wide-baseline systems [5]. A main idea of this algorithm is to minimize horizontal disparities in order to facilitate image matching. Furthermore, Hartley claims that the image center undergoes a rigid transformation, i.e. only rotation and translation are applied to the image center. Loop and Zhang propose a rectification algorithm which reduces image distortions by decomposing the rectifying homographies into a similarity transform followed by a shearing transform [8]. Isgro and Trucco propose to calculate rectifying homographies without explicit knowledge of the epipolar geometry [7]. Fusiello et al. propose a linear rectification algorithm based on two perspective projection matrices [4]. Wu and Yu propose to minimize distortion by using a properly chosen shearing transform [13]. Mallon and Whelan compare different rectification algorithms with respect to their image distortion impact [10]. They derive rectifying homographies from a fundamental matrix which might be affected by noise.

The aim of all rectification processes is to find two homographies $H$ and $H'$ which, applied to two projection matrices $P$ and $P'$, result in two new matrices $P_{rect}$ and $P'_{rect}$ with the following properties: both image planes are parallel, and both epipoles are mapped to the point at infinity $(1, 0, 0)$. As a result, the epipolar lines are parallel and coincide with the image scan lines.

2.1. Fundamental Matrix

To establish the point correspondences used for the rectification process, a robust feature detector is needed which produces as few outliers as possible. Suitable feature detectors are SIFT [9], combined with Difference-of-Gaussian interest point detection, and Upright SURF [1], combined with the Hessian box-filter detector. However, even these very distinctive descriptors will produce a certain amount of outliers. A well-known technique is to eliminate outliers using a RANSAC estimation of the fundamental matrix [6]. We will develop a constrained fundamental matrix which is composed of a set of meaningful parameters (angles, offsets in pixels, difference in focal length). These same parameters will later be used to compose the rectifying homographies. In addition, the parameters can be used to optimize the mechanical alignment of the stereo rig.

Point correspondences $m$ and $m'$ from the left and right camera need to fulfill the epipolar constraint:

$$ m'^{T} F m = 0 \tag{1} $$

with $m = (u, v, 1)^T$ and $m' = (u', v', 1)^T$. The initial projection matrices have the form $P = K[I \mid 0]$ and $P' = K'[R \mid t]$. Following this approach, we can describe the initial fundamental matrix $F$ as

$$ F = K'^{-T} [t]_\times R K^{-1} \tag{2} $$

with

$$ t = -RC \tag{3} $$

where $C$ is the camera center of the right camera. Note that the two inner matrices form the essential matrix $E$ as in [2]. In the rectified state, we have $K' = K$, $R = I$ and $t = (1, 0, 0)^T$. It is easy to show that in this case the fundamental matrix has, up to scale, the form:

$$ F = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \tag{4} $$

3. Method

Our aim is to develop a Taylor expansion of the fundamental matrix around the rectified state. In order to linearize the algorithm, we truncate the Taylor expansion after the first-order term. The translation vector $t = (t_x, t_y, t_z)^T$ can only be calculated up to scale. In our approach we divide $t$ by $t_x$, which does not affect (1), and denote $t = (1, t_y, t_z)^T$. As we assume our camera setup to be near the rectified state, we can conclude that $t_y \ll 1$ and $t_z \ll 1$. We get the following cross-product matrix for the translation vector:

$$ [t]_\times = \begin{pmatrix} 0 & -t_z & t_y \\ t_z & 0 & -1 \\ -t_y & 1 & 0 \end{pmatrix} \tag{5} $$

We assume that the rotation angles are small and that any second-order effects can be neglected ($\alpha \ll 1$). This gives us the following approximation for the rotation matrix:

$$ R = \begin{pmatrix} 1 & -\alpha_z & \alpha_y \\ \alpha_z & 1 & -\alpha_x \\ -\alpha_y & \alpha_x & 1 \end{pmatrix} \tag{6} $$

We multiply $[t]_\times$ by $R$ and eliminate any mixed term ($\alpha_x t_y = 0$, ...) as a second-order effect. The resulting essential matrix $E$ is:

$$ E = \begin{pmatrix} 0 & -t_z & t_y \\ t_z + \alpha_y & -\alpha_x & -1 \\ -t_y + \alpha_z & 1 & -\alpha_x \end{pmatrix} \tag{7} $$
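The size of the neglected second-order terms can be checked numerically. The following Python sketch is not part of the original method. It assumes an exact rotation composed as Rz·Ry·Rx (the composition order is an assumption and does not matter to first order) and compares the exact product $[t]_\times R$ with the linearized matrix of (7). All function names are illustrative.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(t):
    """Cross-product matrix [t]x of a 3-vector, as in (5)."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def rotation(ax, ay, az):
    """Exact rotation Rz(az) * Ry(ay) * Rx(ax); the order is an assumption."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))

def essential_exact(ty, tz, ax, ay, az):
    """E = [t]x R with t = (1, ty, tz) and the exact rotation matrix."""
    return matmul(skew((1.0, ty, tz)), rotation(ax, ay, az))

def essential_linearized(ty, tz, ax, ay, az):
    """First-order essential matrix of (7)."""
    return [[0.0, -tz, ty],
            [tz + ay, -ax, -1.0],
            [-ty + az, 1.0, -ax]]

def max_abs_difference(a, b):
    """Largest entry-wise deviation between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Illustrative misalignment: offsets and angles (radians) of order 1e-3.
params = dict(ty=2e-3, tz=-1e-3, ax=1.5e-3, ay=-2e-3, az=1e-3)
error = max_abs_difference(essential_exact(**params),
                           essential_linearized(**params))
print(f"largest neglected term: {error:.2e}")
```

For parameters of order 1e-3 the deviation is of the order of their products, which is what the first-order truncation predicts.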

Concerning the calibration matrices, we assume that the principal point is centered, that the aspect ratio is 1, and that the skew is zero. The focal lengths $f$ and $f'$ can differ. We want to calculate their ratio with high accuracy because this is important for avoiding vertical disparities induced by a $\Delta$-zoom. We assume that $f'/f = 1 + \alpha_f$ with $\alpha_f \ll 1$, and we put the origin in the image center. We get the following matrices $K^{-1}$ and $K'^{-T}$, where we approximated $1/(1+\alpha_f)$ by $1 - \alpha_f$:

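$$ K^{-1} = \begin{pmatrix} 1/f & 0 & 0 \\ 0 & 1/f & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{8} $$

$$ K'^{-T} = \begin{pmatrix} \frac{1-\alpha_f}{f} & 0 & 0 \\ 0 & \frac{1-\alpha_f}{f} & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9} $$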

We can now calculate the linearized fundamental matrix $F$, where we eliminated second-order effects ($\alpha_f t_y = 0$, $\alpha_f \alpha_x = 0$, ...) and multiplied by $f$:

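$$ F = \begin{pmatrix} 0 & -\frac{t_z}{f} & t_y \\ \frac{t_z+\alpha_y}{f} & -\frac{\alpha_x}{f} & -1+\alpha_f \\ -t_y+\alpha_z & 1 & -f\alpha_x \end{pmatrix} \tag{10} $$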

We have developed the fundamental matrix with respect to equation (2). We now substitute $t$ according to (3) and get $t_y = c_y + \alpha_z$ and $t_z = c_z - \alpha_y$:

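$$ F = \begin{pmatrix} 0 & \frac{-c_z+\alpha_y}{f} & c_y+\alpha_z \\ \frac{c_z}{f} & -\frac{\alpha_x}{f} & -1+\alpha_f \\ -c_y & 1 & -f\alpha_x \end{pmatrix} \tag{11} $$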

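We insert $F$ into (1):

$$ u'\left(v\,\frac{-c_z+\alpha_y}{f} + c_y + \alpha_z\right) + v'\left(u\,\frac{c_z}{f} + v\,\frac{-\alpha_x}{f} - 1 + \alpha_f\right) + \left(u(-c_y) + v - f\alpha_x\right) = 0 $$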

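We regroup the equation by terms inducing vertical disparities $v' - v$ and obtain (with $u' - u = \Delta u$):

$$ \underbrace{v'-v}_{\text{vert. disparity}} = \underbrace{c_y \Delta u}_{\text{y-shift}} + \underbrace{\alpha_z u'}_{\text{roll}} + \underbrace{\alpha_f v'}_{\Delta\text{-zoom}} - \underbrace{f\alpha_x}_{\text{tilt offset in pel}} + \underbrace{\alpha_y \frac{u'v}{f}}_{\alpha_y\text{-keystone}} - \underbrace{\alpha_x \frac{vv'}{f}}_{\text{tilt-induced keystone}} + \underbrace{c_z \frac{uv'-u'v}{f}}_{\text{z-parallax deformation}} $$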
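To see how each misalignment contributes, the right-hand side of the regrouped equation can be evaluated term by term. The following Python sketch is only an illustration. The function and key names are hypothetical, and coordinates are assumed to be given in pixels relative to the image center.

```python
def vertical_disparity_terms(m, m_prime, f, alpha_x=0.0, alpha_y=0.0,
                             alpha_z=0.0, alpha_f=0.0, c_y=0.0, c_z=0.0):
    """Individual contributions to v' - v for m = (u, v) and m' = (u', v')."""
    u, v = m
    up, vp = m_prime
    return {
        "y_shift": c_y * (up - u),
        "roll": alpha_z * up,
        "delta_zoom": alpha_f * vp,
        "tilt_offset": -f * alpha_x,
        "alpha_y_keystone": alpha_y * up * v / f,
        "tilt_keystone": -alpha_x * v * vp / f,
        "z_parallax": c_z * (u * vp - up * v) / f,
    }

def predicted_vertical_disparity(m, m_prime, f, **params):
    """Sum of all terms, i.e. the modelled value of v' - v."""
    return sum(vertical_disparity_terms(m, m_prime, f, **params).values())

# Example: a roll of 0.01 rad combined with a y-shift coefficient of 0.1.
terms = vertical_disparity_terms((100.0, 50.0), (120.0, 52.0), f=1000.0,
                                 alpha_z=0.01, c_y=0.1)
for name, value in terms.items():
    print(f"{name:18s} {value:+.3f}")
```

In this example only the roll and y-shift terms are non-zero, which makes it easy to see which physical cause dominates at a given image position.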

3.1. Estimating the Fundamental matrix

We are now able to build a system of linear equations which enables us to calculate the coefficients needed to compose the fundamental matrix. The complete system has the following form:

$$ Ax = b $$

$$ x = (A^T A)^{-1} A^T b. $$

The vector $b$ contains the vertical disparities between $m'$ and $m$, which are minimized. The result vector $x$ contains the coefficients which can be used to compose the fundamental matrix and the rectifying homographies:

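$$ b = m'_2 - m_2 \tag{12} $$

$$ x = \left(-f\alpha_x,\ \alpha_z,\ \alpha_f,\ c_y,\ \frac{\alpha_y}{f},\ -\frac{\alpha_x}{f},\ \frac{c_z}{f}\right)^T \tag{13} $$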

$A$ consists of one row $A_i$ per point correspondence:

$$ A_i = \left(1,\ u',\ v',\ u'-u,\ u'v,\ vv',\ uv'-u'v\right) \tag{14} $$

with $m_i = (u, v, 1)^T$ and $m'_i = (u', v', 1)^T$.
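The linear system can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It uses only the standard library, solves the normal equations by Gaussian elimination, and takes the roll column as $u'$, following the regrouped disparity equation. The helper names and the `columns` argument (which selects a reduced parameter set) are assumptions made for this example.

```python
def design_row(m, m_prime):
    """Row A_i for one correspondence m = (u, v), m' = (u', v')."""
    u, v = m
    up, vp = m_prime
    return [1.0, up, vp, up - u, up * v, v * vp, u * vp - up * v]

def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting.

    No check for singular systems is made in this sketch.
    """
    n = len(y)
    aug = [list(M[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def estimate_coefficients(points, points_prime, columns=(0, 1, 2, 3, 4, 5, 6)):
    """Least-squares solution x = (A^T A)^-1 A^T b restricted to `columns`."""
    A = [[design_row(m, mp)[c] for c in columns]
         for m, mp in zip(points, points_prime)]
    b = [mp[1] - m[1] for m, mp in zip(points, points_prime)]  # v' - v, see (12)
    n = len(columns)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)]
           for i in range(n)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return solve_linear(AtA, Atb)

# Synthetic check with a reduced model: tilt offset, roll, delta-zoom, y-shift.
true_x = [-2.0, 0.01, 0.005, 0.02]  # (-f*alpha_x, alpha_z, alpha_f, c_y)
left = [(-300.0, -200.0), (250.0, -150.0), (-100.0, 180.0),
        (320.0, 210.0), (0.0, 0.0), (150.0, -50.0)]
disparities = [15.0, 40.0, 22.0, 60.0, 35.0, 48.0]
right = []
for (u, v), d in zip(left, disparities):
    up = u + d
    # The four-term disparity equation is linear in v', so solve it for v'.
    vp = (v + true_x[0] + true_x[1] * up + true_x[3] * d) / (1.0 - true_x[2])
    right.append((up, vp))

estimate = estimate_coefficients(left, right, columns=(0, 1, 2, 3))
print(estimate)
```

With noise-free synthetic correspondences the estimate reproduces the coefficients used to generate the data. With real matches, the pixel-scale columns can make the normal equations poorly conditioned, so a QR- or SVD-based solver may be preferable to this direct approach.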

3.1.1 Model fitting with RANSAC

As one can see, the estimation of $c_z$ depends on four coordinates, which makes this estimation numerically unstable. Furthermore, $f$ can be deduced from the two tilt coefficients $-f\alpha_x$ and $-\alpha_x/f$. When no tilt is present, this estimation is numerically unstable. In order to use RANSAC, it might be a good choice to omit the estimation of $c_z$ and possibly the estimation of $-\alpha_x/f$. The latter parameter might be neglected when the vertical opening angle is small (as for long shots). Omitting both reduces the number of point correspondences for one guess (the sample size) from 7 to 5. Furthermore, any prior knowledge can be exploited. If one knows that $\alpha_f = 0$, this coefficient can be omitted as well. By the same argument, one might omit the estimation of the toe-in $\alpha_y$ if, for instance, the image pair is already de-keystoned. This linearized approach gives us fine-grained control of the estimation performance and insight into the sources of numerical instability. The sample size plays an important role in the number of samples needed for RANSAC, especially when the percentage of outliers is high.

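The minimum number of samples is

$$ N = \frac{\log(1-p)}{\log\left(1-(1-\varepsilon)^s\right)}, \tag{15} $$

where $s$ is the sample size, $\varepsilon$ the proportion of outliers, and $p$ the required probability of drawing at least one outlier-free sample.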

The following table illustrates this [6] for different sample sizes, an assumed proportion of outliers $\varepsilon = 50\%$, and $p = 99.9\%$. We require a high probability $p$ because we want to perform this rectification step a great number of times within a stereoscopic image sequence.

sample size         3     4     5     6     7
required samples    52    108   218   439   881

Table 1. Required samples for RANSAC
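The entries of Table 1 follow directly from (15) when the result is rounded up to the next integer. A short Python sketch (the function name and default arguments are chosen for this example):

```python
import math

def required_samples(sample_size, outlier_ratio=0.5, confidence=0.999):
    """Number of RANSAC samples N according to (15), rounded up."""
    # Probability that all points of one random sample are inliers.
    all_inliers = (1.0 - outlier_ratio) ** sample_size
    n = math.log(1.0 - confidence) / math.log(1.0 - all_inliers)
    return math.ceil(n)

for s in range(3, 8):
    print(s, required_samples(s))
```

Each additional parameter roughly doubles the number of samples at 50% outliers, which is why dropping coefficients from the model pays off in a RANSAC loop.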

Finally, the distance function used for fitting the F-matrix is the Sampson distance:

$$ \sum_i \frac{\left(m_i'^{T} F m_i\right)^2}{(F m_i)_1^2 + (F m_i)_2^2 + (F^T m'_i)_1^2 + (F^T m'_i)_2^2} \tag{16} $$
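As an illustration of (16), the following Python sketch evaluates the Sampson distance for homogeneous points stored as 3-tuples. The function names are chosen for this example. The sketch does not guard against a zero denominator.

```python
def mat_vec(M, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]

def transpose(M):
    """Transpose of a 3x3 matrix."""
    return [[M[j][i] for j in range(3)] for i in range(3)]

def sampson_term(F, m, m_prime):
    """Contribution of one correspondence (m, m') to the sum in (16)."""
    Fm = mat_vec(F, m)
    Ftm = mat_vec(transpose(F), m_prime)
    residual = sum(m_prime[i] * Fm[i] for i in range(3))  # m'^T F m
    denom = Fm[0] ** 2 + Fm[1] ** 2 + Ftm[0] ** 2 + Ftm[1] ** 2
    return residual ** 2 / denom

def sampson_distance(F, points, points_prime):
    """Sum of the Sampson terms over all correspondences."""
    return sum(sampson_term(F, m, mp) for m, mp in zip(points, points_prime))

# Rectified-state fundamental matrix of (4) and a match with 3 px vertical disparity.
F_rect = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
print(sampson_term(F_rect, (10.0, 20.0, 1.0), (30.0, 23.0, 1.0)))  # prints 4.5
```

For the rectified-state matrix the term reduces to half the squared vertical disparity, so a match with equal rows contributes zero.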

3.1.2 Singularity constraint

The fundamental matrix has rank 2, and hence its determinant should be zero. If our assumption of vanishing second-order effects is correct (which of course is the case only up to a certain precision), then (11) should lead to a vanishing determinant. In practice, the numerical value will be non-zero and can be interpreted as an indicator of how well our linearized model works. The singularity constraint will not be enforced using the SVD method described in [6]. In fact, the direct relationship between the components of the fundamental matrix and their physical interpretation as described by (11) would be lost when enforcing the singularity constraint.
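The determinant check can be sketched as follows. The Python code below builds the linearized matrix of (11) from its physical parameters and evaluates its determinant. The function names and the example values are assumptions made for illustration.

```python
def linearized_fundamental(f, alpha_x=0.0, alpha_y=0.0, alpha_z=0.0,
                           alpha_f=0.0, c_y=0.0, c_z=0.0):
    """Linearized fundamental matrix of (11)."""
    return [[0.0, (-c_z + alpha_y) / f, c_y + alpha_z],
            [c_z / f, -alpha_x / f, -1.0 + alpha_f],
            [-c_y, 1.0, -f * alpha_x]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, g), (h, i, j) = M
    return a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)

# Pure roll keeps the matrix singular; combined misalignments do not.
print(det3(linearized_fundamental(1000.0, alpha_z=0.01)))
print(det3(linearized_fundamental(1000.0, alpha_y=0.01, alpha_z=0.01,
                                  c_y=0.02, c_z=0.01)))
```

In the second case the determinant is a product of two small parameters divided by $f$, i.e. a second-order quantity, which matches the interpretation given above.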

3.2. Rectifying Homographies

Once we have roll, tilt, y-shift and $\Delta$-zoom, we can calculate the rectifying homographies directly. Roll, tilt and convergence angle can be corrected by rotating $P'$ in the inverse direction. $c_y$ can be corrected by rotating both cameras around the z-axis (in the same direction by an amount $c_y$). $c_z$ can be corrected by rotating both cameras around the y-axis (in the same direction by an amount $c_z$). The offset in the zoom level of $P'$ can be corrected with the following homography:

$$ H'_{\Delta\text{-Zoom}} = \begin{pmatrix} 1-\alpha_f & 0 & 0 \\ 0 & 1-\alpha_f & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{17} $$

The rectifying homographies have the form:

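$$ H = K R^T K^{-1} \tag{18} $$

$$ H = \begin{pmatrix} 1 & c_y & f c_z \\ -c_y & 1 & 0 \\ -c_z/f & 0 & 1 \end{pmatrix} \tag{19} $$

$$ H' = \begin{pmatrix} 1-\alpha_f & \alpha_z + c_y & 0 \\ -(\alpha_z + c_y) & 1-\alpha_f & -f\alpha_x \\ \frac{\alpha_y - c_z}{f} & -\frac{\alpha_x}{f} & 1 \end{pmatrix} \tag{20} $$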

In order to rectify with respect to the plane at infinity, the upper-right entry of $H'$ would have to be $-f(\alpha_y - c_z)$. However, this entry results only in a horizontal offset, which is not wanted in our case. We prefer to preserve the convergence plane.
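The effect of such homographies on individual points can be sketched in Python. The example below is a simplified illustration and not the full construction. It uses only the general form (18) with the small-angle rotation of (6) and $K = \mathrm{diag}(f, f, 1)$, plus the zoom correction (17). The adjustments for $c_y$, $c_z$ and the convergence plane in (19) and (20) are not reproduced, and all function names are hypothetical.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_homography(f, alpha_x=0.0, alpha_y=0.0, alpha_z=0.0):
    """H = K R^T K^-1 as in (18), with the linearized R of (6)."""
    K = [[f, 0.0, 0.0], [0.0, f, 0.0], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, 0.0], [0.0, 1.0 / f, 0.0], [0.0, 0.0, 1.0]]
    R_T = [[1.0, alpha_z, -alpha_y],
           [-alpha_z, 1.0, alpha_x],
           [alpha_y, -alpha_x, 1.0]]
    return matmul(K, matmul(R_T, K_inv))

def delta_zoom_homography(alpha_f):
    """Zoom correction of (17)."""
    s = 1.0 - alpha_f
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def apply_homography(H, point):
    """Map an inhomogeneous image point (u, v) through H."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)

# Pure roll: the model predicts v' = v + alpha_z * u' for the right image.
alpha_z = 0.01
v_left = 50.0
right_point = (120.0, v_left + alpha_z * 120.0)
corrected = apply_homography(rotation_homography(1000.0, alpha_z=alpha_z),
                             right_point)
print(corrected)
```

After the correction the row coordinate of the right point agrees with the left one, i.e. the roll-induced vertical disparity is removed for this point.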

4. Results

In a first experiment, we applied our method to the dataset supplied by Mallon and Whelan [10] (available from http://elm.eeng.dcu.ie/ vsl/vsgcode.html). Six image pairs were rectified using the point correspondences provided within the dataset. The point correspondences were inserted into the system of linear equations according to (14). To ensure that the results can be compared with [10], all point correspondences were used, without prior RANSAC filtering. Subsequently, the result vector $x$ defined in (13), which contains the coefficients describing the epipolar geometry, was computed. The best performance was achieved when fitting for the following six coefficients: y-shift, roll, $\Delta$-zoom, tilt offset, $\alpha_y$-keystone, and tilt-induced keystone. These coefficients were used to build the homographies $H$ and $H'$ according to (19) and (20). The resulting homographies were used to rectify the image pairs. The original and the rectified image pairs are shown in Figure 1. In order to perform a quantitative comparison of rectification results, a set of error metrics was computed following [10]. For each homography, measures for orthogonality $E_o$ (ideally 90), aspect ratio $E_a$ (ideally 1.0), and rectification error $E_r$ were computed. The values obtained with the proposed method are shown together with the data provided by [10] in Table 4. The results for Mallon's, Loop's, and Hartley's methods were taken from [10].

Sample   Method     Eo (H')   Eo (H)   Ea (H')   Ea (H)   Er mean   Er std
Arch     Proposed    89.96    90.00    0.9988    1.0000     0.14     0.36
Arch     Mallon      91.22    90.26    1.0175    1.0045     0.22     0.33
Arch     Loop        95.40    98.94    1.0991    1.1662   131.3     20.63
Arch     Hartley    100.74    93.05    1.2077    1.0546    39.21    13.85
Drive    Hartley    107.66    90.87    1.3491    1.015      3.57     3.43
Boxes    Hartley     86.56    94.99    0.9412    1.0846    33.36     8.65
Roof     Hartley    122.77    80.89    1.5256    0.8552    11.89    18.15
Slates   Hartley     89.96    88.54    1.0000    0.9769     2.27     5.18
Yard     Hartley    101.95    91.91    1.2303    1.0335    48.19    11.49

Table 4 shows that for our method, the image distortions measured by $E_o$ and $E_a$ are considerably smaller than for any other method. The homography $H$ always has orthogonality $E_o = 90$ and aspect ratio $E_a = 1.0$. As we did not fit for $c_z$, $H$ results in a 2D rotation around the image center, which does not induce any shearing or anisotropic scaling. The values for $H'$ indicate a very low image distortion. Concerning the rectification error $E_r$, our method shows good alignment performance. The mean of $E_r$ is nearer to 0 for every image pair. The standard deviation shows an accuracy similar to Mallon's method.


4.1. Rectification including F-matrix estimation

In a second experiment, we selected one frame of the Beergarden stereo sequence. We used SIFT to find putative matches [9]. Subsequently, we used RANSAC to eliminate outliers. To perform one RANSAC iteration, we used 4 point correspondences to fill the constraint matrix $A$ (14) and to fit the result vector $x$ (13), including y-shift, roll, $\Delta$-zoom and tilt offset. Afterwards, these values were used to compute the candidate fundamental matrix $F$ according to (11). Figure 2 shows the inlying matches for the left and the right image before and after rectification. In Figure 3, the original images and the rectified images are overlaid, which allows a qualitative evaluation of the rectification performance. The room divider in the background allows for a good inspection of the alignment of the two cameras. Apparently, the rectification process resulted in a well-aligned image pair.

5. Conclusion

We have proposed a rectification technique which allows the computation of rectifying homographies as well as of a fundamental matrix. The latter is important for using the algorithm during a RANSAC elimination of outliers. The algorithm uses point correspondences and does not need prior knowledge of the projection matrices. The technique involves a linearized computation of the epipolar geometry, which makes it suitable for setups which are near the rectified state. The image distortions have been shown to be negligible compared to the techniques proposed in [8], [5], and [10]. The algorithm preserves the convergence plane of the stereo setup and is suitable for rectifying 3DTV stereo sequences [12].

References

[1] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Computer Vision ECCV 2006, Lecture Notes in Computer Science, chapter 32, pages 404–417. 2006.

[2] O. Faugeras. Three-Dimensional Computer Vision (Artificial Intelligence). The MIT Press, November 1993.

[3] A. Fusiello and L. Irsara. Quasi-euclidean uncalibrated epipolar rectification. In ICPR08, pages 1–4, 2008.

[4] A. Fusiello, E. Trucco, and A. Verri. A compact algorithm for rectification of stereo pairs. Machine Vision and Applications, 12(1):16–22, 2000.

[5] R. I. Hartley. Theory and practice of projective rectification. International Journal of Computer Vision, 35(2):115–127, November 1999.

[6] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.

[7] F. Isgro and E. Trucco. Projective rectification without epipolar geometry. In CVPR '99, pages 1094–1099. IEEE Computer Society, 1999.

[8] C. Loop and Z. Zhang. Computing rectifying homographies for stereo vision. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, page 131, 1999.

[9] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.

[10] J. Mallon and P. F. Whelan. Projective rectification from the fundamental matrix. Image and Vision Computing, 23(7):643–650, July 2005.

[11] D. Papadimitriou and T. Dennis. Epipolar line estimation and rectification for stereo image pairs. IEEE Transactions on Image Processing, 5(4):672–676, April 1996.

[12] A. Woods, T. Docherty, and R. Koch. Image distortions in stereoscopic video systems. Proc. SPIE, 1915:36–48, February 1993.

[13] H.-H. Wu and Y.-H. Yu. Projective rectification with reduced geometric distortion for stereo vision and stereoscopic video. Journal of Intelligent and Robotic Systems, 42:71–94(24), January 2005.

[14] Z. Zhang, R. Deriche, O. D. Faugeras, and Q. T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1-2):87–119, 1995.


Figure 1. From left to right: original left, original right, rectified left, rectified right. Rows: (a) Arch, (b) Drive, (c) Boxes, (d) Roof, (e) Slates, (f) Yard.

Figure 2. From left to right and top to bottom: original left, original right, rectified left, rectified right

Figure 3. A stereo pair from the Beergarden Sequence. Overlay of the two original images (top) and the rectified images (bottom).
