
Page 1: 1 Machine vision - California State University, Sacramentoathena.ecs.csus.edu/~belkhouf/MVsummary11.pdf · Machine vision, spring 2019 FB Fig. 1. Stereo vision geometry, C l is the


Machine vision
Lecture Summary # 11

STEREO VISION

The goal of stereo vision is to use two cameras to capture 3D scenes. There are two important problems in stereo vision:
• Correspondence problem: finding matching pairs (conjugate pairs) of points in the two images that represent the same point in the 3D scene.
• Reconstruction problem: obtaining the 3D structure from the images.

For a single pinhole camera we wrote:

$$u = \frac{\lambda x}{z} \tag{1}$$

$$v = \frac{\lambda y}{z} \tag{2}$$

A simple camera geometry for stereo vision is shown in figure 1, from which we have:

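$$u_r = \frac{\lambda (x - b)}{z} \tag{3}$$

$$u_\ell = \frac{\lambda x}{z} \tag{4}$$

$$v_r = \frac{\lambda y}{z} \tag{5}$$

$$v_\ell = \frac{\lambda y}{z} \tag{6}$$

where
• $\lambda$ is the focal length, i.e. the distance from the image plane to the center of projection.
• $b$ is the baseline, i.e. the distance between the centers of the two cameras.
• We assume that the optical axes are aligned.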

By subtraction, we get

$$u_\ell - u_r = \frac{\lambda b}{z} \tag{7}$$

and therefore

$$z = \frac{\lambda b}{u_\ell - u_r} \tag{8}$$

It is common to attach the origin to the left camera as shown in figure 1. We assume that both cameras are calibrated and identical, and that they have the same orientation. It is also possible to attach the origin to the midpoint between the two camera reference frames, in which case the equations are slightly different. Equation (8) gives the distance from the camera to the 3D point. Note that

• The difference $u_\ell - u_r$ is called the horizontal disparity, retinal disparity, or binocular disparity. To get a feel for the disparity, hold one finger in front of you, close one eye, then open it and close the other eye.
• The distance $z$ is inversely proportional to the disparity.
• For a given depth, the disparity is proportional to the baseline.
• The accuracy of depth determination increases with increasing baseline.
• The images become less similar as the baseline increases.
• For a given baseline, the accuracy is better for closer objects than for farther objects.
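
The remarks on accuracy can be made more concrete with a first-order sensitivity sketch. Here $d = u_\ell - u_r$ denotes the disparity and $\delta d$ a small disparity measurement error; both symbols are introduced only for this illustration and are not part of the original notes. Differentiating equation (8) with respect to $d$ gives:

```latex
z = \frac{\lambda b}{d}
\quad\Longrightarrow\quad
\frac{\partial z}{\partial d} = -\frac{\lambda b}{d^{2}} = -\frac{z^{2}}{\lambda b},
\qquad
|\delta z| \approx \frac{z^{2}}{\lambda b}\,|\delta d|
```

Under this approximation, the depth error caused by a fixed disparity error grows with the square of the distance and shrinks as the baseline grows, which agrees with the remarks in the list.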

Example

Using equation (8), we can determine the x and y coordinates of point P as follows:

$$x = \frac{b\,u_\ell}{u_\ell - u_r} \tag{9}$$

$$y = \frac{b\,v_r}{u_\ell - u_r} \tag{10}$$
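
As a quick check of equations (3) to (10), the following MATLAB sketch projects a synthetic 3D point into both cameras and then recovers it from the disparity. The focal length, baseline, and point coordinates are hypothetical values chosen for illustration; they do not come from the notes.

```matlab
% Minimal sketch (assumed values): simple stereo triangulation, equations (3)-(10).
lambda = 800;                 % focal length in pixels (hypothetical)
b      = 120;                 % baseline in mm (hypothetical)
P      = [250; -80; 2000];    % synthetic 3D point in the left camera frame (mm)

% Forward projection, equations (3)-(6)
u_l = lambda * P(1) / P(3);
u_r = lambda * (P(1) - b) / P(3);
v_l = lambda * P(2) / P(3);
v_r = v_l;                    % same vertical coordinate for aligned cameras

% Reconstruction, equations (8)-(10)
d = u_l - u_r;                % horizontal disparity
z = lambda * b / d;
x = b * u_l / d;
y = b * v_r / d;

fprintf('disparity = %.2f px, recovered P = (%.1f, %.1f, %.1f) mm\n', d, x, y, z);
```

The recovered coordinates should match the synthetic point `P`, since no noise is added to the image coordinates.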


Machine vision, spring 2019 FB

Fig. 1. Stereo vision geometry; $C_\ell$ is the reference point.

Example

Consider the images in figures 2 and 3, obtained from a stereo vision system (in this problem we use subscripts 1 and 2 for the right and left images, respectively). The image size is 3456 by 4608 pixels. The pixel coordinates of the dot are $r_1 = 749$, $r_2 = 4271$, $c_1 = 420$. The origin of the pixel coordinate system is the bottom left of the image, and $(u_0, v_0)$ is located in the middle of the image.

1) Deduce a formula to find the distance $z$ to point $P$.
2) Calculate $z$ when the intrinsic parameters are

$$\frac{\lambda}{s_x} = 3700 \tag{11}$$

$$\frac{\lambda}{s_y} = 3450 \tag{12}$$

$$u_0 = 2304 \tag{13}$$

$$v_0 = 1728 \tag{14}$$

The distance between the two cameras is 30 cm.
3) Find the $(x, y)$ coordinates of point $P$.

Fig. 2. Left image, with point P marked.

Fig. 3. Right image, with point P marked.

The solution is shown as code below.

    % Camera parameters
    u0 = 2304;                  % principal point, horizontal (pixels)
    v0 = 1728;                  % principal point, vertical (pixels)
    alphav = 3450;              % lambda / s_y
    alphau = 3700;              % lambda / s_x

    % Right image (subscript 1)
    r1 = 749;
    c1 = 420;

    % Left image (subscript 2)
    r2 = 4271;
    c2 = 420;

    b = 300;                    % baseline in mm (30 cm)

    zpoint = (alphau * b) / (r2 - r1)     % depth from the disparity, equation (8)
    ypoint = zpoint * (c2 - v0) / alphav;
    ypoint = -ypoint                      % transforming the origin
    xpoint = zpoint * (r2 - u0) / alphau

The results are

$$z = 315.16\ \text{mm} \tag{15}$$

$$x = 167.54\ \text{mm} \tag{16}$$

$$y = 119.48\ \text{mm} \tag{17}$$
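
For readers who prefer to follow the arithmetic by hand, the substitution can be sketched as below. It uses only the values stated in the example, with $r_2$ and $r_1$ playing the roles of $u_\ell$ and $u_r$; the intermediate values are rounded, so the last digit may differ slightly from the printed results.

```latex
z = \frac{(\lambda/s_x)\, b}{r_2 - r_1}
  = \frac{3700 \times 300}{4271 - 749}
  = \frac{1110000}{3522} \approx 315.16\ \text{mm}

x = z\,\frac{r_2 - u_0}{\lambda/s_x}
  \approx 315.16 \times \frac{4271 - 2304}{3700} \approx 167.5\ \text{mm}

y = -z\,\frac{c_2 - v_0}{\lambda/s_y}
  \approx -315.16 \times \frac{420 - 1728}{3450} \approx 119.5\ \text{mm}
```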


Fig. 4. Relative geometry between two cameras

RELATIVE GEOMETRY BETWEEN TWO CAMERAS

The assumption of perfectly aligned cameras is violated in practice. Also, two identical cameras do not exist. In general, the first step in stereo vision is to determine the relationship between the two cameras. By relationship we mean the relative orientation and position (the cameras are no longer aligned). Consider the geometric representation of figure 4. Let $S_\ell = (x_\ell, y_\ell, z_\ell)^T$ be the position of point $P$ in the left camera coordinate system and $S_r = (x_r, y_r, z_r)^T$ be the position of point $P$ in the right camera coordinate system. It is possible to relate the coordinates by the following equation

$$S_r = R S_\ell + T \tag{18}$$

where $R$ is a rotation matrix satisfying $R^T R = I$. System (18) can be written as

$$\begin{aligned}
r_{11}x_\ell + r_{12}y_\ell + r_{13}z_\ell + T_x &= x_r \\
r_{21}x_\ell + r_{22}y_\ell + r_{23}z_\ell + T_y &= y_r \\
r_{31}x_\ell + r_{32}y_\ell + r_{33}z_\ell + T_z &= z_r
\end{aligned} \tag{19}$$

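We do not know $R$ or $T$, but we know the left and right image projections $u_r$, $v_r$, $u_\ell$, $v_\ell$. Knowing the focal length, it is possible to write

$$u_\ell = \frac{\lambda x_\ell}{z_\ell} \tag{20}$$

$$v_\ell = \frac{\lambda y_\ell}{z_\ell} \tag{21}$$

$$u_r = \frac{\lambda x_r}{z_r} \tag{22}$$

$$v_r = \frac{\lambda y_r}{z_r} \tag{23}$$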

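Now $z_\ell$ and $z_r$ are regarded as additional unknowns. After substituting $x_\ell$, $y_\ell$, $x_r$, $y_r$ by their expressions in terms of the focal length and the depth, we get

$$r_{11}\frac{u_\ell z_\ell}{\lambda} + r_{12}\frac{v_\ell z_\ell}{\lambda} + r_{13}z_\ell + T_x = \frac{u_r z_r}{\lambda} \tag{24}$$

$$r_{21}\frac{u_\ell z_\ell}{\lambda} + r_{22}\frac{v_\ell z_\ell}{\lambda} + r_{23}z_\ell + T_y = \frac{v_r z_r}{\lambda} \tag{25}$$

$$r_{31}\frac{u_\ell z_\ell}{\lambda} + r_{32}\frac{v_\ell z_\ell}{\lambda} + r_{33}z_\ell + T_z = z_r \tag{26}$$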


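and

$$r_{11}u_\ell + r_{12}v_\ell + r_{13}\lambda + \frac{T_x\lambda}{z_\ell} = u_r\frac{z_r}{z_\ell} \tag{27}$$

$$r_{21}u_\ell + r_{22}v_\ell + r_{23}\lambda + \frac{T_y\lambda}{z_\ell} = v_r\frac{z_r}{z_\ell} \tag{28}$$

$$r_{31}u_\ell + r_{32}v_\ell + r_{33}\lambda + \frac{T_z\lambda}{z_\ell} = \lambda\frac{z_r}{z_\ell} \tag{29}$$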

There are three equations and fourteen unknowns ($r_{ij}$, $T_x$, $T_y$, $T_z$, $z_\ell$, $z_r$). Each additional point provides three more equations, but at the same time introduces two new unknowns, $z_\ell$ and $z_r$. For one point, the count is

$$3 \times 1 = 3 \text{ equations}, \qquad 12 + 2 \times 1 = 14 \text{ unknowns} \tag{30}$$

For $N$ points, the number of equations matches the number of unknowns when

$$3 \text{ equations} \times N \text{ points} = 12 \text{ unknowns} + 2 \text{ unknowns} \times N \text{ points} \tag{31}$$

which gives $N = 12$. Therefore, we need at least 12 points to solve.

COMPUTING THE DEPTH

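If we know the translation and the rotation matrix as well as the image coordinates $u_\ell$, $u_r$, $v_\ell$, $v_r$, we can calculate the depths $z_\ell$ and $z_r$:

$$\left[ r_{11}\frac{u_\ell}{\lambda} + r_{12}\frac{v_\ell}{\lambda} + r_{13} \right] z_\ell + T_x = u_r\frac{z_r}{\lambda} \tag{32}$$

$$\left[ r_{21}\frac{u_\ell}{\lambda} + r_{22}\frac{v_\ell}{\lambda} + r_{23} \right] z_\ell + T_y = v_r\frac{z_r}{\lambda} \tag{33}$$

$$\left[ r_{31}\frac{u_\ell}{\lambda} + r_{32}\frac{v_\ell}{\lambda} + r_{33} \right] z_\ell + T_z = z_r \tag{34}$$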

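Since we have two unknowns and three equations, we can use any two equations to solve for $z_\ell$ and $z_r$. In the particular case when the cameras have the same orientation, we have:

$$\left[ \frac{u_\ell}{\lambda} \right] z_\ell + T_x = z_r \left[ \frac{u_r}{\lambda} \right] \tag{35}$$

$$\left[ \frac{v_\ell}{\lambda} \right] z_\ell + T_y = z_r \left[ \frac{v_r}{\lambda} \right] \tag{36}$$

$$z_\ell + T_z = z_r \tag{37}$$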
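
Instead of picking two of the three equations, one option is to use all three of (32) to (34) and solve for $z_\ell$ and $z_r$ in the least-squares sense. The MATLAB sketch below illustrates this on synthetic data; the rotation, translation, focal length, and 3D point are hypothetical values chosen for illustration, and the least-squares formulation is a swapped-in variant rather than the method prescribed in the notes.

```matlab
% Sketch with synthetic data: solve (32)-(34) for z_l and z_r by least squares.
lambda = 800;                          % focal length (hypothetical)
theta  = 5 * pi / 180;                 % small rotation about the y axis (hypothetical)
R = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
T = [-120; 2; 5];                      % translation in mm (hypothetical)

S_l = [250; -80; 2000];                % synthetic point in the left camera frame
S_r = R * S_l + T;                     % equation (18)

% Image coordinates, equations (20)-(23)
u_l = lambda * S_l(1) / S_l(3);
v_l = lambda * S_l(2) / S_l(3);
u_r = lambda * S_r(1) / S_r(3);
v_r = lambda * S_r(2) / S_r(3);

% Equations (32)-(34) rearranged as M * [z_l; z_r] = -T
M = [R * [u_l / lambda; v_l / lambda; 1], -[u_r / lambda; v_r / lambda; 1]];
z = M \ (-T);                          % backslash returns the least-squares solution

fprintf('z_l = %.2f (true %.2f), z_r = %.2f (true %.2f)\n', ...
        z(1), S_l(3), z(2), S_r(3));
```

With noise-free image coordinates the three equations are consistent, so the least-squares solution coincides with the one obtained from any two of them; with noisy measurements it averages the disagreement between the equations.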

EPIPOLAR GEOMETRY AND FUNDAMENTAL MATRIX

Consider the stereo vision geometry of figures 5 and 6. We want to solve the correspondence problem. Point $P$ in 3D space is imaged in the left camera at $q_\ell$ and in the right camera at $q_r$. Rays $C_\ell q_\ell$ and $C_r q_r$ intersect at point $P$, so they both lie in the same plane. As a result, the image points $q_\ell$, $q_r$, the space point $P$, and the camera centers are coplanar. The plane defined by the three points ($C_\ell$, $C_r$, $P$) is called the epipolar plane and is denoted by $\Pi$.

The correspondence problem can be formulated as follows: knowing $q_\ell$, what are the constraints on the location of $q_r$? Point $q_r$ lies in the right image plane and, at the same time, in the plane $\Pi$. The intersection of the epipolar plane $\Pi$ and the right image plane forms a line $\ell_r$, so the search for $q_r$ is reduced to line $\ell_r$. Line $\ell_r$ is called the epipolar line. Points $e_\ell$, $e_r$ are called the epipoles. Figure 6 shows the stereo vision geometry. Points $P_1$, $P_2$, $P_3$ have the same projection in the left image plane, but different projections in the right image. The epipolar constraint reduces the correspondence problem to a 1D search.

Example

The images in figures 7 and 8 are taken using a stereo vision system with $b = 300$ mm. The coordinates of the point of interest in the pixel coordinate system are (916, 686) in the right image and (97, 701) in the left image. The blue line is the right epipolar line and the red line is the left epipolar line.

THE ESSENTIAL MATRIX

Assume we have canonical cameras

$$A_\ell = A_r = I$$


Fig. 5. Stereo vision geometry

Fig. 6. Stereo vision geometry

Fig. 7. Examples of the epipolar lines (plot with a data tip at X: 916, Y: 685.6).


Fig. 8. Examples of the epipolar lines (plot with a data tip at X: 97, Y: 701.4).

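where $A_\ell$ and $A_r$ are the intrinsic matrices of the left and right camera, respectively. We define the projection matrices as follows

$$M_\ell = \begin{bmatrix} I & 0 \end{bmatrix} \tag{38}$$

$$M_r = \begin{bmatrix} R & T \end{bmatrix} \tag{39}$$

The essential matrix is defined as

$$E = T_\times R$$

where $T_\times$ is the translation vector represented in matrix form

$$T_\times = \begin{bmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{bmatrix} \tag{40}$$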
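
The following MATLAB sketch builds $T_\times$ from equation (40), forms $E = T_\times R$, and checks numerically that a pair of corresponding points seen by canonical cameras satisfies $q_r^T E q_\ell = 0$. That constraint is the standard epipolar constraint for the canonical case and is stated here as an assumption, since this section of the notes only gives the definition of $E$. The helper name `skew` and all numeric values are hypothetical.

```matlab
% Sketch: essential matrix for canonical cameras (A_l = A_r = I).
skew = @(t) [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];   % equation (40)

theta = 10 * pi / 180;                 % rotation about the y axis (hypothetical)
R = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
T = [-0.3; 0.02; 0.05];                % translation in meters (hypothetical)

E = skew(T) * R;                       % essential matrix

S_l = [0.4; -0.2; 3];                  % synthetic point in the left camera frame
S_r = R * S_l + T;                     % equation (18)

% With canonical cameras the homogeneous image points are S / z
q_l = S_l / S_l(3);
q_r = S_r / S_r(3);

residual = q_r' * E * q_l;             % expected to vanish up to rounding error
fprintf('q_r^T E q_l = %.2e, det(E) = %.2e\n', residual, det(E));
```

The determinant is printed because $T_\times$ is skew-symmetric and therefore singular, so $E$ is singular as well.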

THE FUNDAMENTAL MATRIX

The fundamental matrix is an algebraic representation of the epipolar geometry. It represents a mapping between the right and left images. In general, $A_\ell \neq I$ and $A_r \neq I$. The fundamental matrix is given by

$$F = [A_r^{-1}]^T [T_\times] [R] [A_\ell^{-1}] \tag{41}$$

The most important property of the fundamental matrix is summarized in the following theorem.

Theorem: The fundamental matrix satisfies the following condition: for any pair of corresponding image points

$$q_r^T F q_\ell = 0 \tag{42}$$

• $q_r$ lies on the epipolar line

$$\ell_r = F q_\ell \tag{43}$$

• $q_\ell$ lies on the epipolar line

$$\ell_\ell = F^T q_r \tag{44}$$

Equations (43) and (44) show that the fundamental matrix represents a mapping between a point and a line. The correspondence problem is formulated in terms of matrix $F$. Solving the correspondence problem means solving for matrix $F$, which is a $3 \times 3$ matrix of rank 2, unique up to a scale factor.
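
To illustrate equations (41) to (44), the MATLAB sketch below assembles $F$ from assumed intrinsic and extrinsic parameters, projects a synthetic point into both images, and checks that each image point lies on the epipolar line induced by the other. The intrinsic matrices, rotation, translation, and 3D point are all hypothetical values, and `skew` is an illustrative helper name.

```matlab
% Sketch: fundamental matrix from equation (41) and the epipolar lines (43)-(44).
skew = @(t) [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];

A_l = [800 0 320; 0 800 240; 0 0 1];   % left intrinsic matrix (hypothetical)
A_r = [820 0 310; 0 820 250; 0 0 1];   % right intrinsic matrix (hypothetical)
theta = 10 * pi / 180;
R = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
T = [-0.3; 0.02; 0.05];

F = inv(A_r)' * skew(T) * R * inv(A_l);   % equation (41)

S_l = [0.4; -0.2; 3];                  % synthetic point in the left camera frame
S_r = R * S_l + T;
q_l = A_l * S_l / S_l(3);              % homogeneous pixel coordinates, left image
q_r = A_r * S_r / S_r(3);              % homogeneous pixel coordinates, right image

res = q_r' * F * q_l;                  % equation (42)
l_r = F * q_l;                         % right epipolar line, equation (43)
l_l = F' * q_r;                        % left epipolar line, equation (44)

% Distance (in pixels) from each point to its epipolar line a*u + b*v + c = 0
dist_r = abs(l_r' * q_r) / norm(l_r(1:2));
dist_l = abs(l_l' * q_l) / norm(l_l(1:2));
fprintf('residual = %.2e, dist_r = %.2e px, dist_l = %.2e px\n', res, dist_r, dist_l);
```

Expressing the check as a point-to-line distance in pixels is a design choice: unlike the raw residual, it does not depend on the arbitrary scale of $F$.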


COMPUTING THE FUNDAMENTAL MATRIX: THE EIGHT-POINT ALGORITHM

Equation (41) gives the fundamental matrix in terms of the intrinsic and extrinsic parameters. As mentioned previously, each pair of points gives a scalar constraint as follows

$$[q_r^T]_i\, F\, [q_\ell]_i = 0 \tag{45}$$

The eight-point algorithm proposes to use at least eight points to calculate matrix $F$. Equation (42) can be written as

$$\begin{bmatrix} u_r & v_r & 1 \end{bmatrix}
\begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}
\begin{bmatrix} u_\ell \\ v_\ell \\ 1 \end{bmatrix} = 0 \tag{46}$$

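This is a scalar equation. Expanding the product in (46) and collecting the entries of $F$ row by row, it can be reduced to:

$$\begin{bmatrix} u_r u_\ell & u_r v_\ell & u_r & v_r u_\ell & v_r v_\ell & v_r & u_\ell & v_\ell & 1 \end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

At least eight points are needed to solve. If we take $N$ points, we obtain $N$ constraints that can be put in matrix form as follows

$$W f = 0$$

where $W$ is given by

$$W = \begin{bmatrix}
u_{r1}u_{\ell 1} & u_{r1}v_{\ell 1} & u_{r1} & v_{r1}u_{\ell 1} & v_{r1}v_{\ell 1} & v_{r1} & u_{\ell 1} & v_{\ell 1} & 1 \\
u_{r2}u_{\ell 2} & u_{r2}v_{\ell 2} & u_{r2} & v_{r2}u_{\ell 2} & v_{r2}v_{\ell 2} & v_{r2} & u_{\ell 2} & v_{\ell 2} & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_{rN}u_{\ell N} & u_{rN}v_{\ell N} & u_{rN} & v_{rN}u_{\ell N} & v_{rN}v_{\ell N} & v_{rN} & u_{\ell N} & v_{\ell N} & 1
\end{bmatrix} \tag{47}$$

and

$$f = \begin{bmatrix} f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \end{bmatrix}^T \tag{48}$$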

One possible way to solve is by using the singular value decomposition (SVD). In general, the solution consists of two steps:

• Step 1: Linear solution. Use the SVD to obtain a first estimate of matrix $F$ by solving $Wf = 0$. This estimate may not satisfy the rank requirement for the fundamental matrix. The following commands can be used:

    [U, S, V] = svd(W);        % W is the constraint matrix of equation (47)
    f = V(:, end);             % right singular vector of the smallest singular value
    F = reshape(f, [3 3])';    % row-wise reshape into a 3 x 3 matrix

• Step 2: Constraint enforcement. Find the closest approximation to $F$ that has rank 2. Again the SVD is used as follows:

    [U, S, V] = svd(F);
    S(3, 3) = 0;               % zero the smallest singular value
    F = U * S * V';
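
The two steps can be tried end to end on synthetic data. The MATLAB sketch below generates noise-free correspondences for canonical cameras, so the true $F$ equals $T_\times R$ up to scale. Each row of $W$ is ordered to match the expansion of equation (46) with the entries of $F$ taken row by row, which is what the row-wise `reshape` assumes. All numeric values are hypothetical. Coordinate normalization, which is commonly recommended when working with noisy pixel coordinates, is not part of the notes and is left out.

```matlab
% Sketch: eight-point estimate of F from noise-free synthetic correspondences.
skew = @(t) [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];
theta = 10 * pi / 180;
R = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
T = [-0.3; 0.02; 0.05];
E_true = skew(T) * R;                  % ground truth for canonical cameras

% Ten non-coplanar synthetic points in the left camera frame (hypothetical)
X_l = [-1.0  0.5  0.8 -0.6  0.2  1.1 -0.9  0.4 -0.3  0.7;
        0.7 -0.4  0.9 -0.8  0.1 -1.0  0.3  0.6 -0.2 -0.5;
        4.0  5.5  3.2  6.1  4.7  3.8  5.0  6.5  3.5  4.3];
N   = size(X_l, 2);
X_r = R * X_l + repmat(T, 1, N);       % equation (18)

u_l = (X_l(1, :) ./ X_l(3, :))';   v_l = (X_l(2, :) ./ X_l(3, :))';
u_r = (X_r(1, :) ./ X_r(3, :))';   v_r = (X_r(2, :) ./ X_r(3, :))';

% One row per correspondence, ordered to match f = [f11 f12 f13 f21 ... f33]'
W = [u_r .* u_l, u_r .* v_l, u_r, v_r .* u_l, v_r .* v_l, v_r, u_l, v_l, ones(N, 1)];

% Step 1: linear solution
[~, ~, V] = svd(W);
f = V(:, end);
F = reshape(f, [3 3])';

% Step 2: rank-2 enforcement
[U, S, V] = svd(F);
S(3, 3) = 0;
F = U * S * V';

% Check the epipolar constraint (42) and compare with the ground truth up to sign and scale
residuals = sum(([u_r v_r ones(N, 1)] * F) .* [u_l v_l ones(N, 1)], 2);
Fn = F / norm(F, 'fro');
En = E_true / norm(E_true, 'fro');
err = min(norm(Fn - En, 'fro'), norm(Fn + En, 'fro'));
fprintf('max |q_r^T F q_l| = %.2e, error vs. ground truth = %.2e\n', ...
        max(abs(residuals)), err);
```

Because the solution of $Wf = 0$ is only defined up to scale and sign, the comparison with the ground truth normalizes both matrices and tests both signs.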


Example

We want to find the epipolar lines for the pair of images given in figures 7 and 8. The desired points are

$$p_1 = (916, 686) \tag{49}$$

$$p_2 = (97, 701) \tag{50}$$

For 8 points, matrix $W$ is given by:

$$W = \begin{bmatrix}
86165 & 635807 & 907 & 64410 & 475278 & 678 & 95 & 701 & 1 \\
135184 & 833000 & 952 & 122262 & 753375 & 861 & 142 & 875 & 1 \\
153576 & 951588 & 972 & 153102 & 948651 & 969 & 158 & 979 & 1 \\
85975 & 941200 & 905 & 97850 & 1071200 & 1030 & 95 & 1040 & 1 \\
288706 & 1076086 & 1193 & 216590 & 807290 & 895 & 242 & 902 & 1 \\
589050 & 1198890 & 1386 & 363800 & 740440 & 856 & 425 & 865 & 1 \\
872448 & 1307136 & 1536 & 478824 & 717393 & 843 & 568 & 851 & 1 \\
262080 & 1228500 & 1170 & 233632 & 1095150 & 1043 & 224 & 1050 & 1
\end{bmatrix} \tag{52}$$

The fundamental matrix is given by

$$F = \begin{bmatrix}
-0.0000 & -0.0000 & 0.0018 \\
0.0000 & 0.0000 & -0.0036 \\
-0.0009 & 0.0022 & 1.0000
\end{bmatrix} \tag{53}$$

The epipolar lines are given by

$$\ell_r = \begin{bmatrix} 0.0005 \\ -0.0026 \\ 1.7367 \end{bmatrix}, \qquad
\ell_\ell = \begin{bmatrix} -0.0003 \\ 0.0023 \\ -1.3524 \end{bmatrix} \tag{54}$$
