

Corresponding author: Yi-peng Zhou E-mail: [email protected]

Journal of Bionic Engineering 5 (2008) 247–252

RETRACTED: Fast and Robust Stereo Vision Algorithm for Obstacle Detection

Yi-peng Zhou Department of Automation, Northwestern Polytechnic University, Xian 710068, P. R. China

Abstract

Binocular computer vision is based on bionics: after calibration of the camera pair and synchronized exposure of the two images, three-dimensional depth information can be computed from the two-dimensional image pixels. In this paper, a fast and robust stereo vision algorithm is described to perform in-vehicle obstacle detection and characterization. The stereo algorithm which provides a suitable representation of the geometric content of the road scene is described, and an in-vehicle embedded system is presented. We present the way in which the algorithm is used, and then report experiments on real situations which show that our solution is accurate, reliable and efficient. In particular, both processes are fast, generic, robust to noise and bad conditions, and work even with partial occlusion.

Keywords: stereo vision, vehicle dynamics, visibility range, image alignment

Copyright © 2008, Jilin University. Published by Elsevier Limited and Science Press. All rights reserved.

1 Introduction

Detecting the environment of a moving vehicle is a complex and challenging task. One goal of this project is to make progress in road safety with communication between a vehicle and the infrastructure. The vehicle is seen as a sensor, inserted into the traffic, which communicates measurements to a traffic management regional centre. Our objective is to measure the range of visibility with an onboard camera, with the aim of overcoming low visibility conditions due to climatic factors.

Different studies on visibility distance measurement exist, among which we can find:

(1) A method using detection of lane markings to estimate the visibility distance by measuring the attenuation in contrast of lane markings at different distances in front of the vehicle.

(2) A mono-camera method adapted to fog using Koschmieder’s model to give an estimation of the meteorological visibility distance in daytime foggy weather[1].

(3) A method using stereo vision that is generic and not limited to fog. With stereo vision, a good quality depth map is computed[2]. The visibility distance is the distance to the farthest point of the road surface with a contrast greater than 5%[3].

The method using stereo vision does not differentiate between geometric and atmospheric visibility distance. Indeed, if there is a curve or a slope where the visibility is reduced due to physical reasons, the visible road surface will be limited by the road geometry. In these cases, the visibility distance calculated will be the geometric one.

The method we designed is as generic as the one based on stereo vision but uses only one camera. In this case, we estimate the distance of the farthest object that is in the road plane with a contrast higher than or equal to 5%. This method takes into account the definition of the meteorological visibility distance and it is composed of three parts:

(1) A pseudo-depth map of the vehicle environment by aligning the road plane in the successive images.

(2) A contrast map.

(3) A visibility distance which is obtained by taking the farthest point (depth map) with a contrast greater than 5% (contrast map).

2 Image processing

With a single camera, it is impossible to get directly the depth in an image. But we can calculate it with a perspective projection of the distance of points on the road. The generic way to determine the road plane is to use successive images. Objects related to the road plane are in the same place from one image to the next, but vertical objects are deformed. In general, successive image alignment is made using classical image processing techniques[4]. These methods consist in matching objects in the two images. In our degraded visibility context, this approach is not well adapted because local contrast is greatly reduced. The originality of our approach is to align images knowing the motion of the camera, which is observed or measured with proprioceptive sensors. This is similar to the way that an image is “stabilised” in the brain of a moving observer by integrating what the eye sees with the movement detected by the semicircular canals of the inner ear.

2.1 Image acquisition

In the coordinate system of the camera frame, the position of a pixel in the image plane is given by its coordinates (u, v). The image optical centre is denoted (u0, v0) in the image frame and is considered as the image centre.

The transformation between the vehicle frame (with origin at the centre of gravity of the vehicle) and the camera frame is represented by a translation vector t = dX + hZ (Fig. 1) and a rotation of angle β around the axis Y. We denote T as the translation matrix and R as the rotation matrix. The coordinate change between the image frame and the camera frame can be expressed using a projective matrix Mproj[5]:

M_{proj} = \begin{pmatrix} u_0 & -\alpha & 0 & 0 \\ v_0 & 0 & -\alpha & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix},    (1)

where α represents the ratio of the camera focal length to the size of one pixel. Finally, we obtain the transformation matrix Tr from the vehicle frame to the image frame:

T_r = M_{proj} R T.    (2)

If P is a point with homogeneous coordinates (X, Y, Z, 1) in the vehicle frame, its homogeneous coordinates in the image frame become:

p = T_r P = T_r (X, Y, Z, 1)^{\mathrm{T}}.    (3)

Fig. 1 Position of the camera and vehicle dynamics.

We can now compute the coordinates (u, v) of the projection of P in the image frame:

u = \frac{x}{z} = u_0 + \alpha \frac{\cos\beta\,(Z + h) - \sin\beta\,(X + d)}{\cos\beta\,(X + d) + \sin\beta\,(Z + h)},
v = \frac{y}{z} = v_0 - \alpha \frac{Y}{\cos\beta\,(X + d) + \sin\beta\,(Z + h)}.    (4)
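As an illustration of Eqs. (1) to (4), the sketch below projects a point expressed in the vehicle frame into image coordinates. It is a minimal sketch; the numeric values chosen for the intrinsic ratio α, the optical centre (u0, v0), the camera offsets d and h and the pitch angle β are placeholders, not values from the paper.

```python
import numpy as np

def project_to_image(X, Y, Z, alpha, u0, v0, d, h, beta):
    """Project a vehicle-frame point (X, Y, Z) to image coordinates (u, v), Eq. (4)."""
    denom = np.cos(beta) * (X + d) + np.sin(beta) * (Z + h)
    u = u0 + alpha * (np.cos(beta) * (Z + h) - np.sin(beta) * (X + d)) / denom
    v = v0 - alpha * Y / denom
    return u, v

# Hypothetical camera set-up (illustrative values only).
alpha, u0, v0 = 800.0, 320.0, 240.0   # focal length / pixel size, optical centre
d, h, beta = 1.5, 1.2, np.deg2rad(5)  # camera offsets (m) and pitch angle (rad)

print(project_to_image(X=20.0, Y=0.0, Z=0.0, alpha=alpha, u0=u0, v0=v0, d=d, h=h, beta=beta))
```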

2.2 Creation of a transformed image

2.2.1 Flat world assumption

If we consider I1 and I2 to be the images taken at times t1 and t2, the knowledge of the vehicle dynamics allows us to obtain an estimation of image I2 from image I1. Let I12 be this estimated image and P a point whose projection in the image frame belongs to it. Let us assume that this point belongs to the road plane, meaning that if (X2, Y2, Z2) are the coordinates of this point in the vehicle frame, then Z2 = 0. So the expression of X2 and Y2 is deduced from Eq. (4) as

X_2 = \frac{\cos\beta\,(dU + h\alpha) + \sin\beta\,(hU - d\alpha)}{\alpha \sin\beta - U \cos\beta},
Y_2 = \frac{-\alpha h V}{\alpha \sin\beta - U \cos\beta},    (5)

where U = u − u0 and V = v − v0.
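To make the flat-world inversion concrete, the sketch below recovers (X2, Y2) for a pixel assumed to lie on the road plane. Rather than transcribing Eq. (5), whose sign conventions depend on the frame definitions, it inverts Eq. (4) directly with Z2 = 0; the function name and calibration values are mine, carried over from the previous sketch.

```python
import numpy as np

def image_to_road_plane(u, v, alpha, u0, v0, d, h, beta):
    """Invert Eq. (4) for a pixel assumed to lie on the road plane (Z2 = 0)."""
    U, V = u - u0, v - v0
    cb, sb = np.cos(beta), np.sin(beta)
    # From Eq. (4) with Z = 0:  U*(cb*(X+d) + sb*h) = alpha*(cb*h - sb*(X+d))
    X2 = h * (alpha * cb - U * sb) / (U * cb + alpha * sb) - d
    denom = cb * (X2 + d) + sb * h        # denominator of Eq. (4) on the road plane
    Y2 = -V * denom / alpha
    return X2, Y2

alpha, u0, v0 = 800.0, 320.0, 240.0
d, h, beta = 1.5, 1.2, np.deg2rad(5)
print(image_to_road_plane(u=340.0, v=300.0, alpha=alpha, u0=u0, v0=v0, d=d, h=h, beta=beta))
```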

2.2.2 Vehicle motion

If we know the vehicle motion, we can calculate its movement between time t1 and t2. As soon as we have points in the vehicle frame[6], we can get new points following this movement (see Fig. 2).


Fig. 2 Vehicle motion.

From the knowledge of the coordinates of a point P and the vehicle dynamics, we can express the coordinates of the point P in the camera frame at time t1:

(x_{12}, y_{12}, z_{12})^{\mathrm{T}} = T_r M (X_2, Y_2, 0, 1)^{\mathrm{T}},    (6)

where M is the vehicle rotation/translation matrix between the two instants. We obtain the coordinates (u12, v12) of P in the image frame of I1:

u_{12} = \frac{x_{12}}{z_{12}}, \quad v_{12} = \frac{y_{12}}{z_{12}}.    (7)
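Putting Eqs. (5) to (7) together, the transformed image I12 can be built by warping one image pixel by pixel: assume each pixel lies on the road plane, move the corresponding ground point according to the vehicle motion, and re-project it. The sketch below shows this loop in a deliberately naive form; the motion matrix M, the helper callables and the nearest-neighbour sampling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def build_transformed_image(img, M, image_to_road_plane, road_to_image):
    """Sketch: warp img under the flat-world assumption using the vehicle motion M.

    img                 -- 2-D grey-level array
    M                   -- 4x4 rotation/translation matrix of the vehicle between t1 and t2
    image_to_road_plane -- callable (u, v) -> (X, Y) on the road plane, i.e. Eq. (5)
    road_to_image       -- callable (X, Y, Z) -> (u, v), i.e. Eq. (4)
    """
    h_img, w_img = img.shape
    warped = np.zeros_like(img)
    for v in range(h_img):
        for u in range(w_img):
            X, Y = image_to_road_plane(u, v)               # flat-world assumption, Eq. (5)
            P2 = M @ np.array([X, Y, 0.0, 1.0])            # move the ground point with the vehicle, Eq. (6)
            u12, v12 = road_to_image(P2[0], P2[1], P2[2])  # re-project, Eq. (7)
            ui, vi = int(round(u12)), int(round(v12))      # nearest-neighbour sampling
            if 0 <= ui < w_img and 0 <= vi < h_img:
                warped[v, u] = img[vi, ui]
    return warped
```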

2.2.3 Example of transformed image

An example of a transformed image I12 obtained from an image I2 is given in Fig. 3. The comparison between image I2 and the estimated image I12 allows us to obtain a depth map in the same way as stereo vision. Since we have made the hypothesis that all the points in image I2 belong to the road plane, a small distance between the two images at a given pixel allows us to validate the flat world assumption there.

Fig. 3 Left is current image; right is transformed image after a 2 m displacement. Result obtained with synthetic images.

2.3 Pseudo-depth map construction

Once we can say that a pixel of coordinates (u, v) belongs to the road plane (see Fig. 4), we can express the distance d of this pixel by[7]:

d = \begin{cases} \dfrac{\lambda}{v - v_h} & \text{if } v > v_h \\ \infty & \text{if } v \le v_h \end{cases}, \quad \text{where } \lambda = \frac{H\alpha}{\cos^{2}\beta_0},    (8)

where H denotes the mounting height of the camera, α the ratio between the camera focal length and the size of one pixel, vh the position of the horizon in the image and β0 the camera pitch angle.
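A direct implementation of Eq. (8) maps every image row below the horizon to a distance on the road. The sketch below is a minimal version; the numeric values of H, α, vh and β0 are placeholders, and the small epsilon guarding the division is my addition.

```python
import numpy as np

def pseudo_depth(v, H, alpha, v_h, beta0):
    """Distance of a road-plane pixel on image row v, Eq. (8)."""
    lam = H * alpha / np.cos(beta0) ** 2
    v = np.asarray(v, dtype=float)
    # Rows at or above the horizon are assigned an infinite distance.
    return np.where(v > v_h, lam / np.maximum(v - v_h, 1e-9), np.inf)

rows = np.arange(480)
depths = pseudo_depth(rows, H=1.2, alpha=800.0, v_h=230.0, beta0=np.deg2rad(5))
```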

Fig. 4 Images taken by the onboard camera. Left, in black: points belonging to the road plane; in white: points not belonging to the road plane.

2.4 Structure of the image via correlation metrics

2.4.1 Pseudo-disparity computation between two images

We have to match both images. This means that we have to find local correspondences between two neighbourhoods from each image. These correspondences are computed via the Zero mean Normalized Cross Correlation (ZNCC) metric. To realize this operation, we have to select a pixel p1 = (u1, v1) in the image I1 and another pixel p2 = (u2, v2) in the transformed image I12. Then, we define a centred neighbourhood V(p1) around the pixel p1 and V(p2) around the pixel p2 in which we compute the ZNCC correlation metric:

\mathrm{ZNCC}(p_1, p_2) = \frac{\sum_{V(p_1),\,V(p_2)} I_1 I_2}{\sqrt{\sum_{V(p_1)} I_1^{2} \, \sum_{V(p_2)} I_2^{2}}},    (9)

where

I_1 = I_1(u_1 + i, v_1 + j) - \bar{I}_1, \quad I_2 = I_{12}(u_2 + i, v_2 + j) - \bar{I}_2,    (10)

with \bar{I}_1 and \bar{I}_2 the mean grey levels over V(p_1) and V(p_2).
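The ZNCC score of Eqs. (9) and (10) can be written compactly with NumPy. This is a minimal sketch assuming square neighbourhoods of half-size n, grey-level images stored as arrays, and pixels far enough from the image border; the epsilon guarding against flat patches is my addition.

```python
import numpy as np

def zncc(img1, img12, p1, p2, n=4, eps=1e-12):
    """ZNCC between the neighbourhoods of p1 in img1 and p2 in img12, Eqs. (9)-(10)."""
    (u1, v1), (u2, v2) = p1, p2
    # Assumes p1 and p2 are at least n pixels away from the image borders.
    w1 = img1[v1 - n:v1 + n + 1, u1 - n:u1 + n + 1].astype(float)
    w2 = img12[v2 - n:v2 + n + 1, u2 - n:u2 + n + 1].astype(float)
    w1 -= w1.mean()   # zero-mean neighbourhoods, Eq. (10)
    w2 -= w2.mean()
    return float((w1 * w2).sum() / (np.sqrt((w1 ** 2).sum() * (w2 ** 2).sum()) + eps))
```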

The closer the correlation score is to 1, the more we can consider these two neighbourhoods as identical. Working on a single pair of p1 and p2 limits our study. Indeed, some matching errors can occur and a pixel belonging to the road can be incorrectly matched in the image I12. That is why we have to extend our study zone. To do it, we have defined a search window. The correlation neighbourhood in image I1 is centred on a point of interest. The correlation neighbourhood in the image I12 is centred successively around a pixel varying in a search frame (this search frame is centred on the pixel p1 of image I1). This principle is schematized in Fig. 5.

Fig. 5 Correlation neighbourhood and search window.

As soon as the sweep of the search window is complete, we keep the position (u2, v2) of the pixel with the best correlation score. With these two positions, we calculate a pairing distance:

d = |u_1 - u_2|^2 + |v_1 - v_2|^2.    (11)

After that, we keep only those points with a small pairing distance as points belonging to the road plane.
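The sweep over the search window and the pairing distance of Eq. (11) can be sketched as follows, reusing a ZNCC function such as the one above (passed in as a parameter so the snippet stays self-contained); the search-window half-size is an illustrative value.

```python
def best_match(img1, img12, p1, zncc_fn, search=6):
    """Sweep a search window in img12 around p1; return the best match and its pairing distance."""
    u1, v1 = p1
    best_score, best_p2 = -1.0, p1
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            p2 = (u1 + du, v1 + dv)
            score = zncc_fn(img1, img12, p1, p2)
            if score > best_score:
                best_score, best_p2 = score, p2
    u2, v2 = best_p2
    pairing = abs(u1 - u2) ** 2 + abs(v1 - v2) ** 2   # Eq. (11)
    return best_p2, pairing
```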

2.4.2 Road or non-road hypothesis

To get a good non-road hypothesis, notice that objects not belonging to the road plane are deformed towards the top and the borders of the image. We have defined a search frame on the basis of these deformations. We can have an idea of this deformation in Fig. 3. When the pixel is on the right side of the image, the search frame is deformed towards the top and the right. When the pixel is on the left side, the search frame is deformed towards the top and the left side of the image. This idea comes from stereo vision[8]. This is schematized in Fig. 6.

Fig. 6 Correlation with deformed window for the non-road hypothesis.

This deformed window gives us better correlation for objects not belonging to the road plane. Finally, for each pixel we compute a pairing distance with a normal and a deformed search window. Objects belonging to the road plane have a smaller pairing distance with a normal window. On the contrary, objects not belonging to the road plane have a shorter pairing distance with a deformed window. An example of this result is given in Fig. 4 using actual images in fog. The majority of pixels belonging to the road plane are successfully recognized, as opposed to the pixels belonging to vertical objects.
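The road/non-road decision then reduces to comparing the pairing distances obtained with the normal and with the deformed search windows. The sketch below assumes a helper like best_match above, extended with a window-shape argument; the exact geometry of the deformed frame is not specified numerically in the paper, so it is left behind that parameter.

```python
def classify_pixel(img1, img12, p1, match_fn):
    """Label p1 as 'road' or 'non-road' by comparing normal and deformed search windows.

    match_fn(img1, img12, p1, shape) is assumed to return (best_p2, pairing_distance)
    for the given search-window shape ('normal' or 'deformed').
    """
    _, d_normal = match_fn(img1, img12, p1, shape="normal")
    _, d_deformed = match_fn(img1, img12, p1, shape="deformed")
    # Road-plane points match best with the normal window, vertical objects with the deformed one.
    return "road" if d_normal <= d_deformed else "non-road"
```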

3 Vehicle dynamics

In the previous section, we have seen that the six degrees of freedom, three rotations (roll, pitch, yaw) and three translations (longitudinal Tx, lateral Ty and vertical Tz), allow us to realize the successive image transformations. The sensors that give vehicle dynamic information are an odometer and an Inertial Measurement Unit (IMU). The odometer gives information on the number of rotations of the wheel. The IMU gives the angular speeds around the three rotation axes of the vehicle (roll, pitch, yaw) and the accelerations along the three axes of the vehicle (X, Y, Z).

At first sight, the odometer and the IMU should give us knowledge of the six degrees of freedom that we need. Indeed, if we consider that the wheel radius is constant, we can have an estimation of the distance covered by this wheel, meaning by the vehicle. Moreover, the angular speed given by the IMU gives an estimate of the relative angular variation between two time instants. The first question is to know what type of estimator or observer we should use to estimate the six degrees of freedom. We want to know if the knowledge of every degree of freedom is really needed for alignment of successive images. This was done with the aim of eliminating some of the degrees of freedom in our process. To do it, we used the notion of sensitivity[7]. If we just look at the nature of the degrees of freedom, we have angles expressed in radians and distances expressed in meters[9], so we cannot directly compare them. The sensitivity allows us to compare the different contributions of the degrees of freedom using simulated scenarios. The following are results for different simulated scenarios.


We have designed a prototyping platform with which we can simulate the behaviour of a vehicle and its onboard sensors[10], get their exact motions and see the results of the successive image transformations. We have defined different scenarios to stimulate all the degrees of freedom and to reproduce some of the standard vehicle behaviours. The initial speed was constant at 30 km/h and the different scenarios are given below.

• Acceleration and braking in a straight line: the acceleration was between −1.5 m·s−2 and 1.5 m·s−2.

• Right and left oscillation at constant speed: on a two-lane road, we move the vehicle from one lane to the other.

• Straight line at a constant speed.

• Long right turn: at a constant speed, we turn the wheel to turn a full circle.

Table 1 shows the maximum value of the sensitivity we obtained for all degrees of freedom considered.

Table 1 Pixel displacements obtained with the different scenarios

         Acceleration/braking   Right-left oscillation   Constant speed   Right turn
Tx               20                      15                    20             20
Ty                0                       8                     0             20
Tz                3                       2                     0              1
Pitch            10                       4                     0              4
Roll              0                       3                     0              1
Yaw               0                      50                     0            100

We can see that the pixel displacement J(u,v)[11] is less sensitive to the translation Tz and the pitch and roll angles, except in the first scenario (acceleration and braking). We can say that, as soon as we are driving at a constant speed, doing a turn or changing lane, the three most important degrees of freedom are the two translations Tx and Ty and the yaw angle. The others can be neglected.

Vehicle dynamic estimation is done with two sensors: an odometer and an IMU. The yaw angle is given directly by the IMU and we have to estimate Tx and Ty. The odometer gives the distance L covered between time t1 and t2. When the yaw angle φ is not zero, we can consider that the vehicle is moving along an arc of a circle with radius R and then use the well-known trigonometric equations:

T_x = R \sin\varphi, \quad T_y = R (1 - \cos\varphi), \quad \varphi = \frac{L}{R}.    (12)
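In code, Eq. (12) turns the odometer distance L and the IMU yaw increment φ into the longitudinal and lateral displacements. The sketch below is a minimal version; the separate handling of the straight-line limit (φ ≈ 0) is an implementation detail I add, not something stated in the paper.

```python
import math

def planar_displacement(L, phi, eps=1e-9):
    """Longitudinal/lateral displacement from odometer distance L and yaw increment phi, Eq. (12)."""
    if abs(phi) < eps:          # straight line: the circular arc degenerates
        return L, 0.0
    R = L / phi                 # radius of the circular arc
    Tx = R * math.sin(phi)
    Ty = R * (1.0 - math.cos(phi))
    return Tx, Ty

print(planar_displacement(L=2.0, phi=math.radians(3)))
```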

4 Visibility

4.1 Contrast estimation

We adapted Köhler’s binarization technique[12] in order to measure the local contrasts of images. A pair of pixels (x, x1) is said to be separated by s if two conditions are met. First, x1 ∈ N4(x). Second, the condition min(I(x), I(x1)) < s < max(I(x), I(x1)) is respected. Let F(s) be the set of all couples (x, x1) separated by s. With these definitions, for every value of s belonging to [0, 255], F(s) is built. The mean logarithmic contrast associated with F(s), computed over every couple belonging to F(s), is then:

C(s) = \frac{1}{\# F(s)} \sum_{(x, x_1) \in F(s)} \min\!\left( \frac{|s - I(x)|}{\max(s, I(x))}, \frac{|s - I(x_1)|}{\max(s, I(x_1))} \right).    (13)

The best threshold s0 verifies the following condition:

s_0 = \arg\max_{s \in [0, 255]} C(s).    (14)

It is the threshold which has the best mean contrast along the associated border F(s0). Instead of using this method to render the images as binary data, we use it to measure the contrast locally. The evaluated contrast equals 2C(s0) along the associated border F(s0).
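Eqs. (13) and (14) amount to scanning all thresholds s, collecting the 4-neighbour pixel pairs separated by each s, and keeping the s with the best mean contrast. The sketch below follows that description literally (and therefore slowly); it assumes an 8-bit grey-level patch as input, and the function name is mine.

```python
import numpy as np

def kohler_local_contrast(patch):
    """Best threshold s0 and local contrast 2*C(s0) for a small grey-level patch, Eqs. (13)-(14)."""
    patch = patch.astype(int)
    h, w = patch.shape
    # 4-neighbour couples (right and down neighbours enumerate each couple once).
    pairs = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                pairs.append((patch[y, x], patch[y, x + 1]))
            if y + 1 < h:
                pairs.append((patch[y, x], patch[y + 1, x]))
    best_s, best_c = 0, 0.0
    for s in range(256):
        f_s = [(a, b) for a, b in pairs if min(a, b) < s < max(a, b)]   # F(s)
        if not f_s:
            continue
        c = np.mean([min(abs(s - a) / max(s, a), abs(s - b) / max(s, b)) for a, b in f_s])
        if c > best_c:
            best_s, best_c = s, c
    return best_s, 2.0 * best_c

s0, contrast = kohler_local_contrast(np.random.randint(0, 256, (8, 8)))
```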

4.2 Visibility estimation

To estimate the visibility distance, we combine the measurement of contrasts higher than 5% with the map of the pixels belonging to the road plane[13]. We locally process the contrast of image points belonging to the road plane by scanning the image from top to bottom, starting from the horizon line, looking for a point with a contrast greater than or equal to 5%. Fig. 7 shows, on an actual fog image, an example of a 5% contrast map and the result of the previous image alignment. The visibility distance is represented by the horizontal line on the pictures.
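The final step combines the maps: among the road-plane pixels whose local contrast reaches 5%, take the one closest to the horizon and convert its row to a distance with Eq. (8). The sketch below assumes a boolean road-plane mask and a per-pixel contrast map; the array names and the empty-case fallback are mine.

```python
import numpy as np

def visibility_distance(road_mask, contrast_map, H, alpha, v_h, beta0, threshold=0.05):
    """Distance of the farthest road-plane pixel whose local contrast is >= 5%."""
    lam = H * alpha / np.cos(beta0) ** 2
    rows = np.arange(contrast_map.shape[0])
    visible = road_mask & (contrast_map >= threshold)      # road pixels with enough contrast
    valid_rows = rows[visible.any(axis=1) & (rows > v_h)]  # rows below the horizon with such a pixel
    if valid_rows.size == 0:
        return 0.0
    v_far = valid_rows.min()         # closest to the horizon, hence farthest on the road
    return lam / (v_far - v_h)       # Eq. (8)
```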


Fig. 7 5% contrast map (left) and road/non-road image (right).

5 Conclusion

In this paper, a generic method estimating the atmospheric visibility distance is presented. It detects the farthest picture element belonging to the road plane having a contrast greater than 5% using the camera. To discern points belonging to the road plane from the others, the road plane is aligned in successive images by exploiting the relative motion of the vehicle between two instants in time. In contrast with conventional image processing approaches, this relative motion is obtained thanks to the proprioceptive sensors of the vehicle[14]. To distinguish the dominating degrees of freedom in the image transformations, a sensitivity study is carried out using typical driving scenarios. We found that three degrees of freedom (lateral and longitudinal displacements and yaw angle) are enough in our context. Using this assumption, sample results of visibility estimation are given using actual images of fog.

References

[1] Hattori H, Maki A. Stereo without depth search and metric calibration. Conference on Computer Vision and Pattern Recognition, Kawasaki, Japan, 2000, 177–184.
[2] Hautiere N, Labayrade R, Aubert D. Real-time disparity contrast combination for onboard estimation of the visibility distance. IEEE Transactions on Intelligent Transportation Systems, 2006, 7, 201–212.
[3] Labayrade R, Aubert D, Tarel J P. Real time obstacle detection on non flat road geometry through v-disparity representation. Proceedings of IEEE Intelligent Vehicles Symposium, Versailles, France, 2002, 2, 646–651.
[4] Labayrade R, Aubert D. A single framework for vehicle roll, pitch, yaw estimation and obstacles detection by stereovision. Proceedings of IEEE Intelligent Vehicles Symposium, Columbus, USA, 2003, 31–36.
[5] Gruyer D, Berge-Cherfaoui V. Multi-objects association in perception of dynamical situation. Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 1999, 255–262.
[6] Labayrade R, Aubert D. In-vehicle obstacle detection and characterization by stereovision. The 1st International Workshop on In-Vehicle Cognitive Computer Vision Systems, Graz, Austria, 2003, 13–19.
[7] Williamson T A. A High-Performance Stereo Vision System for Obstacle Detection. PhD Thesis, Carnegie Mellon University, USA, 1998.
[8] Franke U, Joos A. Real-time stereo vision for urban traffic scene understanding. Proceedings of the IEEE Intelligent Vehicles Symposium, Dearborn, USA, 2000, 273–278.
[9] Goldbeck J, Huertgen B. Lane detection and tracking by video sensors. Proceedings of the International Conference on Intelligent Transportation Systems, Tokyo, 1999, 74–79.
[10] Hancock J. High-Speed Obstacle Detection for Automated Highway Applications. Technical Report CMU-RI-TR-97-17, Robotics Institute, Carnegie Mellon University, 2000.
[11] Rojas J C, Crisman J D. Vehicle detection in color images. Proceedings of the IEEE Conference on Intelligent Transportation Systems, Boston, USA, 1997, 403–408.
[12] Grimson W E L. Computational experiments with a feature based stereo algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1985, 7, 17–34.
[13] Kalinke T, Tzomakas C, Seelen W V. A texture-based object detection and an adaptive model-based classification. Proceedings of the IEEE Intelligent Vehicles Symposium, Stuttgart, Germany, 1998, 1, 143–148.
[14] Bertozzi M, Broggi A, Fascioli A, Nichele S. Stereo vision-based vehicle detection. Proceedings of the IEEE Intelligent Vehicles Symposium, Dearborn, USA, 2000, 39–44.
