


Experiments in Robotics for Intelligent Road Vehicles

Massimo Bertozzi, Alberto Broggi, Alessandra Fascioli, Amos Tibaldi

Dipartimento di Ingegneria dell’Informazione, Università di Parma, Parco Area delle Scienze, 181A - I-43100 Parma, Italy

{bertozzi,broggi,fascal,tibaldi}@ce.unipr.it

Abstract— This work presents the experience of the ARGO Project. It started in 1996 at the University of Parma, building on the previous experience gained within the European PROMETHEUS Project. In 1997 the ARGO prototype vehicle was set up with sensors and actuators, and the first version of the GOLD software system –able to locate one lane marking and generic obstacles on the vehicle’s path– was installed. In June 1998 the vehicle underwent a major test (the MilleMiglia in Automatico, a 2000 km tour on Italian highways) in order to test the complete equipment. The analysis of this test allowed us to improve the system. The paper presents the current implementation of the GOLD system, featuring enhanced Lane Detection abilities and extended Obstacle Detection abilities, such as the detection of leading vehicles and pedestrians. Moreover, it describes how this technology was transferred to the automatic driving of snowcats in extreme environments.

I. INTRODUCTION: THE ARGO PROJECT

The main target of the ARGO Project is the development of an active safety system with the ability to act also as an automatic pilot for a standard road vehicle.

In order to achieve autonomous driving capabilities on the existing road network with no need for specific infrastructures, a robust perception of the environment is essential. Although very efficient in some fields of application, active sensors –besides polluting the environment– suffer from specific problems in automotive applications, due to inter-vehicle interference amongst sensors of the same type and to the wide variation in reflection ratios caused by many different factors, such as the obstacles’ shape or material. Moreover, the maximum signal level must comply with safety rules and must remain below a safety threshold. For this reason, only passive sensors, namely cameras, have been considered in the implementation of the ARGO vehicle.

A second design choice was to keep the system costs low. These include both production costs (which must be minimized to allow a widespread use of these devices) and operative costs, which must not exceed a certain threshold in order not to interfere with the vehicle performance. Therefore, low-cost devices have been preferred both for image acquisition and for processing: the prototype installed on ARGO is based on cheap cameras and a commercial PC.

The following sections present the main functionalities integrated on the ARGO vehicle:

This research has been partially supported by the Italian National Research Council (CNR) within the frame of Progetto Finalizzato Trasporti 2 and Progetto Madess 2, and by ENEA within the PRASSI and RAS Projects.

• Lane Detection and Tracking
• Obstacle Detection
• Vehicle Detection and Tracking
• Pedestrian Detection.

II. THE GOLD SYSTEM

GOLD is the acronym used to refer to the software that provides ARGO with autonomous capabilities. It stands for Generic Obstacles and Lane Detection, since these were the two functionalities originally developed. Currently it integrates two other functionalities, Vehicle Detection and Pedestrian Detection; further functionalities are under development.

A. The Inverse Perspective Mapping

The Lane Detection and Obstacle Detection functionalities share the same underlying approach: the removal of the perspective effect obtained through the Inverse Perspective Mapping (IPM) [1], [2].

The IPM is a well-established technique that allows the removal of the perspective effect when the acquisition parameters (camera position, orientation, optics, ...) are completely known and when some knowledge about the road is given, such as a flat road hypothesis. The procedure aimed at removing the perspective effect resamples the incoming image, remapping each pixel toward a different position and producing a new two-dimensional array of pixels. The so-obtained remapped image represents a top view of the road region in front of the vehicle, as if it were observed from a significant height. Figures 1.a and 1.b show an image acquired by ARGO’s vision system and the corresponding remapped image.
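As an illustration only, the remapping step can be sketched as a look-up of road-plane coordinates back into the source image; the calibration matrices, the grid extent, and the nearest-neighbour resampling below are hypothetical placeholders, not the actual ARGO calibration or interpolation scheme.

```python
import numpy as np

def inverse_perspective_mapping(image, K, R, t, x_range, y_range, cell=0.05):
    """Sketch of IPM under a flat-road assumption.

    image   : HxW grayscale array
    K       : 3x3 intrinsic matrix (hypothetical calibration)
    R, t    : camera rotation (3x3) and position (3,) w.r.t. the road frame
    x_range : (min, max) lateral extent of the bird's-eye grid, in metres
    y_range : (min, max) longitudinal extent, in metres
    cell    : size of one remapped pixel, in metres
    """
    xs = np.arange(*x_range, cell)
    ys = np.arange(*y_range, cell)
    remapped = np.zeros((len(ys), len(xs)), dtype=image.dtype)
    for i, y in enumerate(ys):          # distance ahead of the vehicle
        for j, x in enumerate(xs):      # lateral offset
            world = np.array([x, y, 0.0])            # flat road: Z = 0
            cam = R @ (world - t)                    # road frame -> camera frame
            if cam[2] <= 0:                          # point behind the camera
                continue
            u, v, w = K @ cam
            u, v = int(round(u / w)), int(round(v / w))
            if 0 <= v < image.shape[0] and 0 <= u < image.shape[1]:
                remapped[i, j] = image[v, u]         # nearest-neighbour resampling
    return remapped
```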

B. Lane Detection

Lane Detection functionality is divided in two parts: a lower-level part, which, starting from iconic representations of the incoming images, produces new transformed representations using the same data structure (array of pixels), and a higher-level one, which analyzes the outcome of the preceding step and produces a symbolic representation of the scene.

1) Low- and Medium-level Processing for Lane Detection: Lane Detection is performed assuming that a road marking in the remapped image is represented by a quasi-vertical bright line of constant width on a darker background (the road). Thus, pixels belonging to a road marking feature a higher brightness value than their left and right neighbors.

The first phase of road marking detection is therefore based on a filter able to detect dark-bright-dark transitions.
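A minimal sketch of such a dark-bright-dark filter on one remapped row; the half-width and contrast values are hypothetical parameters, not the ones used on ARGO.

```python
import numpy as np

def dark_bright_dark(row, m=3, min_contrast=10):
    """Mark pixels brighter than both their left and right neighbours at distance m.

    row          : 1-D array of brightness values from one remapped image row
    m            : half-width of the expected marking, in pixels (hypothetical value)
    min_contrast : minimum brightness excess over both neighbours (hypothetical value)
    """
    out = np.zeros_like(row, dtype=bool)
    for x in range(m, len(row) - m):
        left, right = row[x - m], row[x + m]
        if row[x] - left >= min_contrast and row[x] - right >= min_contrast:
            out[x] = True
    return out
```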

Fig. 1. The sequence of images produced by the low-level Lane Detection phase: (a) original; (b) remapped; (c) enhanced; (d) binarized; (e) polylines.

Due to different light conditions (e.g. in the presence of shadows), pixels representing road markings may have different brightness, yet they maintain their superiority relationship with their horizontal neighbors. Therefore, since a simple threshold seldom gives a satisfactory binarization, the image is enhanced by exploiting its vertical correlation. Finally, the binarization is performed by means of an adaptive threshold [2]; the result is presented in figure 1.c. The binary image is scanned row by row in order to build chains of 8-connected non-zero pixels (see figure 1.d).

Subsequently, each chain is approximated with a polyline composed of one or a few segments by means of an iterative process. Initially, a single segment that joins the two extrema of the chain is considered. The horizontal distance between the segment’s mid point and the chain is used to determine the quality of the approximation. In case it is larger than a threshold, two segments sharing an extremum are considered for the approximation of the chain. Their common extremum is the intersection between the chain and the horizontal line that passes through the segment’s mid point. The process is iterated until a satisfactory approximation has been reached (see figure 1.e).
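A sketch of this split-based approximation, assuming the chain is an ordered list of (x, y) pixel coordinates; the tolerance value is a hypothetical placeholder.

```python
def approximate_chain(chain, tol=2.0):
    """Approximate a pixel chain with a polyline by recursive splitting.

    chain : list of (x, y) points ordered along the chain (y increasing)
    tol   : maximum accepted horizontal distance, in pixels (hypothetical value)
    Returns the list of polyline vertices.
    """
    (x0, y0), (x1, y1) = chain[0], chain[-1]
    # index of the chain point closest to the segment's vertical midpoint
    mid_y = (y0 + y1) / 2.0
    k = min(range(len(chain)), key=lambda i: abs(chain[i][1] - mid_y))
    # horizontal position of the segment at that height
    seg_x = x0 + (x1 - x0) * (chain[k][1] - y0) / (y1 - y0) if y1 != y0 else x0
    # accept the single segment if the deviation is small, or if splitting
    # would degenerate (midpoint coincides with an extremum)
    if abs(chain[k][0] - seg_x) <= tol or k in (0, len(chain) - 1):
        return [(x0, y0), (x1, y1)]
    # otherwise split at the midpoint and approximate the two halves
    left = approximate_chain(chain[:k + 1], tol)
    right = approximate_chain(chain[k:], tol)
    return left[:-1] + right                 # merge, dropping the duplicated vertex
```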

2) High-level Processing for Lane Detection: In the high-level processing, the list of polylines is processed in order to semantically group homologous features and to produce a high-level description of the scene.

Each polyline is compared against the result of the previous frame, since continuity constraints provide a strong and robust selection procedure. The distance between the previous result and each extremum of the considered polyline is computed: if all the polyline extrema lie within a stripe centered onto the previous result, then the polyline is marked as useful for the following process. This process is repeated for both the left and right lane markings.

Fig. 2. Filtered polylines, joined polylines, and model fitting for the left (upper row) and right (bottom row) lane markings.

Once the polylines have been selected, all possible joinings are checked. In order to be joined, two polylines must have similar directions; must not be too distant; their projections on the vertical axis must not overlap; the higher polyline in the image must have its starting point within an elliptical portion of the image; and, in case the gap is large, the direction of the connecting segment is also checked for uniform behavior.

All the new polylines, formed by concatenations of the original ones, are then evaluated. In case a polyline does not cover the whole image, a penalty is given. Then, the polyline length is computed and a proportional penalty is given to short ones, as well as to polylines with extremely varying angular coefficients. Finally, the polyline with the highest score is selected as the best representative of the lane marking.
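A hedged sketch of this scoring step; the penalty weights and the candidate attributes (coverage, length, angular variation) are hypothetical placeholders standing in for the quantities described above.

```python
def score_polyline(coverage, length, image_height, angle_variation,
                   w_cov=1.0, w_len=0.5, w_ang=0.5):
    """Score a candidate polyline; higher is better.

    coverage        : fraction of the image height spanned by the polyline (0..1)
    length          : total polyline length, in pixels
    image_height    : height of the remapped image, in pixels
    angle_variation : spread of the segments' angular coefficients
    w_*             : penalty weights (hypothetical values)
    """
    score = 0.0
    score -= w_cov * (1.0 - coverage)                       # penalty for partial coverage
    score -= w_len * max(0.0, 1.0 - length / image_height)  # penalty for short polylines
    score -= w_ang * angle_variation                        # penalty for erratic direction
    return score

# the candidate with the highest score represents the lane marking:
# best = max(candidates, key=lambda c: score_polyline(**c))
```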

The polyline selected at the previous step may not be long enough to cover the whole image; therefore a further step is necessary to extend it. In order to take into account road curves, a parabolic model has been selected for the prolongation of the polyline in the area far from the vehicle; in the nearby area, a linear approximation suffices.
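For illustration, the far-range prolongation can be sketched as a least-squares parabola fitted to the polyline vertices and evaluated beyond them; numpy's polyfit is used here only as a stand-in for whatever fitting procedure is actually employed.

```python
import numpy as np

def extend_polyline(vertices, far_rows):
    """Prolong a polyline into the far field with a parabolic model x = a*y^2 + b*y + c.

    vertices : array of (x, y) polyline vertices in the remapped image (>= 3 points)
    far_rows : iterable of row coordinates (y) beyond the last vertex to fill in
    Returns the extrapolated (x, y) points.
    """
    v = np.asarray(vertices, dtype=float)
    a, b, c = np.polyfit(v[:, 1], v[:, 0], deg=2)     # fit x as a parabola in y
    ys = np.asarray(list(far_rows), dtype=float)
    xs = a * ys**2 + b * ys + c
    return np.stack([xs, ys], axis=1)
```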

The two reconstructed polylines (one representing the left and one the right lane marking) are now matched against a model that encodes additional knowledge about the absolute and relative positions of both lane markings on a standard road.

The model is kept for reference: the two resulting polylines are fitted to this model and the final result is obtained as follows. First the two polylines are checked for non-parallel behavior; a small deviation is allowed, since it may derive from vehicle movements or deviations from the flat road assumption that cause the calibration to be temporarily incorrect (diverging or converging lane markings). Then the quality of the two polylines, as computed in the previous steps, is compared: the final result will be attracted toward the polyline with the highest quality with a higher strength. In this way, polylines with equal or similar quality will equally contribute to the final result; on the other hand, in case one polyline has been heavily reconstructed, or is far from the original model, or is even missing, the other polyline will be used to generate the final result.

Fig. 3. Some results of Lane Detection in different conditions.

Finally, figure 2 presents the resulting images referring to the example presented in figure 1. It shows the results of the selection, joining, and matching phases for the left (upper row) and for the right (bottom row) lane markings.

3) Results of Lane Detection: This subsection presents a few results of lane detection in different conditions (see figure 3), ranging from ideal situations to road works, patches of non-painted roads, and the entry and exit from a tunnel. Both highway and extra-urban scenes are provided for comparison; the system proves to be robust with respect to different illumination situations, missing road signs, and overtaking vehicles which occlude the visibility of the left lane marking. In case two lines are present –a dashed and a continuous one– the system selects the continuous one.

C. Obstacle Detection

The Obstacle Detection functionality is aimed at the localization of generic objects that can obstruct the vehicle’s path, without their complete identification or recognition. For this purpose a complete 3D reconstruction is not required and a matching with a given model is sufficient: the model represents the environment without obstacles, and any deviation from the model reveals a potential obstacle. In this case the application of IPM to stereo images [3], in conjunction with a-priori knowledge of the road shape, plays a strategic role.

1) Low-level Processing for Obstacle Detection: Assuming a flat road hypothesis, IPM is performed on both stereo images. The flat road model is checked by computing a pixel-wise difference between the two remapped images. In correspondence to anything rising up from the road surface, the result features sufficiently large clusters of non-zero pixels. Due to the stereo cameras’ different angles of view, an ideal homogeneous square obstacle produces two clusters of pixels with a triangular shape in the difference image, in correspondence to its vertical edges [1].

Due to the texture, irregular shape, and non-homogeneous brightness of real obstacles, the detection of the triangles becomes difficult. Nevertheless, in the difference image some clusters of pixels with a quasi-triangular shape are still recognizable, even if they are not clearly disjointed. The low-level portion of the process, detailed in figure 4, is consequently reduced to the computation of the difference between the two remapped images, a threshold, and a morphological opening aimed at removing small-sized details in the thresholded image.
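A compact sketch of this low-level chain, assuming the two remapped views are already available; scipy's binary morphology is used as a stand-in, and the threshold and structuring-element size are hypothetical values.

```python
import numpy as np
from scipy import ndimage

def obstacle_difference_image(left_remapped, right_remapped, thresh=20, open_size=3):
    """Low-level obstacle detection sketch on two remapped (bird's-eye) images.

    thresh    : brightness difference threshold (hypothetical value)
    open_size : side of the square structuring element for the opening (hypothetical)
    """
    diff = np.abs(left_remapped.astype(np.int16) - right_remapped.astype(np.int16))
    binary = diff > thresh                                     # pixel-wise thresholding
    structure = np.ones((open_size, open_size), dtype=bool)
    cleaned = ndimage.binary_opening(binary, structure=structure)  # drop small details
    return cleaned
```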

2) Medium- and High-level Processing for Obstacle Detection: The following process is based on the localization of pairs of triangles in the difference image by means of a quantitative measurement of their shape and position [4].

A polar histogram is used for the detection of triangles: it is computed scanning the difference image with respect to a point called focus and counting the number of overthreshold pixels for every straight line originating from the focus. A low-pass filter is applied in order to decrease the influence of noise (see figures 4.f and 4.g). When the focus is placed in the middle point between the projections of the two cameras onto the road plane, the polar histogram presents an appreciable peak corresponding to each triangle [1]. Since the presence of an obstacle produces two disjointed triangles (corresponding to its edges) in the difference image, Obstacle Detection is limited to the search for pairs of adjacent peaks. The position of a peak in fact determines the angle of view under which the obstacle edge is seen (figure 5).

Fig. 4. Obstacle Detection: (a) left and (b) right stereo images, (c) and (d) the remapped images, (e) the difference image, (f) the angles of view overlapped with the difference image, (g) the polar histogram, and (h) the result of Obstacle Detection shown as a black marking superimposed on the acquired left image; the thin black line highlights the road region visible from both cameras.

Fig. 5. Correspondence between triangles and the directions pointed out by peaks detected in the polar histogram.
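A sketch of the polar histogram, assuming a binary difference image and a focus point expressed in remapped-image coordinates; the angular resolution and smoothing kernel are hypothetical choices.

```python
import numpy as np

def polar_histogram(diff_binary, focus, n_bins=180):
    """Count overthreshold pixels along each direction originating from the focus.

    diff_binary : 2-D boolean difference image
    focus       : (col, row) of the focus point, e.g. the midpoint between the
                  projections of the two cameras onto the road plane
    n_bins      : number of angular bins over [0, pi) (hypothetical resolution)
    """
    rows, cols = np.nonzero(diff_binary)
    fx, fy = focus
    angles = np.arctan2(fy - rows, cols - fx)     # direction of each pixel seen from the focus
    angles = np.mod(angles, np.pi)                # fold into [0, pi)
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, np.pi))
    # simple low-pass filtering to reduce the influence of noise
    kernel = np.ones(5) / 5.0
    return np.convolve(hist, kernel, mode="same")
```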

Peaks may have different characteristics, such as amplitude, sharpness, or width. These depend on the obstacle distance, the angle of view, and the difference in brightness and texture between the background and the obstacle itself. Two or more peaks can be joined according to different criteria, such as similar amplitude, closeness, or sharpness. The analysis of a large number of different situations made it possible to determine a weight function embedding all of the above quantities.

The difference image is also used to estimate the obstacle distance. For each peak of the polar histogram a radial histogram is computed by scanning a specific sector of the difference image. The radial histogram is analyzed to detect the corners of the triangles, which represent the contact points between obstacles and the road plane, therefore allowing the determination of the obstacle distance through a simple threshold.
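As an illustrative counterpart to the sketch above, the radial histogram over the sector associated with one peak might be computed as follows; the sector half-width and the radial bin size are hypothetical parameters, and the corner analysis itself is not shown.

```python
import numpy as np

def radial_histogram(diff_binary, focus, peak_angle, half_width=np.radians(5),
                     bin_size=2.0):
    """Count overthreshold pixels as a function of distance from the focus,
    restricted to the angular sector around one polar-histogram peak.

    half_width : angular half-width of the sector, in radians (hypothetical)
    bin_size   : radial bin size, in remapped pixels (hypothetical)
    """
    rows, cols = np.nonzero(diff_binary)
    fx, fy = focus
    angles = np.mod(np.arctan2(fy - rows, cols - fx), np.pi)
    radii = np.hypot(cols - fx, fy - rows)
    r = radii[np.abs(angles - peak_angle) <= half_width]
    n_bins = max(1, int(np.ceil(r.max() / bin_size))) if r.size else 1
    hist, edges = np.histogram(r, bins=n_bins, range=(0.0, n_bins * bin_size))
    return hist, edges
```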

3) Results of Obstacle Detection: Figure 6 shows the results obtained in a number of different situations. The result is displayed with black markings superimposed on a brighter version of the left image; they encode both the obstacles’ distance and width.

Fig. 6. Obstacle Detection: the result is shown with a black marking superimposed onto a brighter version of the image captured by the left camera; a thin black line limits the portion of the road seen by both cameras.

D. Vehicle Detection

The Platooning task is based on the detection of the distance, speed, and heading of the preceding vehicle. Since Obstacle Detection does not generate sufficiently reliable results –in particular regarding obstacle distance– a new functionality, Vehicle Detection, has been considered; the vehicle is localized and tracked using a single monocular image sequence.

The Vehicle Detection algorithm is based on the following considerations: a vehicle is generally symmetric, characterized by a rectangular bounding box which satisfies specific aspect ratio constraints, and placed in a specific region of the image. These features are used to identify vehicles in the image in the following way: first, an area of interest is identified on the basis of road position and perspective constraints. This area is searched for possible vertical symmetries; not only gray-level symmetries are considered, but vertical and horizontal edge symmetries as well, in order to increase the detection robustness. Once the symmetry position and width have been detected, a new search begins, aimed at the detection of the two bottom corners of a rectangular bounding box. Finally, the top horizontal limit of the vehicle is searched for, and the preceding vehicle is localized.

The tracking phase is performed through the maximization of the correlation between the portion of the image contained in the bounding box of the previous frame (partially stretched and reduced to take into account small size variations due to the increment and reduction of the relative distance) and the new frame.

1) Symmetry detection: In order to search for symmetrical features, the analysis of gray-level images is not sufficient. Strong reflections cause irregularities in vehicle symmetry, while uniform areas and background patterns present highly correlated symmetries. In order to get rid of these problems, symmetries in other domains are computed as well. In fact, to cope with reflections and uniform areas, edges are extracted and thresholded, and symmetries are computed in this domain as well. Similarly, the analysis of symmetries of horizontal and vertical edges produces other symmetry maps, which –with specific coefficients determined experimentally– can be combined with the previous ones to form a single symmetry map. Figure 7 shows all the symmetry maps and the final one, which allows the vehicle to be detected. For each image, the search area is shown in dark gray and the resulting vertical axis is superimposed. For each image its symmetry map is also depicted. Bright points in the map encode the presence of high symmetries. The 2D symmetry maps are computed by varying the axis’ horizontal position within the grey area (shown in the original image) and the symmetry horizontal size. The lower triangular shape is due to the limitation in scanning large horizontal windows for peripheral vertical axes.
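A minimal sketch of one of these maps, the gray-level symmetry evaluated for a candidate vertical axis and window width; the normalization and the scan over axis positions and widths are hypothetical choices.

```python
import numpy as np

def gray_level_symmetry(patch, axis_col, half_width):
    """Measure vertical-axis symmetry of a gray-level patch (assumed 8-bit).

    patch      : 2-D array restricted to the search area
    axis_col   : column of the candidate symmetry axis inside the patch
    half_width : number of columns compared on each side of the axis
    Returns a score in [0, 1], 1 meaning perfectly mirror-symmetric.
    """
    w = min(half_width, axis_col, patch.shape[1] - axis_col - 1)
    if w <= 0:
        return 0.0
    left = patch[:, axis_col - w:axis_col].astype(float)
    right = patch[:, axis_col + 1:axis_col + 1 + w].astype(float)[:, ::-1]  # mirrored
    return 1.0 - np.abs(left - right).mean() / 255.0   # small mirrored difference -> high score

# a 2D symmetry map is obtained by evaluating this score for every
# (axis_col, half_width) pair allowed inside the search area
```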

2) Bounding box detection: After the localization of the symmetry, the width of the symmetrical region is checked for the presence of two corners representing the bottom of the bounding box around the vehicle. Perspective constraints as well as size constraints are used to reduce the search. Figure 8 presents the results of the lower corners detection. This process is followed by the detection of the top part of the bounding box, which is looked for in a specific region whose location is again determined by perspective and size constraints.

Fig. 7. Computing the resulting symmetry: (a) grey-level symmetry; (b) edge symmetry; (c) horizontal edges symmetry; (d) vertical edges symmetry; (e) total symmetry. For each row the resulting symmetry axis is superimposed onto the leftmost original image.

Fig. 8. Detection of the lower part of the bounding box: (a) original image with superimposed results; (b) edges; (c) localization of the two lower corners.

3) Backtracking: Sometimes it may happen that, in correspondence to the symmetry maximum, no correct bounding box exists. Therefore, a backtracking approach is used: the symmetry map is again scanned for the next local maximum and a new search for a bounding box is performed.

4) Results of Vehicle Detection: Figure 9 shows some results of vehicle detection in different situations.

Fig. 9. Results of Vehicle Detection in different road scenes.

E. Pedestrian Detection

The latest functionality integrated in the ARGO prototype vehicle is aimed at detecting pedestrians in road environments. The system is able to localize pedestrians in various poses, positions and clothing, and is not limited to moving people.

The processing is divided into two different stages. Initially, attentive vision techniques relying on the search for specific characteristics of pedestrians, such as vertical symmetry and a strong presence of edges, allow the selection of interesting regions likely to contain pedestrians. Then, such candidate areas are validated by verifying the actual presence of pedestrians by means of a shape detection technique based on the application of autonomous agents.

1) Attentive vision: The areas considered as candidates in the first step are rectangular bounding boxes which:

• have a size in pixels deriving from the knowledge of the intrinsic parameters of the vision system;

• enclose a portion of the image which exhibits a strong vertical symmetry and a high density of vertical edges.

The search for candidates would require an exhaustive search in the whole image. However, the knowledge of the system’s extrinsic parameters, together with a flat scene assumption, is exploited to limit the analysis to a stripe of the image. The displacement of this stripe depends on the pedestrian’s distance, while its height is related to the pedestrian’s height. Indeed, the analysis cannot be limited to a fixed size and distance of the target, and a given range for each parameter is in fact explored.

A pre-attentive filter is applied, aimed at the selection of the areas with a high density of edges. Then, for each vertical symmetry axis lying in these areas, the best candidate area is selected among the bounding boxes which share that symmetry axis while having different positions (base) and sizes (height and width). Vertical symmetry has been chosen as the main distinctive feature for pedestrians. In addition, two different symmetry measures are performed: one on the gray-level values and one on the gradient values, considering only edges with a vertical direction. The selection of the best bounding box is based on maximizing a linear combination of the two symmetry measures, masked by the density of edges in the box. Figure 10 shows the original input image, the result of a clustering operation used to improve the detection of edges, a binary image containing the vertical edges, and a number of histograms representing the maximum (i) symmetry of gray-levels, (ii) symmetry of vertical edges, and (iii) density of vertical edges among the bounding boxes examined for each axis. The histogram in figure 10.g represents the linear combination of all the above. It is evident that, using the density of vertical edges as a mask, interesting areas present high values for both the symmetry of gray-levels and the symmetry of vertical edges. The resulting histogram is therefore thresholded and its overthreshold peaks are selected as candidate bounding boxes.
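A hedged sketch of how such a combined per-axis score might be formed, assuming the three per-axis quantities have already been computed; the weights and the threshold are hypothetical values.

```python
import numpy as np

def candidate_axes(gray_symmetry, edge_symmetry, edge_density,
                   w_gray=0.5, w_edge=0.5, threshold=0.4):
    """Combine per-axis measures into a single score and pick candidate axes.

    gray_symmetry : 1-D array, best gray-level symmetry for each vertical axis
    edge_symmetry : 1-D array, best vertical-edge symmetry for each axis
    edge_density  : 1-D array, density of vertical edges for each axis (0..1)
    w_gray, w_edge, threshold : hypothetical weights and cut-off
    Returns the indices of axes whose masked score exceeds the threshold.
    """
    combined = (w_gray * gray_symmetry + w_edge * edge_symmetry) * edge_density
    return np.flatnonzero(combined > threshold)
```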

Fig. 10. Intermediate results leading to the localization of bounding boxes: (a) original image; (b) clusterized image; (c) vertical edges; (d) histogram representing grey-level symmetries; (e) histogram representing vertical-edge symmetries; (f) histogram representing vertical-edge density; (g) histogram representing the overall symmetry for the best bounding box for each column; (h) the resulting bounding box.

2) Shape detection using autonomous agents: The outcome of the low-level processing is a list of candidate bounding boxes which is fed to the following stage, whose task is their validation as pedestrians, based on higher-level characteristics. Different edges are selected and connected, where possible, in order to form a contour. Essentially, the process consists in adapting a deformable coarse model to the bounding box. Thanks to its roughness the model is sufficiently general and can be adapted to a variety of postures; it is, however, limited to standing pedestrians.

The model adjustment is done through an evolutionary approach with a number of independent agents acting as edge trackers. The agents explore a feature map displaying the edges contained in a given bounding box and stochastically build hypotheses of a feasible contour of a human. The idea is taken from the Ant Colony Optimization (ACO) metaheuristic, originally inspired by the communication behavior of real ants [5].

This model can be applied to the analysis of an image by creating a colony of artificial ants that looks for the combination of edge pixels that maximizes the coherency of their positions according to a given model (see figure 11). Each ant in turn traces a solution in a solution space made up of all the possible paths connecting two pixels in a matrix. The decision at each step of an ant is driven by two factors: one is a local heuristic that quantifies the attractiveness of a pixel for its intrinsic characteristics; the second is the information on that pixel made available by previous attempts of other ants, in the form of a quantity of pheromone. The world is visited by a number of ants in parallel, and the process is repeated for several cycles. At the end of each cycle, new pheromone is deposited on the trails pursued by the ants, and some of that accumulated evaporates. In this way, solutions built several cycles before progressively lose their importance; on the other hand, the pheromone on pixels that compose the paths of frequently selected solutions grows, and eventually this information surpasses that given by the heuristic. Finally, the output is the path of the highest-ranked ant in the last cycle.

Fig. 11. Artificial ants move through the world-matrix starting from the left half of the lower border, and moving through regions 1, 2, 3 and 4 until they reach the arrival line.
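A very condensed sketch of the ant step rule and the end-of-cycle pheromone update under these assumptions; the pheromone and heuristic exponents, the evaporation rate, and the 8-neighbour move set are hypothetical choices, not the authors' actual parameters.

```python
import numpy as np

def ant_step(pos, edge_map, pheromone, alpha=1.0, beta=2.0, rng=np.random):
    """Choose the next pixel for one ant, mixing pheromone and a local edge heuristic.

    pos        : current (row, col) of the ant
    edge_map   : 2-D array, non-negative attractiveness of each pixel (e.g. edge strength)
    pheromone  : 2-D array of accumulated pheromone
    alpha, beta: relative weights of pheromone and heuristic (hypothetical values)
    """
    r, c = pos
    moves = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)
             and 0 <= r + dr < edge_map.shape[0] and 0 <= c + dc < edge_map.shape[1]]
    weights = np.array([(pheromone[m] ** alpha) * (edge_map[m] ** beta) + 1e-9
                        for m in moves])
    probs = weights / weights.sum()
    return moves[rng.choice(len(moves), p=probs)]

def evaporate_and_deposit(pheromone, best_path, rho=0.1, deposit=1.0):
    """End-of-cycle update: evaporate everywhere, then reinforce the best path."""
    pheromone *= (1.0 - rho)
    for r, c in best_path:
        pheromone[r, c] += deposit
    return pheromone
```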

3) Results of Pedestrian Detection: This algorithm suits a medium-distance search area. In fact, large bounding boxes may contain a too detailed shape, showing many disturbing small details that would make the detection extremely difficult. On the other hand, very small bounding boxes enclosing far-away pedestrians feature a very low information content. In these situations it is easy to obtain false positives, since many road participants (other than pedestrians), other objects, and even road infrastructures may present morphological characteristics similar to a human shape. With the current setup the search area ranges from 10 to 30 m.

The candidate selection procedure based on vertical symmetry and edge density proved to be a robust technique for focusing the attention on interesting regions. As an example, figure 12 shows the result of the selection of candidate bounding boxes in three different situations. Some general considerations can be drawn. In situations in which pedestrians are sufficiently contrasted with respect to the background and completely visible, the localization of candidates proves to be robust. Thanks to the use of vertical edges, the width of the bounding boxes enclosing pedestrians is generally determined with good precision. On the other hand, a lower accuracy is obtained for the localization of the top and bottom of the bounding box; a refinement of the bounding box height is under development. Symmetrical objects other than pedestrians may happen to be detected as well. In order to get rid of such false positives, a number of filters have been devised which rely on the analysis of the distribution of edges within the bounding box. These filters, which are still under evaluation, show promising results regarding the elimination of both artifacts (such as poles, road signs, buildings, and other road infrastructures) and symmetrical areas given by a uniform portion of the background between two foreground objects with similar lateral borders (see figure 12.c).

Fig. 12. Result of low-level processing in different situations: (a) a correct detection of two pedestrians; (b) a complex scenario in which only the central pedestrian is detected; the left one is confused with the background, the right one is only partially visible, while the high symmetry of a tree has been detected as well; (c) two crossing pedestrians have been localized, but other symmetrical areas are highlighted as well.

From the first preliminary results, the ant-based processing appears to be a promising method for detecting the contour of a human shape. To extend the detection to a larger set of pedestrian postures, other models are currently under development.

III. THE ARGO PROTOTYPE VEHICLE

ARGO, shown in figure 13, is an experimental autonomous vehicle equipped with vision systems and an automatic steering capability.

Fig. 13. The ARGO prototype vehicle.

It is able to determine its position with respect to the lane, to compute the road geometry, to detect generic obstacles on the path, and to localize a leading vehicle and pedestrians. The images acquired by a stereo rig placed inside the cabin are analyzed in real-time by a computing system located in the boot. The results of the processing are used to drive an actuator mounted onto the steering wheel and other assistance devices.

The system was initially conceived as a safety enhancement unit: in particular, it is able to supervise the driver’s behavior and issue both optical and acoustic warnings, or even take control of the vehicle when dangerous situations are detected. Further developments have extended the system’s functionalities to fully automatic driving capabilities.

Thanks to a control panel, the driver can select the level of system intervention. The following three driving modes are integrated.

• Manual Driving: the system simply monitors and logs the driver’s activity.

• Supervised Driving: in case of danger, the system warns the driver with acoustic and optical signals.

• Automatic Driving: the system maintains full control of the vehicle’s trajectory, and the two following functionalities can be selected: Road Following, consisting of the automatic movement of the vehicle inside the lane; or Platooning, namely the automatic following of the preceding vehicle.

IV. THE MilleMiglia in Automatico TEST

In order to extensively test the vehicle under different traffic situations, road environments, and weather conditions, a 2000 km journey was carried out in June 1998. Other prototypes had been tested on public roads with long journeys (CMU’s Navlab No Hands Across America, and a tour from Munich to Odense organized by the Universität der Bundeswehr, Germany); the main differences were that the former also relied on non-visual information (therefore handling occlusions in a different way) and that the latter was equipped with complex computing engines.

The MilleMiglia in Automatico test was carried out about two years ago, and the system was much more primitive than it is currently. Only Lane Detection and Obstacle Detection were tested: Lane Detection was based on the localization of a single line, while the detection of the preceding vehicle was performed by the Obstacle Detection module; no tracking was done and only the Road Following functionality was available.

V. CONCLUSION AND TECHNOLOGY TRANSFER

The functionalities, the algorithms, and –more generally– the experience developed within the ARGO project were transferred to different domains. One of them is the automatic driving of a snowcat in extreme environments. In this project, funded by ENEA, visual information acquired from the driving cabin of a snowcat is used to localize the tracks of preceding vehicles, with the aim of following them as precisely as possible. The reason is that cracks in the ice can put both the driver and the snowcat itself in serious danger. Therefore it is imperative that the vehicle follows the same precise path defined by preceding vehicles.

Fig. 14. The prototype vehicle during a test in the Italian test site.

Fig. 15. Results of snowcat track detection in different conditions.

Due to the extreme conditions of the working environment –where temperatures can reach even -80 degrees Celsius, the terrain is completely covered by snow or ice, strong sun lighting and reflections may be present, and no specific ground references are available nor assumptions can be made on the terrain slope– this application is extremely challenging and presents many additional problems with respect to the driving of unmanned vehicles on traditional (un)structured roads.

Figure 15 shows some results of snowcat track detection in different conditions. The algorithm [6], not discussed in this paper, is able to successfully detect the tracks even in noisy or critical conditions such as shadows, sun reflections, unknown terrain slope, and when dark objects are present as well.

REFERENCES

[1] A. Broggi, M. Bertozzi, A. Fascioli, and G. Conte, Automatic Vehicle Guidance: the Experience of the ARGO Vehicle. Singapore: World Scientific, Apr. 1999. ISBN 9810237200.

[2] M. Bertozzi and A. Broggi, “GOLD: a Parallel Real-Time Stereo Vision System for Generic Obstacle and Lane Detection,” IEEE Trans. on Image Processing, vol. 7, pp. 62–81, Jan. 1998.

[3] M. Bertozzi, A. Broggi, and A. Fascioli, “Stereo Inverse Perspective Mapping: Theory and Applications,” Image and Vision Computing Journal, vol. 8, no. 16, pp. 585–590, 1998.

[4] A. Fascioli, Vision-based Automatic Vehicle Guidance: Development and Test of a Prototype. PhD thesis, Dipartimento di Ingegneria dell’Informazione, Università di Parma, Italy, Jan. 2000.

[5] M. Dorigo and G. Di Caro, “The ant colony optimization meta-heuristic,” in New Ideas in Optimization (D. Corne, M. Dorigo, and F. Glover, eds.), pp. 11–32, London, UK: McGraw-Hill, 1999.

[6] A. Broggi and A. Fascioli, “Artificial Vision in Extreme Environments for Snowcat Tracks Detection,” IEEE Trans. on Intelligent Transportation Systems, vol. 3, pp. 162–172, Sept. 2002.