


ISPRS Journal of Photogrammetry & Remote Sensing 61 (2006) 159–169
www.elsevier.com/locate/isprsjprs

Optical flow based vehicle tracking strengthened by statistical decisions

Fatemeh Karimi Nejadasl a,⁎, Ben G.H. Gorte a, Serge P. Hoogendoorn b

a Delft University of Technology, Delft Institute of Earth Observation and Space Systems, Kluyverweg 1, 2629 HS, The Netherlands
b Delft University of Technology, Transport and Planning Department, Stevinweg 1, 2628 CN, Delft, The Netherlands

Received 13 March 2006; received in revised form 18 September 2006; accepted 20 September 2006
Available online 13 November 2006

Abstract

Reliable tracking of cars from aerial video imagery is one of the main ingredients of microscopic traffic monitoring. Current tracking methods however are not yet able to track all the vehicles in all frames of video imagery taken by e.g. a helicopter. Several problem scenarios can be distinguished, like situations with many similar cars in congested traffic areas, cars that appear in low contrast compared to the background, and cars that are occluded in some frames by other cars or by traffic signs. In this paper an improved method is described that continuously tracks all vehicles from their appearance in the viewing area until their exit. Our algorithm starts by separately tracking individual car pixels and the complete car region as a whole using the gradient-based optical flow method. A scale space approach is used to initiate the optical flow method. The best result as obtained from the intermediate results is used in the following statistical decision making step. Finally, either the best results are accepted and, by applying a rigid body assumption, one displacement result is adopted for the car as a whole, or the best results are rejected, because even the best results fail a quality criterion. Continuation of these steps for all frames constitutes the final tracking result. This method solves most of the sketched problem scenarios, as is illustrated by applying it on suited helicopter video imagery.
© 2006 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Keywords: Traffic monitoring; Tracking; Optical flow; Gradient based; Scale space; Statistic

1. Introduction

Traffic congestion is an important problem in modern society. A lot of money and time is wasted in traffic jams. Car crashes and accidents are more frequent during busy traffic conditions. Several efforts are made

⁎ Corresponding author.
E-mail addresses: [email protected] (F. Karimi Nejadasl), [email protected] (B.G.H. Gorte), [email protected] (S.P. Hoogendoorn).

0924-2716/$ - see front matter © 2006 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.isprsjprs.2006.09.007

to tackle this problem: better facilities and regulations should improve the situation on existing roads, while the road network is extended as well.

Traffic congestion is highly dependent on the behavior of individual drivers. For example, reaction times and lane-changing techniques vary from driver to driver. Therefore it is useful to model the behavior of individual drivers, as well as the interaction between drivers, before new decisions and regulations for traffic congestion control are initiated. Current traffic theories are not yet able to correctly model the behavior of



drivers during congested or nearly congested traffic flow, taking individual driver's behavior into account. For this, so-called microscopic traffic models are needed. Large amounts of data are required to set up those models and determine their parameters.

Traffic parameter extraction from airborne image sequences is very useful for setting up and calibrating traffic flow models (Ossen and Hoogendoorn, 2005). To this end, information about each vehicle is required during the period of time when the vehicle is present in the scene. A possible method is to detect a vehicle in a video frame when it enters the scene and then track it in successive frames until it exits the captured scene.

Our dataset is recorded by a camera mounted on a helicopter. Since we want to model the behavior of as many vehicles (drivers) as possible, we attempt to cover a large highway section, leading to the lowest spatial resolution that accuracy requirements allow. Typically we use a spatial resolution (ground pixel size) between 25 and 50 cm.

Helicopter movement invokes camera motion in addition to object (i.e. vehicle) motion. We have removed camera motion with the approach of Hoogendoorn et al. (2003), while unwanted areas outside the road boundary are eliminated by the method of Gorte et al. (2005). Vehicles were automatically detected using the scheme of Hoogendoorn et al. (2003), yet some low contrast cars were detected manually.

Here we focus on difficult situations where the tracking of our earlier method failed (Hoogendoorn et al., 2003). The main problems occur in situations with many similar cars, especially in congested traffic areas when using a large viewing area. Also cars that appear in low contrast compared to the background, and occlusions of part of the car by e.g. traffic signs, cause problems for the existing tracking methods.

To improve the tracking performance, we investigate the use of the optical flow method in this paper. The optical flow method is sensitive to small movements even in the case of low contrast because it considers spatial and temporal change simultaneously. This will be elaborated in Section 3.1.

A further improvement of the tracking results is expected by adding an automatic decision step, i.e. by analyzing the intermediate results of each car. The final decision shows whether to select the best result from a number of alternatives, or to quit the tracking because of the poorness of the data in the specific frame. In the latter case the result for that car in that frame is discarded and a new detection and tracking sequence is started in the next frame. This approach finally results in a complete

list of positions in each frame for each car from its appearance to its exit.

The paper is organized as follows. In Section 2 we present related work. Section 3 discusses the pixel and region based tracking methods using gradient based optical flow. A scale space approach initiates both methods. At the end of this section, potential error sources are mentioned. The pixel and region based tracking methods make up the intermediate results for each car. We make a decision based on the most likely scenarios found among the improved intermediate results. This process is outlined in Section 4 as a robust tracking method. We give results in Section 5 and conclusions in Section 6.

2. Related work

Automatic object tracking receives attention in computer vision for a very diverse range of applications. Different methods are used to track objects.

Cootes et al. (2002) and Cootes et al. (2004) present an active shape model and an active appearance model for the case of large variations in shape and brightness between different frames. An active contour model is used by Berger et al. (1999) and Remagnino et al. (1997). This method tracks an amorphous object starting from one contour and allows the contour to grow until the object is completely captured. Pennec et al. (2003) and Montagnat et al. (2003) also use such deformable models to track objects with large shape and brightness variations.

The above-mentioned methods are mostly used for non-rigid object tracking. A radial map representation, an idea similar to the active contour model, is used by Smith and Brady (1995) with a video camera mounted on a car for tracking the cars in front. Model based tracking has been used for car tracking as well, by Haag and Nagel (2000) and Remagnino et al. (1997), with a video camera installed at a fixed place at a very low height. But these methods are used in cases where cars appear with a lot of detail and with clear boundaries. They are not applicable to our dataset, where cars are imaged at much lower resolution and have possibly low contrast relative to the background.

Hoogendoorn et al. (2003) detect vehicles in airborne image sequences by a difference method. Vehicle detection is done for all frames. Afterwards a cross correlation method is used to link the same car in all frames. This works well for cars with a high contrast. However, this method is not able to track cars with a low contrast (dark cars on a dark road surface). It also fails in


very congested traffic areas with a lot of similar cars. In the work of Kirchhof et al. (2005), Reinartz et al. (2005) and Ernst et al. (2005), the tracking is done in a similar way, although they use different methods for calculation of camera motion and vehicle detection. These methods face the same problems with low contrast cars and with areas with many similar cars.

There are two different methods for tracking: pixel based and region based. The pixel based method has some advantages in case of occlusions and changing intensities. Even if a part of an object is occluded by another object, remaining visible pixels may still be detected. However, which pixel of a car should be selected in which frame cannot be determined. Smith and Brady (1995), Smith (1998) and Coifman et al. (1998) used pixel based methods in vehicle tracking. Region based methods preserve the shape of objects, so the chance of wrong tracking decreases. However, there is a strong possibility of losing the track in occluded areas or when similar objects are present. Therefore both methods, when applied independently from each other, are not able to handle the long-term tracking in our dataset.

The approach presented in this paper improves highway vehicle tracking by combining the pixel and region based optical flow methods. A scale space approach initiates the optical flow method. The final result is obtained based on a statistical decision step that incorporates the rigid body characteristic of a car.

3. Tracking

The objective of tracking is to obtain the positions of a specific object in successive frames. The tracking boils down to the calculation of the object displacement between two consecutive frames. Continuing to find correspondence for the next two consecutive frames, while using the previous result as the new object, makes up the tracking process.

In this paper, the objects are cars. A moving car is represented by a group of adjacent pixels, which we want to track through the image sequence. This paper addresses two possibilities: first group pixels into regions followed by tracking those regions, or, on the other hand, first track individual pixels.
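The frame-to-frame chaining described above can be sketched as follows; `estimate_displacement` is a hypothetical placeholder for the optical flow step developed in the remainder of this section, and its dummy return value is for illustration only:

```python
# Sketch of the tracking loop: the displacement found between two
# consecutive frames moves the object position, and the moved position
# becomes the "object" for the next frame pair.

def estimate_displacement(frame_a, frame_b, position):
    """Placeholder for the optical flow step of Sections 3.2/3.3.
    Returns the (dx, dy) of the object at `position` (dummy value here)."""
    return (1.0, 0.0)

def track(frames, start_position):
    """Chain per-frame displacements into a trajectory."""
    positions = [start_position]
    for frame_a, frame_b in zip(frames[:-1], frames[1:]):
        dx, dy = estimate_displacement(frame_a, frame_b, positions[-1])
        x, y = positions[-1]
        positions.append((x + dx, y + dy))  # new position seeds the next pair
    return positions
```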

3.1. Gradient based optical flow

A method proposed by Lucas and Kanade (1981) is used to calculate the displacement of each pixel in two consecutive frames. It is assumed that the brightness of a pixel belonging to a moving object remains fixed in the consecutive frames. This assumption is mathematically translated into the form:

I(x_1, y_1, t_1) = I(x_2, y_2, t_2)    (1)

In the above equation, I, x_i, y_i, and t_i for i = 1, 2 denote brightness, spatial coordinates and time in the first and second image frame respectively. The relation of x_2, y_2 and t_2 to the corresponding parameters in the previous frame is defined as follows:

x_2 = x_1 + \delta x, \quad y_2 = y_1 + \delta y, \quad t_2 = t_1 + \delta t    (2)

where δt is the time difference between two consecutive frames. If we assume that the value of δt is one, then δx and δy are the displacements of x_1 and y_1 in the x and y direction respectively. I(x_2, y_2, t_2) is expanded in a Taylor series:

I(x_1 + \delta x, y_1 + \delta y, t_1 + \delta t) = I(x_1, y_1, t_1) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t + \Theta    (3)

where Θ represents the omitted higher order terms. From now on we use I_x = \partial I/\partial x, I_y = \partial I/\partial y, and I_t = \partial I/\partial t as a simplified form, so I_x, I_y, and I_t denote the spatial and temporal gradients respectively. Commonly, the higher order terms are neglected to keep the equation system tractable. Therefore we can rewrite the equation in the form of gradient elements as shown below:

I_x \delta x + I_y \delta y = -I_t    (4)

Eq. (4) is the so-called optical flow constraint equation (OFCE) or brightness equation. According to the above procedure, Eq. (1) is approximated by the linearized partial differential equation given in Eq. (4). Therefore an iterative process is required to estimate the parameters, which are the displacement vectors δx and δy. A good initial value approximation is needed to assure convergence of the iterative process.

3.2. Pixel tracking

In the conventional approach, the optical flow method is used to find the dense flow of all the image pixels. In this section, the displacement is calculated only for one specific pixel.

According to the OFCE, there is one equation with two parameters for each pixel. Therefore, some constraints are required to solve the equation and get values for the parameters.

The assumption is made that neighboring pixels move together. As a result, each of these pixels


Fig. 1. Ideal case: (simulated) superimposed vectors represent the displacement of each pixel (δx, δy) on a cut-out of a car as present in a (non-simulated) image frame.


generates one equation with the same parameters. Grouping these equations results in an equation system written in the form Y = AX. Solving this system yields the required parameter values. Here

A = \begin{bmatrix} I_x(1) & I_y(1) \\ I_x(2) & I_y(2) \\ \vdots & \vdots \\ I_x(N) & I_y(N) \end{bmatrix}, \quad Y = -\begin{bmatrix} I_t(1) \\ I_t(2) \\ \vdots \\ I_t(N) \end{bmatrix}, \quad X = \begin{bmatrix} \delta x \\ \delta y \end{bmatrix}    (5)

represent respectively the coefficient, time-gradient, and parameter matrices. T denotes the transpose of a matrix. N is the number of neighboring pixels used to calculate the parameters for a specific pixel. The number in parentheses at each gradient indicates the index of the neighboring pixel.

Fig. 2. Problems: similar brightness (top left); similar objects, the dark car is represented by a red rectangle and the road stripe by a yellow one (top right); ambiguity in edges (bottom).

In the OFCE, the space and time gradients (I_x, I_y, and I_t) play the important role of constructing the A and Y matrices. The gradient in time and space is calculated from two consecutive images in order to include time and space in Eq. (4). Three convolution matrices, C_x, C_y, and C_t, are used to obtain the gradients in the x-direction, the y-direction and time:

C_x = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}, \quad C_y = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}, \quad C_t = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}    (6)

The parameters are obtained from X = (A^T A)^{-1} A^T Y.

In our helicopter images cars are only represented by a few pixels. Therefore, neighbouring pixels in a small (3 by 3) window area are selected, which yields 9 equations to determine the two displacement parameters. In this way the displacements are calculated for each pixel. In the ideal case all the displacements should be the same because the car is a rigid object (Fig. 1).
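The per-pixel procedure above can be sketched as follows. The gradient estimation averages the 2×2 kernel responses of Eq. (6) over both frames; this normalization, the determinant guard of Section 3.5, the threshold value, and all function names are our illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gradients(I1, I2):
    """Estimate I_x, I_y, I_t from two consecutive frames, averaging
    the 2x2 differences of Eq. (6) over both frames (the 0.25
    normalization is an assumed, common choice)."""
    dx = lambda I: I[:, 1:] - I[:, :-1]          # column (x) differences
    dy = lambda I: I[1:, :] - I[:-1, :]          # row (y) differences
    Ix = 0.25 * (dx(I1)[:-1] + dx(I1)[1:] + dx(I2)[:-1] + dx(I2)[1:])
    Iy = 0.25 * (dy(I1)[:, :-1] + dy(I1)[:, 1:] + dy(I2)[:, :-1] + dy(I2)[:, 1:])
    dI = I2 - I1                                  # temporal difference
    It = 0.25 * (dI[:-1, :-1] + dI[:-1, 1:] + dI[1:, :-1] + dI[1:, 1:])
    return Ix, Iy, It

def pixel_displacement(Ix, Iy, It, r, c, half=1):
    """Solve Eq. (4) for one pixel from its (2*half+1)^2 neighbourhood:
    X = (A^T A)^-1 A^T Y with Y = -I_t."""
    win = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)   # N x 2
    Y = -It[win].ravel()
    AtA = A.T @ A
    if abs(np.linalg.det(AtA)) < 1e-6:   # uniform-brightness guard (Section 3.5)
        return None
    return np.linalg.solve(AtA, A.T @ Y)  # (dx, dy)
```

On a synthetic pair of frames where the intensity pattern shifts one pixel in x, this recovers a displacement close to (1, 0); on a uniform patch the determinant guard rejects the pixel.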

3.3. Region tracking

In the previous section, the displacement of a single pixel was calculated using its neighboring pixels as constraints in the OFCE. The displacement calculation of a region using the OFCE is introduced in this section.



Fig. 3. Brightness variations: the position of the specific pixel is shown by a yellow dot (top left); the brightness variation of this pixel is represented as a graph, with the x-axis the position of this pixel in the different frames and the y-axis its brightness (top right); pixel position, similar for the bright car (bottom left); brightness variation, similar for the bright car (bottom right).

Fig. 4. Erroneous tracking: the five top figures depict the pixel tracking, which is lost from the third frame (top); tracking path of the same pixel in 2D (bottom left) and 3D (bottom right).


Fig. 5. Intermediate improved results of a bright car (left) and its 2D histogram (right); the x-direction, y-direction, and z-direction are respectively δx, δy, and the histogram of the joint δx and δy.

Fig. 6. Decision making: intermediate improved results (top left); final result after decision making, in which the wrong displacements are corrected by the statistical decision (top right); bright cars in two successive frames (bottom). The second image (bottom right) is the result of our method.

Fig. 7. Intermediate improved results of a dark car (left) and its 2D histogram (right), where the x-direction, y-direction, and z-direction are respectively δx, δy, and the histogram of the joint δx and δy.


Fig. 8. Decision making: intermediate improved results (top left); final result after decision making, in which the wrong displacements are corrected by the statistical decision (top right); dark cars in two successive frames (bottom). The second image (bottom right) is the result of our method.

Fig. 9. Intermediate improved results of a truck (left) and its 2D histogram (right), where the x-direction, y-direction, and z-direction are respectively δx, δy, and the histogram of the joint δx and δy.

Fig. 10. Decision making: intermediate improved results (top left); final result after decision making, in which the wrong displacements are corrected by the statistical decision (top right); trucks in two successive frames (bottom). The second image (bottom right) is the result of our method.


Fig. 11. Bright car tracking.


For the region based solution, pixels that lie completely within a region, which is a car in our situation, are used as constraints. The difference of this solution in comparison to the pixel based solution is that a group of car pixels contributes together to the solution of the displacement of the region.

In the same procedure as described in Section 3.2, each pixel of a car constructs one Eq. (4) with the same parameters. Then the parameters are obtained using X = (A^T A)^{-1} A^T Y.
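A minimal sketch of the region based solution, assuming a boolean mask marking the car pixels. For brevity the gradients here use NumPy's central differences on the frame average instead of the paper's 2×2 kernels, which is a simplification of ours:

```python
import numpy as np

def region_displacement(I1, I2, mask):
    """Least-squares displacement of a whole region (Section 3.3):
    every pixel inside `mask` contributes one Eq. (4), and
    X = (A^T A)^-1 A^T Y is solved once for the region.
    Gradients via np.gradient are a simplification of Eq. (6)."""
    Iavg = 0.5 * (I1 + I2)
    Iy, Ix = np.gradient(Iavg)     # np.gradient returns (d/drow, d/dcol)
    It = I2 - I1
    A = np.stack([Ix[mask], Iy[mask]], axis=1)   # one row per car pixel
    Y = -It[mask]
    return np.linalg.lstsq(A, Y, rcond=None)[0]  # (dx, dy)
```

The only difference from the pixel based code is which pixels populate A and Y: a mask over the whole car rather than a small window around one pixel.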

Fig. 12. Dark car tracking.

3.4. Scale space

Initial values play an important role in the OFCE. Without good initial values the feature or object falls outside the search area, especially when using a small window size or when dealing with large displacements.

Initial values can be obtained from a cross-correlation method whereby, for large displacements, a very big window size is required. The problem with a big window



size is that it introduces ambiguity, i.e., the chance of finding similar objects is high. On the other hand, with a small window size we cannot detect objects with a high displacement. In our dataset, both low and high displacements exist.

As is well known, good initial values can be obtained by applying a scale space approach, if some requirements concerning the Fourier spectrum of the involved data are fulfilled. In our case, we scale down the images until a zero displacement can be used as an initial value. At each level optical flow is calculated while incorporating the scaled result of the previous (coarser) level, until the original scale is reached. Moreover, at each scale an iterative process is required in order to numerically solve the OFCE given in Eq. (4). The (approximate) solution gives the final displacement.
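The coarse-to-fine scheme can be sketched as follows, assuming a 2×2 block-averaging pyramid and a crude integer shift in place of a proper sub-pixel warp; `estimate` stands in for the per-level optical flow solver and is passed in by the caller:

```python
import numpy as np

def pyramid(im, levels):
    """Image pyramid by 2x2 block averaging, coarsest level first."""
    out = [im]
    for _ in range(levels - 1):
        a = out[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        out.append(0.25 * (a[0:h:2, 0:w:2] + a[1:h:2, 0:w:2]
                           + a[0:h:2, 1:w:2] + a[1:h:2, 1:w:2]))
    return out[::-1]

def coarse_to_fine(I1, I2, estimate, levels=3):
    """Start from a zero initial value at the coarsest level; at each
    finer level double the running estimate d = (dx, dy), crudely undo
    it by an integer shift of the second frame, and let `estimate`
    refine the residual displacement."""
    d = np.zeros(2)
    for P1, P2 in zip(pyramid(I1, levels), pyramid(I2, levels)):
        d *= 2.0                                        # displacement doubles per level
        shift = (-int(round(d[1])), -int(round(d[0])))  # (rows, cols) back-shift
        P2w = np.roll(P2, shift, axis=(0, 1))           # integer "warp" (simplification)
        d += estimate(P1, P2w)
    return d
```

In a full implementation `estimate` would be the iterative OFCE solver of Section 3.2 or 3.3; the sketch only shows how its per-level results are propagated across scales.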

3.5. Source of errors

In the ideal case, both methods described above give the same result. The displacement of each pixel represents the displacement of the object it belongs to (i.e. the car). But in reality this is not true.

Several reasons for this are:

(1) Similar objects: because of the large viewing area of our dataset, a lot of cars are present, which may look very similar in the image. This is especially the case when the traffic is congested, like in our dataset. Moreover, a high similarity appears to exist between dark cars and small road stripes (Fig. 2 top right, where the dark car is represented by a red rectangle and the road stripe by a yellow one). In this case, the region tracking (Section 3.3) goes wrong.

(2) Similar brightness: this happens for instance inside trucks (Fig. 2 top left): when the determinant of A^T A is zero, the system Y = AX cannot be solved. Therefore we test the determinant of A^T A before the displacement calculation. Values near zero are discarded.

(3) Cluttered background: in our case the appearance of road stripes. This causes a problem for the pixel based tracking method.

(4) Brightness variation: as demonstrated in Fig. 3, the local brightness variation for a specific pixel can be very large. Often a specular effect is the reason for this, so that the pixel tracking goes wrong for these specific pixels. Ambiguity in edges (Fig. 2 bottom) is another possible situation where the pixel based tracking can go wrong. The region based solution is not affected by this problem.

(5) Occlusion: in this situation the tracking result of the region based method is wrong. However, in the pixel based tracking method, the tracking result depends on the selected pixels. If one pixel is occluded, there are always other pixels available that can be used to continue tracking.

Global brightness variations from one frame to the next do not affect the result because of the use of the gradient method. Experiments on different datasets using intentional brightness differences confirmed this.

Because of the above-mentioned problems, some of the results are not reliable. Therefore different pixels in the car obtain different displacement values. In the same way the region based tracking gives erroneous results. An incorrect displacement spreads over the sequence and produces erroneous tracking results (Fig. 4). Therefore validation becomes an important task, and a decision on the correctness of intermediate results is required.

Because of these problems, long-term continuation of the vehicle tracking by only using one of its pixels is not possible, even with the region tracking method. A solution would be to model the effects of the different problem situations, which is a complicated task. Therefore we implemented a new decision-based tracking method which uses all the results that were produced for each car. This method is discussed in the next section.

4. Robust tracking

In this section, the different results of the pixel and region based methods are combined. The first error described in Section 3.5 is mainly removed by using the scale space. As explained in Section 3.5, the second error, the effect of similar brightness, is discarded before calculation. The remaining errors are removed based on the decision described in Section 4.1.

4.1. Decision making

So far, different displacement vectors have been calculated for each pixel inside the car and for the car as a whole. In this section a method for finding the most likely car displacement is described, based on analyzing the distribution of the displacements of the individual car pixels.

The displacements of the individual car pixels can be collected in a histogram. Typically, many different displacement vectors will be found, especially in the case of the more difficult scenarios as explained in Section 3.5. However, the erroneous displacements are


expected to occur in all kinds of different directions, while almost correct displacement vectors will occur around the real, but yet unknown, displacement vector. Therefore, we round all individual displacement vectors to pixel precision, and count the number of occurrences of each (rounded) displacement vector. The vector with the largest number of occurrences is assumed to represent the real car displacement (Figs. 5, 7, and 9).

Moreover, comparing this number to the total number of vectors gives us a measure for the reliability of the result. If the maximum number of occurrences is below a certain threshold value or the variations in the joint histogram are small, the reliability is considered too low, and the process for finding the car displacement starting from the current frame is stopped. The displacement finding process for this car, however, is started again in the next frame.
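The decision rule can be sketched as follows; the 0.3 vote fraction threshold and the function name are illustrative assumptions of ours, not the paper's values:

```python
from collections import Counter

def decide_displacement(displacements, min_vote_fraction=0.3):
    """Statistical decision of Section 4.1: round every per-pixel
    displacement vector to pixel precision, take the mode of the 2D
    histogram as the car displacement, and reject the frame when the
    winning bin is not dominant enough."""
    rounded = [(round(dx), round(dy)) for dx, dy in displacements]
    (best, votes), = Counter(rounded).most_common(1)
    if votes / len(rounded) < min_vote_fraction:
        return None   # reliability too low: redetect the car in the next frame
    return best
```

A `None` result corresponds to stopping the displacement finding process for the current frame and restarting it in the next one.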

5. Data description and tracking results

The image sequence acquisition took place at selected sections of the A12 highway near the city of The Hague on 25 April 2002 during extended periods of time. A Bell 206 JetRanger helicopter was used as a platform because of its capability to hover stationary above a section. A typical sequence comprises a section of 250–500 m being recorded during 30–60 min. The camera (Basler A101f, focal length 16 mm) records 10 grayscale frames per second with an image size of 1300×1030 pixels, the larger axis being aligned with the highway. The ground pixel size is 25–50 cm. One of the problems concerns the instability of the camera platform caused by wind and turbulence induced by the helicopter itself while hovering at a constant position. Thus there is both camera and object motion in the video images. Moreover, sometimes thin or even thicker clouds were present, which led to images of varying radiometric characteristics.

We selected sub-images of approximately 1000×200 pixels covering a particular highway section during the vehicle sequence. The results of the tracking between successive frames are depicted in Figs. 5 and 6 for a bright car, Figs. 7 and 8 for a dark car, and Figs. 9 and 10 for a truck.

The top left figure of these groups demonstrates the results for one car after using an initial value. The last image of the top row (top right) shows the final result after making the statistical decision. For all three types of vehicles, the results are correct. The final result is displayed in the form of a rectangle which is superimposed on the second frame (bottom right figure). The main result is depicted in the bottom left figure in the same form as the right one.

The tracking result of a specific car from entering the street until exit is presented in Figs. 11 and 12. When a new car enters the scene, the detection method automatically detects it, and the described tracking method is then able to track this car until it leaves the scene.

We have displayed the tracking results in both a 2D and 3D view (Figs. 11 and 12). In the 2D view, the first image frame is superimposed with the position of the car in the different frames. The 3D view shows the position in the x and y coordinates and time in the z coordinate.

Different colors in both the 2D and 3D tracking of Figs. 11 and 12 represent the stopping of the tracking and its continuation in later frames. In both figures, cars were occluded by traffic signs.

With our method, all the bright cars are successfully tracked from entering a scene until exiting it. We have obtained a significant improvement in the tracking of dark cars, which are problematic in our dataset: 92% of the dark cars within a dark background are tracked from entering until exiting a scene.

6. Conclusion

In this paper, we presented an improved method for long-term vehicle tracking from aerial video images. We have developed a tracking method based on the optical flow method incorporating a scale space approach. The scale space approach delivers initial values for finding correspondences in the next frame. Moreover we employed a decision step, in which we decide whether to continue the tracking or stop it because of low quality. In the latter case the tracking is continued by detecting the missed car in the next frame.

The statistical decision step is based on the rigid-object assumption. For this decision, the most likely displacement vector among the displacement vectors of the individual car pixels is assumed to represent the real displacement of the car. If the maximum occurrence of these displacement vectors is less than a predefined threshold, or the joint histogram shows low variation, the tracking is stopped.
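One way to read this decision step is as a mode filter over the per-pixel displacement vectors: build a joint histogram of quantized displacements, adopt the dominant bin for the car as a whole, and reject the result when no bin dominates. The sketch below is our illustrative reading; `min_count` and `bin_size` are hypothetical parameters, not values from the paper.

```python
import numpy as np

def decide_displacement(vectors, min_count=5, bin_size=0.5):
    """Rigid-body decision: accept the most frequent (mode) displacement
    among the per-pixel displacement vectors, or reject the tracking result
    when no displacement dominates."""
    # Quantize the vectors so that similar displacements share a histogram bin.
    bins = np.round(np.asarray(vectors, dtype=float) / bin_size).astype(int)
    uniq, counts = np.unique(bins, axis=0, return_counts=True)
    best = counts.argmax()
    if counts[best] < min_count:
        return None  # stop tracking: the quality criterion fails
    return uniq[best] * bin_size  # adopted displacement for the car as a whole

# Five consistent per-pixel estimates and one outlier: the mode is accepted.
vecs = [(2.1, 0.4), (2.0, 0.5), (1.9, 0.6), (2.2, 0.4), (2.0, 0.5), (8.0, 3.0)]
print(decide_displacement(vecs))  # dominant displacement near (2.0, 0.5)
```

When the function returns `None`, the tracker would stop and hand the car back to the detection step, as described above.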

The experimental results are promising, even in very difficult situations such as dark cars on dark backgrounds, small vehicle sizes, and large numbers of similar vehicles.

Acknowledgements

The research presented in this paper is part of the research program "Tracing Congestion Dynamics - with Innovative Traffic Data to a better Theory", sponsored by the Dutch Foundation of Scientific Research MaGW-NWO. We would also like to thank the paper reviewers.



Their comments led to substantial improvement of this article.
