detecting moving shadows: formulation,...

DETECTING MOVING SHADOWS:FORMULATION, ALGORITHMS AND EVALUATION 1

Detecting Moving Shadows:Algorithms and Evaluation

Andrea Prati∗ 1,2, Ivana Mikic2, Mohan M. Trivedi2, Rita Cucchiara1

Abstract—Moving shadows need careful consideration inthe development of robust dynamic scene analysis systems.Moving shadow detection is critical for accurate object de-tection in video streams, since shadow points are often mis-classified as object points causing errors in segmentation andtracking. Many algorithms have been proposed in the litera-ture that deal with shadows. However, a comparative evalu-ation of the existing approaches is still lacking. In this paper,we present a comprehensive survey of moving shadow de-tection approaches. We organize contributions reported inthe literature in four classes, two of them are statistical andtwo are deterministic. We also present a comparative em-pirical evaluation of representative algorithms selected fromthese four classes. Novel quantitative (detection and discrim-ination rate) and qualitative metrics (scene and object inde-pendence, flexibility to shadow situations and robustness tonoise) are proposed to evaluate these classes of algorithms ona benchmark suite of indoor and outdoor video sequences.These video sequences and associated “ground-truth” dataare made available athttp://cvrr.ucsd.edu/aton/shadowto al-low for others in the community to experiment with new al-gorithms and metrics.

Keywords— Shadow detection, performance evaluation,object detection, tracking, segmentation, traffic scene anal-ysis, visual surveillance

I. I NTRODUCTION

DETECTION and tracking of moving objects is atthe core of many applications dealing with im-

age sequences. One of the main challenges in theseapplications is identifying shadows which objectscast and which move along with them in the scene.Shadows cause serious problems while segmentingand extracting moving objects, due to the misclas-sification of shadow points as foreground. Shadowscan cause object merging, object shape distortion andeven object losses (due to the shadow cast over an-

∗ Corresponding author.1 Dipartimento di Ingegneria dell’Informazione, Universita di

Modena e Reggio Emilia, Via Vignolese, 905 - Modena - Italy- phone: +39-059-2056136 - fax: +39-059-2056126 - e-mail:{prati.andrea/cucchiara.rita}@unimo.it

2 Computer Vision and Robotics Research Laboratory, Depart-ment of Electrical and Computer Engineering, University of Cali-fornia, San Diego, 9500 Gilman Drive - La Jolla, CA 92037-0407- USA - phone: (858) 822-0075 - fax: (858) 534-0415 - e-mail:{ivana/trivedi}@ece.ucsd.edu

other object). The difficulties associated with shadowdetection arise since shadows and objects share twoimportant visual features. First, shadow points are de-tectable as foreground points since they typically dif-fer significantly from the background; second, shad-ows have the same motion as the objects castingthem. For this reason, the shadow identification iscritical both for still images and for image sequences(video) and has become an active research area espe-cially in the recent past. It should be noted that whilethe main concepts utilized for shadow analysis in stilland video images are similar, typically the purposebehind shadow extraction is somewhat different. Inthe case of still images, shadows are often analyzedand exploited to infer geometric properties of the ob-jects causing the shadow (“shape from shadow” ap-proaches) as well as to enhance object localizationand measurements. Examples of this can be found inaerial image analysis for recognizing buildings [1][2],for obtaining 3-D reconstruction of the scene [3] oreven for detecting clouds and their shadows [4]. An-other important application domain for shadow detec-tion in still images is for the 3-D analysis of objectsto extract surface orientations [5] and light source di-rection [6].

Shadow analysis, considered in the context ofvideo data, is typically performed for enhancingthe quality of segmentation results instead of de-ducing some imaging or object parameters. Inthe literature shadow detection algorithms are nor-mally associated with techniques for moving ob-ject segmentation. In this paper we present a com-prehensive survey of moving shadow detection ap-proaches. We organize contributions reported inthe literature in four classes and present a com-parative empirical evaluation of representative algo-rithms selected from these four classes. This com-parison takes into account both the advantages andthe drawbacks of each proposal and provides a quan-titative and qualitative evaluation of them. Novelquantitative (detection and discrimination rate) and

2 DETECTING MOVING SHADOWS:FORMULATION, ALGORITHMS AND EVALUATION

qualitative metrics (scene and object independence,flexibility to shadow situations and robustness tonoise) are proposed to evaluate these classes of al-gorithms on a benchmark suite of indoor and out-door video sequences. These video sequences andassociated “ground-truth” data are made availableat http://cvrr.ucsd.edu/aton/shadow toallow for others in the community to experiment withnew algorithms and metrics. This availability fol-lows the idea of data-sharing embodied in Call forComparison, like the project of European COST 211Group (see athttp://www.iva.cs.tut.fi/COST211/ forfurther details).

In the next Section we develop a two layer tax-onomy for surveying various algorithms presented inthe literature. Each approach class is detailed and dis-cussed to emphasize its strengths and its limitations.In Section III, we develop a set of evaluation metricsto compare the shadow detection algorithms. This isfollowed by Section IV where we present a results ofempirical evaluation of four selected algorithms on aset of five video sequences. The final Section presentsconcluding remarks.

II. TAXONOMY OF SHADOW DETECTION

ALGORITHMS

Most of the proposed approaches take into accountthe shadow model described in [7] . To account fortheir differences, we have organized the various al-gorithms in a two-layer taxonomy. The first layerclassification considers whether the decision processintroduces and exploits uncertainty.Deterministicapproachesuse an on/off decision process, whereasstatistical approachesuse probabilistic functions todescribe the class membership. Introducing uncer-tainty to the class membership assignment can re-duce noise sensitivity. In the statistical methods (as[8][9][10][11][12]) the parameter selection is a criti-cal issue. Thus, we further divide the statistical ap-proaches inparametricandnon-parametricmethods.The study reported in [8] is an example of the para-metric approach, whereas [10][11] are examples ofthe non-parametric approach. The deterministic class(see [6][7][13][14]) can be further subdivided. Sub-classification can be based on whether the on/off de-cision can be supported by model based knowledge ornot. Choosing amodel basedapproach achieves un-doubtedly the best results, but is, most of the times,

too complex and time consuming compared to thenon-model based. Moreover, the number and thecomplexity of the models increase rapidly if the aimis to deal with complex and cluttered environmentswith different lighting conditions, object classes andperspective views.

It is also important to recognize the types of “fea-tures” utilized for shadow detection. Basically, thesefeatures are extracted from three domains:spectral,spatialandtemporal. Approaches can exploit differ-ently spectral features, i.e. using gray level or colorinformation. Some approaches improve results by us-ing spatial information working at a region level or ata frame level, instead of pixel level. This is a classifi-cation similar to that used in [15] for the backgroundmaintenance algorithms. Finally, some methods ex-ploit temporal redundancy to integrate and improveresults.

In Table I we have classified 21 papers dealing withshadow detection in four classes. We highlight spec-tral, spatial and temporal features used by these algo-rithms. In this paper, we focus our attention on fouralgorithms (reported in bold in Table I) representativeof three of the above-mentioned classes. For the sta-tistical parametric class we choose the algorithm pro-posed in [8] since this utilizes features from all threedomains. The approach reported in [11] can be con-sidered to be a very good representative of the statisti-cal non-parametric class and is also cited and used in[17]. Within the deterministic non-model based classwe choose to compare the algorithm described in [13]because is the only one that uses HSV color spacefor shadow detection. Finally, algorithm reported in[7] has been selected for its unique capability to copewith penumbra. The deterministic model-based classhas not been considered due to its complexity and dueto its reliance on very specific task domain assump-tions. For instance, the approach used in [14] modelsshadows using a simple illumination model: assum-ing parallel incoming light, they compute the projec-tion of the 3D object model onto the ground, exploit-ing two parameters for the illumination direction setoff-line and assumed to be constant during the entiresequence. However, as stated in the previous Sec-tion, in outdoor scene the projection of the shadow isunlikely to be perspective, since the light source cannot be assumed to be a point light source. Therefore,the need for object models and illumination position’smanual setting make this approach difficult to be im-

PRATI, MIKI C, TRIVEDI AND CUCCHIARA 3

Statistical parametric Statistical non-parametricPaper Spectral Spatial Temporal Paper Spectral Spatial TemporalFriedman and Russell 1997 [12] C L D Horprasert et al. 1999 [11] C L SMiki c et al. 2000 [8][9] C R D Tao et al.4 2000 [16] C F D

McKenna et al. 2000 [17] C L SDeterministic model based Deterministic non-model based

Paper Spectral Spatial Temporal Paper Spectral Spatial Temporal

Irvin and McKeown Jr.1 1989 [1] G L S Scanlan et al.1 1990 [18] G L SWang et al. 1991 [4] G R S Jiang and Ward1 1992 [6] G F SKilger 1992 [19] G R S Charkari and Mori 1993 [20] G R SKoller et al. 1993 [14] G L S Sexton and Zhang 1993 [21] G L SOnoguchi2 1998 [22] G L S Funka-Lea and Bajcsy1 1995 [23] G F D

Sonoda and Ogata 1998 [24] G F STzomakas and von Seelen 1998 [25] G F SAmamoto and Fujii 1999 [26] G N/A3 DStauder et al. 1999 [7] G F DCucchiara et al. 2001 [13] C L S

TABLE I

CLASSIFICATION OF THE LITERATURE ON SHADOW DETECTION. ( G=GREY-LEVEL , C=COLOR, L=LOCAL /PIXEL-LEVEL

R=REGION-LEVEL F=FRAME-LEVEL , S=STATIC, D=DYNAMIC ).

plemented in a general-purpose framework.In the next subsections we describe briefly the se-

lected approaches. For more details, refer to the cor-responding papers or see the detailed description thatwe reported in [27]

A. Statistical non-parametric (SNP) approach

As an example of statistical non-parametric (SNP)approach we choose the one described in [28] anddetailed in [11]. This work considers thecolor con-stancyability of human eyes and exploits the Lamber-tian hypothesis to consider color as a product of irra-diance and reflectance. The distortion of the bright-nessαi and the distortion of the chrominanceCDi

of the difference between expected color of a pixeland its value in the current image are computed andnormalized w.r.t. their root mean square of pixeli.The valuesαi andCDi obtained are used to classify

1This paper considers only still images2This paper is not properly a deterministic model approach. It uses

an innovative approach based oninverse perspective mappingin whichthe assumption is that the shadow and the object that casts it are over-lapped if projected on the ground plane. Since a model of the scene isnecessary, we classify this paper in this class.

3This paper has the unique characteristic to use the DCT to removeshadow. For this reason, we can say that this paper works onfrequency-level. The rationale used by the authors is that a shadow has, in thefrequency domain, a large DC component, whereas the moving objecthas a large AC component.

4Since this paper uses a fuzzy neural network to classify points asbelonging or not to a shadow, it can be considered a statistical approach.However, how much the parameter setting is automated is not clear inthis paper.

a pixel in four categories:

C(i) =

For. : CDi > τCD or αi < ταlo, else

Back. : αi < τα1 and αi > τα2, elseShad. : αi < 0, else

Highl. : otherwise(1)

The rationale used is that shadows have simi-lar chromaticity but lower brightness than the back-ground model. A statistical learning procedureis used to automatically determine the appropriatethresholds.

B. Statistical parametric (SP) approach

The algorithm described in [8] for traffic sceneshadow detection is an example of statistical paramet-ric (SP) approach. This algorithm claims to use twosources of information:local (based on the appear-ance of the pixel) andspatial (based on the assump-tion that the objects and the shadows are compact re-gions). The a-posteriori probabilities of belongingto background, foreground and shadow classes aremaximized. The a-priori probabilities of a pixel be-longing to shadow are computed by assuming thatv = [R,G,B]T is the value of the pixel not shad-owed and by using an approximated linear transfor-mationv = Dv (whereD = diag(dR,dG,dB) is adiagonal matrix obtained by experimental evaluation)to estimate the color of the point covered by a shadow.The D matrix is assumed approximately constantover flat surfaces. If the background is not flat over


the entire image, differentD matrices must be com-puted for each flat subregion. The spatial informationis exploited by performing an iterative probabilisticrelaxation to propagate neighborhood information. Inthis statisticalparametricapproach the main draw-back is the difficult process necessary to select theparameters. Manual segmentation of a certain num-ber of frames has to be done to collect statistics andto compute the values of matrixD. An expectationmaximization (EM) approach could be used to auto-mate this process, as in [12].

C. Deterministic non-model based (DNM1) ap-proach

The system described in [13] is an example of de-terministic non-model based approach (and we calledit DNM1). This algorithm works in the HSV colorspace. The main reasons are that HSV color spacecorresponds closely to the human perception of color[29] and it has revealed more accuracy in distinguish-ing shadows. In fact, a shadow cast on a backgrounddoes not change significantly its hue [30]. Moreover,the authors exploit saturation information since theynote that shadows often lower the saturation of thepoints. The resulting decision process is reported inthe following equation:

SPk(x, y) =

1 if α ≤ IV

k (x,y)

BVk

(x,y)≤ β

∧ (ISk (x, y)−BS

k (x, y)) ≤ τS

∧ |IHk (x, y)−BH

k (x, y)| ≤ τH

0 otherwise

(2)

whereIk(x, y) and Bk(x, y) are the pixel valuesat coordinate(x, y) in the input image (frame k) andin the background model (computed at frame k), re-spectively. The use ofβ prevents the identification asshadows of those points where the background wasslightly changed by noise, whereasα takes into ac-count the “power” of the shadow, i.e. how strong thelight source is w.r.t. the reflectance and irradiance ofthe objects. Thus, stronger and higher the sun (in theoutdoor scenes), the lowerα should be chosen.

D. Deterministic non-model based (DNM2) ap-proach

Finally, we compare the approach presented in[7]. This is also a deterministic non-model based ap-proach, but we have included it because of its com-pleteness (is the only work in the literature that deals

with penumbra in moving cast shadows). The shadowdetection is provided by verifying three criteria: thepresence of a ”darker” uniform region, by assum-ing that the ratio between actual value and referencevalue of a pixel is locally constant in presence ofcast shadows; the presence of a high difference in lu-minance w.r.t reference frame; and the presence ofstatic and moving edges. Static edges hint a staticbackground and can be exploited to detect nonmov-ing regions inside the frame difference. Moreover, todetect penumbra the authors propose to compute thewidth of each edge in the difference image. Sincepenumbra causes a soft luminance step at the con-tour of a shadow, they claim that the edge width isthe more reliable way to distinguish between objectscontours and shadows contours (characterized by awidth greater than a threshold).

This approach is one of the most complete and ro-bust proposed in the literature. Nevertheless, in thiscase the assumptions and the corresponding approxi-mations introduced are strong and they could lack ingenerality. Also, the penumbra criterion is not ex-plicitly exploited toadd penumbra points as shadowpoints, but it is only used toremovethe points thatdo not fit this criterion. Moreover, the proposed algo-rithm uses the previous frame (instead of the back-ground) as reference frame. This choice exhibitssome limitations in moving region detection since itis influenced by object speed and it is too noise sen-sitive. Thus, to make the comparison of these ap-proaches as fair as possible, limited to the shadowdetection part of the system, we implemented theDNM2 approach using a background image as refer-ence, as the other three approaches do.

III. PERFORMANCE EVALUATION METRICS

In this section, the methodology used to comparethe four approaches is presented. In order to sys-tematically evaluate various shadow detectors, it isuseful to identify the following two important qual-ity measures: good detection(low probability tomisclassify a shadow point) andgood discrimina-tion (the probability to classify non-shadow pointsas shadow should be low, i.e. low false alarmrate). The first one corresponds to minimizing thefalse negatives (FN), i.e. the shadow points classi-fied as background/foreground, while for good dis-crimination, thefalse positives (FP), i.e. the fore-ground/background points detected as shadows, are


minimized.A reliable and objective way to evaluate this type

of visual-based detection is still lacking in the lit-erature. A very good work on how to evaluate ob-jectively the segmentation masks in video sequencesis presented in [31]. The authors proposed a met-ric based onspatial accuracyand temporal stabilitythat aims at evaluatinf differently the FPs and FNsdepending on their distance from the borders of themask, and at taking into account the shifting (insta-bility) of the mask along the time. In [32], the au-thors proposed two metrics for moving object de-tection evaluation: theDetection Rate (DR)and theFalse Alarm Rate (FAR). AssumingTP as the num-ber of true positives(i.e. the shadow points correctlyidentified), these two metrics are defined as follows:

DR =TP

TP + FN; FAR =

FP

TP + FP(3)

The Detection Rate is often calledtrue positive rateor alsorecall in the classification literature and theFAR corresponds to1 − p, wherep is the so calledprecision in the classification theory. These figuresare not selective enough for shadow detection eval-uation, since they do not take into account whethera point detected as shadow belongs to a foregroundobject or to the background. If shadow detection isused to improve moving object detection, only thefirst case is problematic, since false positives belong-ing to the background do not affect neither the objectdetection nor the object shape.

To account this, we have modified the metrics ofequation 3, defining theshadow detection rateη andtheshadow discrimination rateξ as follows:

η =TPS

TPS + FNS

; ξ =TPF

TPF + FNF

(4)

where the subscript S stays for shadow and F forforeground. TheTPF is the number of ground-truthpoints of the foreground objects minus the numberof points detected as shadows but belonging to fore-ground objects.

In addition to the above quantitative metrics, wealso consider the following qualitative measures inour evaluation: robustness to noise, flexibility toshadow strength, width and shape, object indepen-dence, scene independence, computational load, de-tection of indirect cast shadows and penumbra. Indi-rect cast shadows are the shadows cast by a moving

object over another moving object and their effect isto decrease the intensity of the moving object cov-ered, probably affecting the object detection, but notthe shadow detection.

IV. EMPIRICAL COMPARATIVE EVALUATION

In this section, the experimental results and thequantitative and qualitative comparison of the fourapproaches are presented. First, a set of sequencesto test the algorithms was chosen to form a completeand non trivial benchmark suite. We select the se-quences reported in Table II, where both indoor andoutdoor sequences are present, where shadows rangefrom dark and small to light and large and where theobject type, size and speed vary considerably. TheHighway Iand theHighway II sequences show a traf-fic environment (at two different lighting conditions)where the shadow suppression is very important toavoid misclassification and erroneous counting of ve-hicles on the road. TheCampussequence is a noisysequence from outdoor campus site where cars ap-proach to an entrance barrier and students are walk-ing around. The two indoor sequences report two lab-oratory rooms in two different perspectives and light-ing conditions. In theLaboratorysequence, besideswalking people, a chair is moved in order to detect itsshadow.

A. Quantitative comparison

To compute the evaluation metrics described inSection III, the ground-truth for each frame is nec-essary. We obtained it by segmenting the imageswith an accurate manual classification of points inforeground, background and shadow regions. Weprepared ground truth on tens of frames for eachvideo sequence representative of different situations(dark/light objects, multiple objects or single object,occlusions or not).

All the four approaches but the DNM2 have beenfaithfully and completely implemented. In the caseof DNM2 some simplifications have been introduced:the memory MEM used in [7] to avoid infinite errorpropagation in the change detection masks (CDMs)has not been implemented since computationally veryheavy and not necessary (in the sequences consideredthere is no error propagation); some minor tricks (likethat of the closure of small edge fragments) have notbeen included due to the lack of details in the paper.However, these missing parts of the algorithm do not


Highway I Highway II Campus Laboratory Intelligentroom

Sequence type outdoor outdoor outdoor indoor indoorSequencelength

1074 1134 1179 987 900

Image size 320x240 320x240 352x288 320x240 320x240Shadowstrength

medium high low very low low

Shadow size large small very large medium largeObject class vehicles vehicles vehicle/people people/other peopleObject size large small medium medium mediumObject speed(pixels)

30-35 8-15 5-10 10-15 2-5

Noise level medium medium high low medium

TABLE II

THE SEQUENCE BENCHMARK USED.

Highway I Highway II Campus Laboratory Intelligent Roomη% ξ% η% ξ% η% ξ% η% ξ% η% ξ%

SNP 81.59% 63.76% 51.20% 78.92% 80.58% 69.37% 84.03% 92.35% 72.82% 88.90%SP 59.59% 84.70% 46.93% 91.49% 72.43% 74.08% 64.85% 95.39% 76.27% 90.74%

DNM1 69.72% 76.93% 54.07% 78.93% 82.87% 86.65% 76.26% 89.87% 78.61% 90.29%DNM2 75.49% 62.38% 60.24% 72.50% 69.10% 62.96% 60.34% 81.57% 62.00% 93.89%

TABLE III

EXPERIMENTAL RESULTS.

influence shadow detection at all. In conclusion, thecomparison has been set up as fair as possible.

Results are reported in Table III. To establish afair comparison, algorithms do not implement anybackground updating process (since each tested al-gorithm proposes a different approach). Insteadwe compute the reference image and other parame-ters from the firstN frames (withN varying withthe sequence considered). The firstN frames canbe considered as the training set and the remain-ing frames as the testing set for our experimen-tal framework. Note that the calculated parame-ters remain constant for the whole sequence. Thevisual results on a subset of theIntelligent Roomand of theHighway I sequences are available athttp://cvrr.ucsd.edu/aton/shadow . Fig.1 shows an example of visual results from the indoor

sequenceIntelligent Room.The SNP algorithm is very effective in most of

the cases, but with very variable performances. Itachieves the best detection performanceη and highdiscrimination rateξ in the indoor sequenceLabora-tory, with percentages up to 92%. However, the dis-crimination rate is quite low in theHighway I andCampussequences. This can be explained by thedark (similar to shadows) appearance of objects in theHighway I sequence and by the strong noise compo-nent in theCampussequence.

The SP approach achieves good discrimination ratein most of the cases. Nevertheless, its detection rate isrelatively poor in all the cases but theIntelligent roomsequence. This is mainly due to the approximationof constantD matrix on the entire image. Since thebackground can be rarely assumed as flat on the en-


(a) Raw image (b) SNP result (c) SP result

(d) DNM1 result (e) DNM2 result

Fig. 1. Results of in theIntelligent roomsequence. Red pixelsidentify foreground points and blue pixels indicate shadowpoints.

tire image, this approach lacks in generality. Never-theless, good accuracy in the case ofIntelligent roomtest shows how this approach can deal with indoor se-quences where the assumption of constantD matrixis valid.

The DNM1 algorithm is the one with the most sta-ble performance, even with totally different video se-quences. It achieves good accuracy in almost all thesequences. It outperforms the other algorithms in theCampusand in theIntelligent roomsequences.

The DNM2 algorithm suffers mainly due to theassumption of planar background. This assumptionfails in the case of theLaboratory sequence wherethe shadows are cast both on the floor and on the cab-inet. The low detection performance in theCampussequence is mainly due to noise and this algorithmhas proven low robustness to strong noise. Finally,this algorithm achieves the worst discrimination re-sult in all the cases but theIntelligent roomsequence.This is due to its assumption of textured objects: ifthe object appearance is not textured (or seems nottextured due to the distance and the quality of theacquisition system), the probability that parts of theobject are classified as shadow arises. In fact, in theIntelligent roomsequence the clothes of the person inthe scene are textured and the discrimination rate ishigher. This approach outperforms the others in themore difficult sequence (Highway II).

The statistical approaches perform robustly innoisy data, due to statistical modeling of noise. On

the other hand, deterministic approaches (in particu-lar if pixel-based and almost unconstrained as DNM1)exhibit a good flexibility to different situations. Dif-ficult sequences, likeHighway II, require, however,a more specialized and complete approach to achievegood accuracy. To help evaluating the approaches theresults on theHighway I outdoor sequence and onthe Intelligent roomindoor sequence are available athttp://cvrr.ucsd.edu/aton/shadow .

B. Qualitative comparison

To evaluate the behaviour of the four algorithmswith respect to the qualitative issues presented in Sec-tion III, we vote them ranging from “very low” to“very high” (see Table IV). The DNM1 method isthe most robust to noise, thanks to its pre- and post-processing algorithms [13]. The capacity to dealwith different shadow size and strength is high inboth the SNP and the DNM1. However, the higherflexibility is achieved by the DNM2 algorithm whichis able to detect even the penumbra in an effectiveway. Nevertheless, this algorithm is very object-dependent, in the sense that, as already stated, theassumption on textured objects affects strongly theresults. Also, the two frame difference approach pro-posed in [7] is weak as soon as the object speeds in-crease. The hypothesis of a planar background makesthe DNM2 and especially the SP approaches morescene-dependent than the other two. Although we cannot claim to have implemented these algorithms in themost efficient way, the DNM2 seems the more timeconsuming, due to the amount of processing neces-sary. On the other hand, the SNP is very fast.

Finally, we evaluated the behaviour of the algo-rithms in the presence of indirect cast shadows (seeSection III). The DNM2 approach is able to detectboth the penumbra and the indirect cast shadow in avery effective way. The SP and the DNM1 methodsfailed in detecting indirect cast shadows. The pixel-based decision can not distinguish correctly betweenthis type of moving shadows and those shadows caston the background. However, the SP approach is ableto detect relatively narrow penumbra.

V. CONCLUDING REMARKS

Development of practical dynamic scene analy-sis systems for real-world applications needs carefulconsideration of the moving shadows. Research com-munity has recognized this and serious, substantive


Robustness Flexibility Object Scene Computational Indirect shadowto noise to shadow independence independence load & penumbra detection

SNP high high high high very low highSP high medium high low low low

DNM1 very high high high high low very lowDNM2 low very high low medium high very high

TABLE IV

QUALITATIVE EVALUATION .

efforts in this area are being reported. The main mo-tivator for this paper is to provide a general frame-work to discuss such contributions in the field andalso to provide a systematic empirical evaluation of aselected representative class of shadow detection al-gorithms. Papers dealing with shadows are classifiedin a two-layer taxonomy and four representative al-gorithms are described in detail. A set of novel quan-titative and qualitative metrics has been adopted toevaluate the approaches.

Main conclusion of the empirical study can be de-scribed as follows. For a general-purpose shadowdetection system, with minimal assumptions, apixel based deterministic non-model based approach(DNM1) assures best results. On the other hand, todetect shadows efficientlyin one specific environ-ment, more assumptions yield better results andde-terministic model-based approachshould be applied.In this situation, ifthe object classes are numerousto allow modeling of every class, acomplete de-terministic approach, like the DNM2, should be se-lected. If the environment is indoor, the statisticalapproachesare the more reliable, since the scene isconstant and a statistical description is very effective.If there are different planes onto which the shadowscan be cast, an approach like SNP is the best choice.If the shadows are scattered, narrow, or particularly“blended” to the environment, a region-based dy-namic approach, typically deterministic, is the bestchoice (as DNM2 in theHighway II scene reported inthis paper). Finally, ifthe scene is noisy, a statisti-cal approach or a deterministic approach with effec-tive pre- and post-processing stepsshould be used.Finally, we want to remark that all the evaluated ap-proaches exploit a large set of assumptions, to limitcomplexity and to avoid being undully constrainedto a specific scene model. This limits their shadowdetection accuracies. This in fact points to the lim-itations of using only image-derived information inshadow detection. Further improvements would re-

quire feedback of specific task/scene domain knowl-edge.

A very interesting future direction has been sug-gested by an unknown reviewer. He/she suggested toconsider thephysically importantindependent vari-ables to evaluate the algorithms. If we can con-sider as parameters of the scene for example the typeof illumination for indoor scene or the surface typeupon which the shadows are cast in outdoor environ-ments, we can build up a benchmark on which test-ing the different approaches. Results on accuracy onthis benchmark would be more useful to future re-seracher/developer of shadow detection (and motiondetection) algorithm since morephysically linkedtothe considered scene.

ACKNOWLEDGMENTS

Our research is supported in part by the CaliforniaDigital Media Innovation Program (DiMI) in partner-ship with the California Department of Transporta-tion (Caltrans), Sony Electronics, and Compaq Com-puters. We wish to thank our collaborators from theCaltrans (TCFI) in Santa Barbara for their supportand interactions. We extend special thanks to ourcolleagues in the CVRR Laboratory for their effortsin acquiring the video data sets utilized in our stud-ies. We also wish to thank all the reviewers for theirthoughtful comments that help us to improve the pa-per.

REFERENCES

[1] R.B. Irvin and D.M. McKeown Jr., “Methods for exploiting therelationship between buildings and their shadows in aerial im-agery,” IEEE Transactions on Systems, Man, and Cybernetics,vol. 19, pp. 1564–1575, 1989.

[2] Y. Liow and T. Pavlidis, “Use of shadows for extracting build-ings in aerial images,”Computer Vision, Graphics and ImageProcessing, vol. 49, no. 2, pp. 242–277, Feb. 1990.

[3] G. Medioni, “Obtaining 3-D from shadows in aerial images,” inProceedings of IEEE Int’l Conference on Computer Vision andPattern Recognition, 1983, pp. 73–76.

[4] C. Wang, L. Huang, and A. Rosenfeld, “Detecting clouds andcloud shadows on aerial photographs,”Pattern Recognition Let-ters, vol. 12, no. 1, pp. 55–64, Jan. 1991.


[5] S.A. Shafer and T. Kanade, “Using shadows in finding surfaceorientations,”Computer Vision, Graphics and Image Processing,vol. 22, no. 1, pp. 145–176, Apr. 1983.

[6] C. Jiang and M.O. Ward, “Shadow identification,”Proceedingsof IEEE Int’l Conference on Computer Vision and Pattern Recog-nition, pp. 606–612, 1992.

[7] J. Stauder, R. Mech, and J. Ostermann, “Detection of movingcast shadows for object segmentation,”IEEE Transactions onMultimedia, vol. 1, no. 1, pp. 65–76, Mar. 1999.

[8] I. Mikic, P. Cosman, G. Kogut, and M.M. Trivedi, “Movingshadow and object detection in traffic scenes,” inProceedingsof Int’l Conference on Pattern Recognition, Sept. 2000, vol. 1,pp. 321–324.

[9] M.M. Trivedi, I. Mikic, and G. Kogut, “Distributed video net-works for incident detection and management,” inProceedingsof IEEE Int’l Conference on Intelligent Transportation Systems,Oct. 2000, pp. 155–160.

[10] A. Elgammal, D. Harwood, and L.S. Davis, “Non-parametricmodel for background subtraction,” inProceedings of IEEEICCV’99 FRAME-RATE Workshop, 1999.

[11] T. Horprasert, D. Harwood, and L.S. Davis, “A statistical ap-proach for real-time robust background subtraction and shadowdetection,” in Proceedings of IEEE ICCV’99 FRAME-RATEWorkshop, 1999.

[12] N. Friedman and S. Russell, “Image segmentation in video se-quences: a probabilistic approach,” inProceedings of the 13thConference on Uncertainty in Artificial Intelligence, 1997.

[13] R. Cucchiara, C. Grana, G. Neri, M. Piccardi, and A. Prati, “TheSakbot system for moving object detection and tracking,” inVideo-based Surveillance Systems - Computer Vision and Dis-tributed Processing. 2001, pp. 145–157, Kluwer Academic.

[14] D. Koller, K. Daniilidis, and H.H. Nagel, “Model-based objecttracking in monocular image sequences of road traffic scenes,”International Journal of Computer Vision, vol. 10, pp. 257–281,1993.

[15] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower:Principles and Practice of Background Maintenance,” inProceed-ings of IEEE Int’l Conference on Computer Vision, 1999, vol. 1,pp. 255–261.

[16] X. Tao, M. Guo, and B. Zhang, “A neural network approach tothe elimination of road shadow for outdoor mobile robot,” inProceedings of IEEE Int’l Conference on Intelligent ProcessingSystems, 1997, vol. 2, pp. 1302–1306.

[17] S.J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, and H. Wech-sler, “Tracking groups of people,”Computer Vision and ImageUnderstanding, vol. 80, no. 1, pp. 42–56, Oct. 2000.

[18] J.M. Scanlan, D.M. Chabries, and R.W. Christiansen, “A shadowdetection and removal algorithm for 2-D images,” inProceedingsof Int’l Conference on Acoustics, Speech and Signal Processing,1990, vol. 4, pp. 2057–2060.

[19] M. Kilger, “A shadow handler in a video-based real-time trafficmonitoring system,”Proceedings of IEEE Workshop on Applica-tions of Computer Vision, pp. 11–18, 1992.

[20] N.M. Charkari and H. Mori, “A new approach for real time mov-ing vehicle detection,” inProceeding of IEEE/RSJ Int’l Confer-ence on Intelligent Robots and Systems, 1993, pp. 273–278.

[21] G.G. Sexton and X. Zhang, “Suppression of shadows for im-proved object discrimination,” inProc. IEE Colloq. Image Pro-cessing for Transport Applications, Dec. 1993, pp. 9/1–9/6.

[22] K. Onoguchi, “Shadow elimination method for moving objectdetection,” inProceedings of Int’l Conference on Pattern Recog-nition, 1998, vol. 1, pp. 583–587.

[23] G. Funka-Lea and R. Bajcsy, “Combining color and geometryfor the active, visual recognition of shadows,” inProceedings ofIEEE Int’l Conference on Computer Vision, 1995, pp. 203–209.

[24] Y. Sonoda and T. Ogata, “Separation of moving objects and theirshadows, and application to tracking of loci in the monitoring im-ages,” inProceedings of Int’l Conference on Signal Processing,1998, pp. 1261–1264.

[25] C. Tzomakas and W. von Seelen, “Vehicle detection in traf-fic scenes using shadows,” Tech. Rep. 98-06, IR-INI, Institutfur Nueroinformatik, Ruhr-Universitat Bochum, FRG, Germany,Aug. 1998.

[26] N. Amamoto and A. Fujii, “Detecting obstructions and trackingmoving objects by image processing technique,”Electronics andCommunications in Japan, Part 3, vol. 82, no. 11, pp. 28–37,1999.

[27] A. Prati, R. Cucchiara, I. Mikic, and M.M. Trivedi, “Analysis andDetection of Shadows in Video Streams: A Comparative Evalu-ation,” in Third Workshop on Empirical Evaluation Methods inComputer Vision - IEEE Int’l Conference on Computer Vision andPattern Recognition, 2001.

[28] I. Haritaoglu, D. Harwood, and L.S. Davis, “W4: real-timesurveillance of people and their activities,”IEEE Transactionson Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp.809–830, Aug. 2000.

[29] N. Herodotou, K.N. Plataniotis, and A.N. Venetsanopoulos, “Acolor segmentation scheme for object-based video coding,” inProceedings of the IEEE Symposium on Advances in Digital Fil-tering and Signal Processing, 1998, pp. 25–29.

[30] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti,“Improving shadow suppression in moving object detection withHSV color information,” inProceedings of IEEE Int’l Conferenceon Intelligent Transportation Systems, Aug. 2001, pp. 334–339.

[31] P. Villegas, X. Marichal, and A. Salcedo, “Objective Evaluationof Segmentation Masks in Video Sequences,” inWIAMIS ’99workshop, May 1999, pp. 85–88.

[32] G. Medioni, “Detecting and tracking moving objects for videosurveillance,” inProceedings of IEEE Int’l Conference on Com-puter Vision and Pattern Recognition, 1999, vol. 2, pp. 319–325.

detecting moving shadows: formulation,...

Documents