Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Discrimination

Kan Ouivirach and Matthew N. Dailey

Computer Science and Information Management
Asian Institute of Technology

May 16, 2013


Outline

1 Introduction

2 Proposed Method

3 Experimental Results

4 Conclusion and Future Work


Introduction: Overview

Detecting and tracking moving objects is an important issue.

Background subtraction is a very common approach to detecting moving objects.

One major disadvantage is that shadows tend to be misclassified as part of the foreground object.


Introduction: Problems

The size of the detected object could be overestimated.

This leads to merging otherwise separate blobs representing different people walking close to each other.

This makes isolating and tracking people in a group much more difficult than necessary.

Figure: Example problem with shadows. (a) Original image. (b) Foreground pixels according to background model.


Introduction: Why Important?

Shadow removal can significantly improve the performance of computer vision tasks such as

tracking;

segmentation;

object detection.

Shadow detection has become an active research area in recent years.


Introduction: Related Work

Chromaticity and luminance are not orthogonal in the RGB color space, but lighting differences can be controlled for in the normalized RGB color space.

Mikic et al. (2000) observe that in the normalized RGB color space, shadow pixels tend to be more blue and less red than illuminated pixels. They apply a probabilistic model based on the normalized red and blue features to classify shadow pixels in traffic scenes.

One well-known problem with the normalized RGB space is that normalization of pixels with low intensity results in unstable chromatic components (Kender, 1976).



Cucchiara et al. (2001) and Chen et al. (2008) use an HSV color-based method (a deterministic nonmodel-based method) to eliminate these concerns, based on the assumption that only the intensity of the shadow area will change significantly.

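\[
SP_t(x, y) =
\begin{cases}
1 & \text{if } \alpha \le \dfrac{I_t^V(x, y)}{B_t^V(x, y)} \le \beta
    \;\wedge\; \bigl(I_t^S(x, y) - B_t^S(x, y)\bigr) \le T_S
    \;\wedge\; \bigl|I_t^H(x, y) - B_t^H(x, y)\bigr| \le T_H \\
0 & \text{otherwise,}
\end{cases}
\]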

where $SP_t(x, y)$ is the resulting binary mask for shadows at each pixel $(x, y)$ at time $t$. $I_t^H$, $I_t^S$, $I_t^V$, $B_t^H$, $B_t^S$, and $B_t^V$ are the H, S, and V components of foreground pixel $I_t(x, y)$ and background pixel $B_t(x, y)$ at pixel $(x, y)$ at time $t$, respectively. They prevent foreground pixels from being classified as shadow pixels by setting two thresholds, $0 < \alpha < \beta < 1$. The four thresholds $\alpha$, $\beta$, $T_S$, and $T_H$ are empirically determined.
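The three tests in the mask above can be sketched for a single pixel as follows. This is a minimal illustration, not the cited authors' implementation: the function name, the use of Python's `colorsys` module (all HSV components in [0, 1]), the handling of a black background pixel, and the threshold values in the usage lines are assumptions. The hue test uses the plain absolute difference exactly as written in the equation, without wraparound handling.

```python
import colorsys


def is_shadow_dnm(fg_rgb, bg_rgb, alpha, beta, t_s, t_h):
    """Return True if a foreground pixel passes all three shadow tests.

    fg_rgb and bg_rgb are (r, g, b) tuples with components in [0, 1].
    """
    fh, fs, fv = colorsys.rgb_to_hsv(*fg_rgb)
    bh, bs, bv = colorsys.rgb_to_hsv(*bg_rgb)
    if bv == 0.0:
        # The value ratio is undefined for a black background pixel.
        # Treating this case as "not shadow" is an assumption of this sketch.
        return False
    value_ok = alpha <= fv / bv <= beta      # darker, but not too dark
    saturation_ok = (fs - bs) <= t_s         # saturation does not rise much
    hue_ok = abs(fh - bh) <= t_h             # hue stays close to background
    return value_ok and saturation_ok and hue_ok


# Hypothetical thresholds, chosen only for illustration.
background = (0.8, 0.8, 0.8)
print(is_shadow_dnm((0.4, 0.4, 0.4), background, 0.3, 0.9, 0.1, 0.1))
print(is_shadow_dnm((0.8, 0.1, 0.1), background, 0.3, 0.9, 0.1, 0.1))
```

The first call describes a pixel that is simply a darker version of the background, so it passes all three tests. The second is as bright as the background, so the value-ratio test rejects it.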



Some researchers have investigated color spaces besides RGB and HSV.

Blauensteiner et al. (2006) use an “improved” hue, luminance, and saturation (IHLS) color space for shadow detection to deal with the issue of unstable hue at low saturation by modeling the relationship between them.

Another alternative color space is YUV. Some applications such as television and videoconferencing use the YUV color space natively, and since transformation from YUV to HSV is time-consuming, Schreer et al. (2002) operate in the YUV color space directly, developing a fast shadow detection algorithm based on approximated changes of hue and saturation in the YUV color space.



Some work uses texture-based methods such as the normalized cross-correlation (NCC) technique. This method detects shadows based on the assumption that the intensity of shadows is proportional to the incident light, so shadow pixels should simply be darker than the corresponding background pixels.

However, the texture-based method tends to misclassify foreground pixels as shadow pixels when the foreground region has a similar texture to the corresponding background region.
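One common way to turn the proportionality assumption into a test is sketched below. The slides do not give the NCC formula, so the uncentered correlation, the energy comparison, the function names, and the threshold `l_ncc` are all assumptions made for illustration. Patches are flat lists of gray-level intensities from the same neighborhood in the background model and in the current frame.

```python
import math


def ncc(bg_patch, fg_patch):
    """Uncentered normalized cross-correlation of two intensity patches."""
    numerator = sum(b * f for b, f in zip(bg_patch, fg_patch))
    norm_bg = math.sqrt(sum(b * b for b in bg_patch))
    norm_fg = math.sqrt(sum(f * f for f in fg_patch))
    if norm_bg == 0.0 or norm_fg == 0.0:
        return 0.0  # correlation is undefined for an all-black patch
    return numerator / (norm_bg * norm_fg)


def is_shadow_ncc(bg_patch, fg_patch, l_ncc=0.95):
    """Shadow if the texture matches the background and the patch is darker."""
    energy_bg = sum(b * b for b in bg_patch)
    energy_fg = sum(f * f for f in fg_patch)
    return ncc(bg_patch, fg_patch) >= l_ncc and energy_fg < energy_bg


background = [100.0, 120.0, 140.0, 160.0]
scaled_down = [50.0, 60.0, 70.0, 80.0]      # same texture, half as bright
different = [80.0, 10.0, 90.0, 5.0]         # unrelated texture
print(is_shadow_ncc(background, scaled_down))
print(is_shadow_ncc(background, different))
```

Any patch that is a darker scaled copy of the background passes the test, which is also why an object whose texture resembles the background is labeled as shadow.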






Proposed Method: Overview

We propose a new method for detecting shadows using maximum likelihood estimation based on color information.

We extend the deterministic nonmodel-based approach to a statistical model-based approach.

We estimate the joint distribution over the difference in the HSV color space between pixels in the current frame and the corresponding pixels in a background model, conditional on whether the pixel is an object pixel or a shadow pixel.

We use the maximum likelihood principle, at run time, to classify each foreground pixel as either shadow or object given the estimated model.


Proposed Method: Overview (Offline and Online Phases)

We divide our method into two phases:

1 Offline phase:

Construct a background model from the first few frames;

Extract foreground on the remaining frames;

Manually label the extracted pixels as either object or shadow pixels;

Construct a joint probability model over the difference in the HSV color space between pixels in the current frame and the corresponding pixels in the background model, conditional on whether the pixel is an object pixel or a shadow pixel.

2 Online phase:

Perform the same background modeling and foreground extraction procedure;

Classify foreground pixels as either shadow or object using the maximum likelihood approach.


Proposed Method: Processing Flow

Our processing flow is as follows.

1 Global Motion Detection

2 Foreground Extraction

3 Maximum Likelihood Classification of Foreground Pixels (Offline and Online Phases)


Proposed Method: Global Motion Detection

We use simple frame-by-frame subtraction.

When the difference image contains enough pixels whose intensity differences are above a threshold, we start a new video segment.

Frames with no motion are removed to reduce processing time and storage requirements.
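A minimal sketch of this segmentation step is given below. Frames are assumed to be gray-level images stored as nested lists, and the function names and both thresholds (`diff_threshold` for a single pixel, `min_changed` for the whole frame) are hypothetical values chosen for illustration.

```python
def count_changed_pixels(prev_frame, curr_frame, diff_threshold):
    """Count pixels whose absolute intensity difference exceeds the threshold."""
    return sum(
        1
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > diff_threshold
    )


def motion_segments(frames, diff_threshold=25, min_changed=2):
    """Group indices of consecutive frames with motion into video segments.

    Frames without motion are dropped; each gap closes the current segment.
    """
    segments = []
    current = []
    for i in range(1, len(frames)):
        changed = count_changed_pixels(frames[i - 1], frames[i], diff_threshold)
        if changed >= min_changed:
            current.append(i)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


still = [[10, 10], [10, 10]]
top_lit = [[200, 200], [10, 10]]
bottom_lit = [[10, 10], [200, 200]]
frames = [still, still, top_lit, bottom_lit, bottom_lit, still]
print(motion_segments(frames))
```

In the usage example, frames 2 and 3 form one segment, the unchanged frame 4 is dropped, and frame 5 starts a second segment.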


Proposed Method: Foreground Extraction

After discarding the no-motion video segments, we use the background modeling method of Poppe et al. (2007) to segment foreground pixels from the background.

Figure: Example foreground extraction. (a) Original image. (b) Foreground pixels according to background model.


Proposed Method: Maximum Likelihood Classification of Foreground Pixels

In the offline phase:

After foreground extraction, we manually label pixels as either shadow or object.

We then observe the distribution over the difference in hue ($H_{\mathrm{diff}}$), saturation ($S_{\mathrm{diff}}$), and value ($V_{\mathrm{diff}}$) components in the HSV color space between pixels in the current frame and the corresponding pixels in the background model.
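The three difference features can be computed per pixel as in the sketch below. The sign convention (current frame minus background), the [0, 1] scaling from Python's `colorsys` module, and the function name are assumptions; the slides do not say whether hue differences are wrapped around the color circle, so no wraparound is applied here.

```python
import colorsys


def hsv_diff(curr_rgb, bg_rgb):
    """Return (h_diff, s_diff, v_diff) between a frame pixel and its background pixel.

    Both pixels are (r, g, b) tuples with components in [0, 1].
    """
    ch, cs, cv = colorsys.rgb_to_hsv(*curr_rgb)
    bh, bs, bv = colorsys.rgb_to_hsv(*bg_rgb)
    return (ch - bh, cs - bs, cv - bv)


# A pixel that is only darker than the background changes in value alone.
print(hsv_diff((0.4, 0.4, 0.4), (0.8, 0.8, 0.8)))
# A reddish pixel over a gray background changes in saturation.
print(hsv_diff((0.5, 0.25, 0.25), (0.5, 0.5, 0.5)))
```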


Proposed Method: Distributions over the Differences in HSV Components for True Object Pixels

Figure: Example distributions over the difference in (a) hue, (b) saturation, and (c) value components for true object pixels, extracted from our hallway dataset.


Proposed Method: Distributions over the Differences in HSV Components for Shadow Pixels

Figure: Example distributions over the difference in (a) hue, (b) saturation, and (c) value components for shadow pixels, extracted from our hallway dataset.


Proposed Method: Measurement Likelihood for Shadow Pixels

We define the measurement likelihood for pixel $(x, y)$ given its assignment as follows.

\[
P(M_{xy} \mid A_{xy} = \mathrm{sh}) = P(H_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh})\, P(S_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh})\, P(V_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh}),
\]

where $M_{xy}$ is a tuple containing the HSV value for pixel $(x, y)$ in the current image as well as the HSV value for pixel $(x, y)$ in the background model, and $A_{xy}$ is the assignment of pixel $(x, y)$ as object or shadow. “sh” stands for shadow.


Proposed Method: Measurement Likelihood for Shadow Pixels

To make the problem tractable, we assume that the distributions over the components on the right hand side in the previous equation follow Gaussian distributions, defined as follows.

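\[
\begin{aligned}
P(H_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh}) &= \mathcal{N}\bigl(H_{\mathrm{diff}};\, \mu_{h^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{h^{\mathrm{sh}}_{\mathrm{diff}}}\bigr) \\
P(S_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh}) &= \mathcal{N}\bigl(S_{\mathrm{diff}};\, \mu_{s^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{s^{\mathrm{sh}}_{\mathrm{diff}}}\bigr) \\
P(V_{\mathrm{diff}} \mid A_{xy} = \mathrm{sh}) &= \mathcal{N}\bigl(V_{\mathrm{diff}};\, \mu_{v^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{v^{\mathrm{sh}}_{\mathrm{diff}}}\bigr)
\end{aligned}
\]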
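Under the Gaussian assumption, the measurement likelihood is a product of three univariate densities. The sketch below evaluates it for one pixel; the function names and the layout of `params` as three (mean, variance) pairs are assumptions made for illustration, and the numbers in the usage lines are made up.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def measurement_likelihood(diff, params):
    """Product of the hue, saturation, and value densities for one class.

    diff   = (h_diff, s_diff, v_diff)
    params = ((mu_h, var_h), (mu_s, var_s), (mu_v, var_v))
    """
    likelihood = 1.0
    for x, (mean, var) in zip(diff, params):
        likelihood *= gaussian_pdf(x, mean, var)
    return likelihood


# Hypothetical shadow parameters: little hue or saturation change, darker value.
shadow_params = ((0.0, 0.01), (0.0, 0.01), (-0.3, 0.01))
print(measurement_likelihood((0.0, 0.0, -0.3), shadow_params))
print(measurement_likelihood((0.2, 0.3, 0.1), shadow_params))
```

For real images the product of small densities can underflow, so working with sums of log densities is usually the safer choice.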


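Proposed Method: Measurement Likelihood for Object Pixels

Similarly, the measurement likelihood for object pixels can be computed as follows.

\[
P(M_{xy} \mid A_{xy} = \mathrm{obj}) = P(H_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj})\, P(S_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj})\, P(V_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj})
\]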

Here “obj” stands for object.


Proposed Method: Measurement Likelihood for Object Pixels

As for the shadow pixel distributions, we assume Gaussian distributions over the components on the right hand side in the previous equation, as follows.

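\[
\begin{aligned}
P(H_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj}) &= \mathcal{N}\bigl(H_{\mathrm{diff}};\, \mu_{h^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{h^{\mathrm{obj}}_{\mathrm{diff}}}\bigr) \\
P(S_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj}) &= \mathcal{N}\bigl(S_{\mathrm{diff}};\, \mu_{s^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{s^{\mathrm{obj}}_{\mathrm{diff}}}\bigr) \\
P(V_{\mathrm{diff}} \mid A_{xy} = \mathrm{obj}) &= \mathcal{N}\bigl(V_{\mathrm{diff}};\, \mu_{v^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{v^{\mathrm{obj}}_{\mathrm{diff}}}\bigr)
\end{aligned}
\]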


Proposed Method: Model Parameters

We estimate the parameters

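\[
\Theta = \bigl\{
\mu_{h^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{h^{\mathrm{sh}}_{\mathrm{diff}}},\,
\mu_{s^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{s^{\mathrm{sh}}_{\mathrm{diff}}},\,
\mu_{v^{\mathrm{sh}}_{\mathrm{diff}}},\, \sigma^2_{v^{\mathrm{sh}}_{\mathrm{diff}}},\,
\mu_{h^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{h^{\mathrm{obj}}_{\mathrm{diff}}},\,
\mu_{s^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{s^{\mathrm{obj}}_{\mathrm{diff}}},\,
\mu_{v^{\mathrm{obj}}_{\mathrm{diff}}},\, \sigma^2_{v^{\mathrm{obj}}_{\mathrm{diff}}}
\bigr\}
\]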

directly from training data during the offline phase.
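One straightforward way to obtain the twelve parameters is the sample mean and variance of each difference component within each class, sketched below. The slides only say that the parameters are estimated directly from training data, so the choice of the maximum likelihood variance (dividing by n), the small variance floor, the function names, and the dictionary layout are assumptions.

```python
def fit_gaussian(samples, var_floor=1e-6):
    """Maximum likelihood mean and variance of a list of numbers.

    The variance floor (an assumption of this sketch) avoids a zero
    variance when all training samples are identical.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, var_floor)


def fit_theta(labeled_diffs):
    """Estimate per-class Gaussian parameters from labeled training pixels.

    labeled_diffs is a list of ((h_diff, s_diff, v_diff), label) pairs,
    with label equal to "sh" or "obj". Each class must have samples.
    """
    theta = {}
    for label in ("sh", "obj"):
        diffs = [d for d, l in labeled_diffs if l == label]
        theta[label] = tuple(
            fit_gaussian([d[k] for d in diffs]) for k in range(3)
        )
    return theta


training = [
    ((0.0, 0.0, -0.2), "sh"),
    ((0.0, 0.0, -0.4), "sh"),
    ((0.1, 0.3, 0.2), "obj"),
    ((0.3, 0.1, 0.0), "obj"),
]
print(fit_theta(training))
```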


Proposed Method: Pixel Classification

In the online phase:

Given the model estimate $\Theta$, we use the maximum likelihood approach to classify a pixel as a shadow pixel if

\[
P(M_{xy} \mid A_{xy} = \mathrm{sh}; \Theta) > P(M_{xy} \mid A_{xy} = \mathrm{obj}; \Theta).
\]

Otherwise, we classify the pixel as an object pixel.

We could add the prior probabilities to the shadow model and the object model in the equation above to obtain a maximum a posteriori classifier. In our experiments, we assume equal priors.
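The decision rule can be sketched in log space, which leaves the comparison unchanged while avoiding underflow. The function names, the `theta` layout, and the parameter values are hypothetical. With `prior_sh=0.5` the prior terms cancel and the rule is the maximum likelihood classifier; any other value gives the maximum a posteriori variant mentioned above.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_likelihood(diff, class_params):
    """Sum of the hue, saturation, and value log densities for one class."""
    return sum(log_gaussian(x, m, v) for x, (m, v) in zip(diff, class_params))


def classify(diff, theta, prior_sh=0.5):
    """Return "sh" if the shadow score is strictly larger, else "obj"."""
    score_sh = log_likelihood(diff, theta["sh"]) + math.log(prior_sh)
    score_obj = log_likelihood(diff, theta["obj"]) + math.log(1.0 - prior_sh)
    return "sh" if score_sh > score_obj else "obj"


# Hypothetical model: shadows are tightly clustered around a darker value,
# objects are spread widely around no change.
theta = {
    "sh": ((0.0, 0.01), (0.0, 0.01), (-0.3, 0.01)),
    "obj": ((0.0, 0.1), (0.0, 0.1), (0.0, 0.1)),
}
print(classify((0.0, 0.0, -0.3), theta))
print(classify((0.2, 0.3, 0.1), theta))
```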






Experimental Results: Overview

We present the experimental results for

1 our proposed maximum likelihood (ML) classification method;

2 the deterministic nonmodel-based (DNM) method;

3 the normalized cross-correlation (NCC) method.


Experimental Results: Video Sequences

We performed the experiments on three video sequences. The figure below shows sample frames from the three video sequences.

Figure: Sample frames from the (a) Hallway, (b) Laboratory, and (c) Highway video sequences.

The Hallway sequence is our own dataset. The Laboratory and Highway sequences were first introduced in Prati et al. (2003).


Experimental Results: Performance Evaluation

We compute the two metrics proposed by Prati et al. (2003), defining the shadow detection rate $\eta$ and the shadow discrimination rate $\xi$ as follows:

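\[
\eta = \frac{TP_{\mathrm{sh}}}{TP_{\mathrm{sh}} + FN_{\mathrm{sh}}}; \qquad
\xi = \frac{TP_{\mathrm{obj}}}{TP_{\mathrm{obj}} + FN_{\mathrm{obj}}},
\]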

where the subscripts “sh” and “obj” stand for shadow and object, respectively. TP and FN are the numbers of true positive pixels (i.e., shadow or object pixels correctly identified) and false negative pixels (i.e., shadow or object pixels classified incorrectly).
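Both rates follow directly from per-pixel labels, as in the sketch below. The function name and the flat label lists are assumptions made for illustration; the example labels are made up and each class is assumed to occur at least once in the ground truth.

```python
def detection_rates(truth, predicted):
    """Return (eta, xi) from ground-truth and predicted labels.

    Labels are "sh" or "obj". eta is the fraction of true shadow pixels
    labeled as shadow; xi is the fraction of true object pixels labeled
    as object.
    """
    tp_sh = sum(1 for t, p in zip(truth, predicted) if t == "sh" and p == "sh")
    fn_sh = sum(1 for t, p in zip(truth, predicted) if t == "sh" and p != "sh")
    tp_obj = sum(1 for t, p in zip(truth, predicted) if t == "obj" and p == "obj")
    fn_obj = sum(1 for t, p in zip(truth, predicted) if t == "obj" and p != "obj")
    eta = tp_sh / (tp_sh + fn_sh)
    xi = tp_obj / (tp_obj + fn_obj)
    return eta, xi


truth = ["sh", "sh", "sh", "sh", "obj", "obj"]
predicted = ["sh", "sh", "sh", "obj", "obj", "sh"]
print(detection_rates(truth, predicted))
```

Here three of the four shadow pixels and one of the two object pixels are labeled correctly, giving a detection rate of 0.75 and a discrimination rate of 0.5.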


Experimental Results: Metrics for Evaluation

More information about η and ξ:

η: the proportion of shadow pixels correctly detected

ξ: the proportion of object pixels correctly detected.

η and ξ can also be thought of as the true positive rate (sensitivity) and true negative rate (specificity) for detecting shadows, respectively.

In the experiments, we also compare the methods using two additional metrics: precision and F1 score.


Experimental Results: Preparation for the Experiments

Ground truth data:

The Laboratory and Highway video sequences are provided in Sanin et al. (2012).

A standard Gaussian mixture model (GMM) of the background is used to extract foreground pixels.

For our Hallway video sequence, we used the previously mentioned extended version of the GMM background model for foreground extraction, but the results were not substantially different from those of the standard GMM.


Experimental Results: Find the Best Parameters for Each Model

We performed five-fold cross validation for each of the three models and each of the three data sets.

We varied the parameter settings for each method on each video dataset and selected the setting that maximized the F1 score (a measure combining both precision and recall) over the cross validation test sets.
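The selection loop can be sketched as below, with shadow as the positive class for F1. This is a simplified illustration under several assumptions: the fold layout (contiguous blocks), averaging F1 across folds, the function names, and the `classify(sample, setting)` callback are all hypothetical, and the per-fold training step that a learned model would need is omitted, so the sketch fits threshold-style settings best.

```python
def f1_score(truth, predicted, positive="sh"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(truth, predicted) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(truth, predicted) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous test folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_setting(samples, labels, settings, classify, k=5):
    """Return the setting with the highest mean F1 over the k test folds."""
    best_setting, best_f1 = None, -1.0
    for setting in settings:
        scores = []
        for test_idx in k_fold_indices(len(samples), k):
            truth = [labels[i] for i in test_idx]
            predicted = [classify(samples[i], setting) for i in test_idx]
            scores.append(f1_score(truth, predicted))
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_f1:
            best_setting, best_f1 = setting, mean_f1
    return best_setting, best_f1


# Toy data: each sample is a value ratio; shadows have ratios below 0.7.
ratios = [0.5, 0.9, 0.6, 0.95, 0.55, 0.92, 0.58, 0.97, 0.52, 0.91]
labels = ["sh" if r < 0.7 else "obj" for r in ratios]
rule = lambda ratio, beta: "sh" if ratio <= beta else "obj"
print(select_setting(ratios, labels, [0.3, 0.7, 0.99], rule))
```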


Experimental Results: Shadow Detection Performance

Table: Comparison of shadow detection results between the proposed, DNM, and NCC methods.



From the table, our proposed method

achieves the top performance for shadow detection rate η and F1 score in every case;

obtains good shadow discrimination rate ξ in every case.

The DNM method has stable performance for all three videos, with good performance for all metrics.

Both the DNM method and our proposed method suffer from the problem that the object colors can be confused with the background color. We clearly see this situation in the Highway sequence (third row in the next figure).

The NCC method achieves the best shadow discrimination rate ξ and precision because it classifies nearly every pixel as object, as can be seen in the next figure.



Figure: Results for an arbitrary frame in each video sequence. The first column contains an example original frame for each video sequence. The second column shows the ground truth for that frame, where object pixels are labeled in white and shadow pixels are labeled in gray. The remaining columns show shadow detection results for each method, where pixels labeled as object are shown in green and pixels labeled as shadow are shown in red.






Conclusion and Future Work: Conclusion

We propose a new method for detecting shadows using a simple maximum likelihood approach based on color information;

We extend the deterministic nonmodel-based approach, designing a parametric statistical model-based approach;

Our experimental results show that our proposed method is extremely effective and superior to the standard methods on three different real-world video surveillance data sets.


Conclusion and Future Work: Future Work

In some cases, we misdetect shadow pixels due to similar color between the object and the background and unclear background texture in shadow regions;

Incorporating geometric or shadow region shape priors would potentially improve the detection and discrimination rates;

We plan to address these issues, further explore the feasibility of combining our method with other useful shadow features, and integrate our shadow detection module with a real-world open source video surveillance system.

