talk 2011-buet-perception-event
TRANSCRIPT
Human Visual Perception Inspired
Background Subtraction
Mahfuzul Haque and Manzur Murshed
Research Goal
[Pipeline: Stage 1 → Stage 2 → … → Stage N]
Intelligent Video Surveillance
Automated Alert
Smart Monitoring
Context-aware Environments
Event Detection
Action / Activity Recognition
Behaviour Recognition
Behaviour Profiling
Video Stream Analytics Real-time Processing
Real-time Video Analytics
Unexpected Behaviors
Mob violence
Unusual Crowding
Sudden group
formation/deformation
Shooting
Public panic
Increasing number of surveillance cameras
Deployment of large number of surveillance cameras in recent years
Modern airports now have several thousand cameras!
Dependence on human monitors has increased.
Reliability of the surveillance system has decreased.
Decreasing reliability: are we really protected?
Surveillance cameras
Typical Video Analytics Framework
Surveillance video stream →
1. Background Subtraction → Foreground Objects
2. Feature Extraction, Foreground Blob Classification → Classified Foreground Blobs
3. Tracking, Occlusion Handling → Tracked trajectories
4. Event/Behaviour Recognition (Event/Behaviour models) → High-level description of unusual events/actions → Alarm!
Background Subtraction
Input
Output
Background Subtraction: How?
Basic Background Subtraction (BBS):
Current frame - Background = Foreground Blob
Challenges with BBS:
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
→ Not a practical approach; instead: Dynamic Background Modelling
Current frame + Background Model → Foreground Blob
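The BBS step above (subtract a static background, then threshold) can be sketched in a few lines of NumPy; the threshold of 30 grey levels is an illustrative assumption, not a value from the talk:

```python
import numpy as np

def basic_background_subtraction(frame, background, threshold=30):
    """Basic Background Subtraction (BBS): mark as foreground every
    pixel whose absolute difference from a static background image
    exceeds a fixed threshold (assumed value, for illustration)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a flat grey background and one bright "object" pixel.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1, 2] = 200  # a moving object appears
mask = basic_background_subtraction(frame, background)
```

This is exactly why BBS breaks down in practice: any illumination change or camera displacement also pushes `diff` past the fixed threshold, which motivates dynamic background modelling.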
MOG-based Background Subtraction
[Example scenes: Sky, Cloud, Leaf, Moving Person | Road, Shadow, Moving Car | Floor, Shadow, Walking People]
[Per-pixel intensity distributions: one Gaussian P(x), with mean µ and variance σ², per background process (Sky, Cloud, Person, Leaf) along the pixel-intensity axis x]
MOG-based Background Subtraction
Frame 1 … Frame N: road, shadow, car
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1²
Model 2 (shadow): ω2 = 20%, µ2, σ2²
Model 3 (car): ω3 = 15%, µ3, σ3²
Current frame → Detected object
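The ω/σ ordering and the resulting background/foreground split can be sketched for a single pixel. The 65/20/15% weights are from the slide; the means, variances, and the background proportion T = 0.8 are illustrative assumptions:

```python
import numpy as np

# One pixel's mixture of K = 3 Gaussians; the 65/20/15% weights match
# the slide, the means and variances are illustrative assumptions.
weights = np.array([0.20, 0.65, 0.15])      # shadow, road, car (unsorted)
means = np.array([60.0, 120.0, 200.0])
variances = np.array([16.0, 25.0, 100.0])

# Order models by fitness w/sigma: stable, frequently observed
# (high weight, low variance) models come first.
order = np.argsort(-(weights / np.sqrt(variances)))
weights, means, variances = weights[order], means[order], variances[order]

# The smallest prefix of B models whose cumulative weight exceeds the
# background proportion T is taken as the background.
T = 0.8
B = int(np.searchsorted(np.cumsum(weights), T)) + 1
```

Sorting by ω/σ puts frequently seen, low-variance (i.e. stable) models first; here the road and shadow models form the background and the car model stays foreground.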
Typical Surveillance Setup
Video Stream → Background Subtraction → Feature Extraction → Event Detection
• Frame-size reduction
• Frame-rate reduction
• Parameter tuning based on the operating environment
Scenario 1
Test Sequence: PETS2001_D1TeC2
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
α = Learning rate
T = Background data proportion
Scenario 2
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
Test Sequence: VSSN06_camera1
α = Learning rate
T = Background data proportion
Scenario 3
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
α = Learning rate
T = Background data proportion
Observations
• A slow learning rate (α) is not preferable (ghosting or black-out).
• Simple post-processing does not improve detection quality at a fast learning rate (α).
• The context behaviour needs to be known in advance.
How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
Research Goals
• A new background subtraction technique for
unconstrained environments, i.e., no context
related information
• Operational at fast learning rate (α)
• Acceptable detection quality
• High stability across changing operating
environments
The New Technique, PMOG
• Perceptual Mixture of Gaussians
• Incorporating perceptual characteristics of
human visual system (HVS) in statistical
background subtraction
– Realistic background value prediction
– Perception based detection threshold
– Perceptual model similarity measure
Realistic Background Value Prediction
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1²
Model 2 (shadow): ω2 = 20%, µ2, σ2²
Model 3 (car): ω3 = 15%, µ3, σ3²
New: each model predicts the background with its most recent observation, b, instead of the mean µ.
Realistic Background Value Prediction
[Distribution P(x) centred on the most recent observation, b]
Higher agility than using the mean
Not tied to the learning rate
Realistic: an actual observed intensity value, not an artificial value produced by averaging
Most recent observation b over time vs. running mean μ = (1-α)μ + αXt
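The agility argument can be checked numerically: after a sudden background change, the most recent observation b is correct immediately, while the running mean μ = (1-α)μ + αXt lags for many frames at a slow learning rate. The intensity values below are illustrative:

```python
# After a sudden background change (e.g. lights switched on), the most
# recent observation b tracks the new value immediately, while the
# running mean mu = (1 - alpha) * mu + alpha * x_t lags at a slow
# learning rate. Intensities are illustrative.
alpha = 0.01
mu = 100.0                    # long-term mean before the change
b = mu
for x_t in [180.0] * 10:      # background suddenly brightens to 180
    mu = (1 - alpha) * mu + alpha * x_t
    b = x_t                   # most recent observation

# b is already 180; mu has barely moved after 10 frames.
```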
Realistic Background Value Prediction
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1², b1
Model 2 (shadow): ω2 = 20%, µ2, σ2², b2
Model 3 (car): ω3 = 15%, µ3, σ3², b3
[Each model's distribution P(x) now carries its most recent observation b]
Perception Based Detection Threshold
With the mean µ, the detection threshold is x = c1σ; with the most recent observation b, the threshold x = ?
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1², b1
Model 2 (shadow): ω2 = 20%, µ2, σ2², b2
Model 3 (car): ω3 = 15%, µ3, σ3², b3
Our Problem: How is x related to b?
[Distribution P(x) centred on b; the threshold x may range from low to high]
x = ?
Weber’s Law
How does the human visual system perceive a noticeable intensity deviation from the background?
Ernst Weber, an experimental psychologist in the 19th century, observed that the just-noticeable increment ΔI is linearly proportional to the background intensity I:
ΔI = c2I
Weber’s Law
ΔI = c2I (another perceptual characteristic of the HVS)
[Distribution P(x) centred on the background value b: how far may x deviate from b before the deviation is noticeable?]
What is the perceptual tolerance level in distinguishing distorted intensity measures?
A reference image is distorted by Method 1 (p dB) and Method 2 (q dB).
If |p - q| < 0.5 dB, the difference is not perceivable by the human visual system.
Our Problem: How is x related to b?
Weber’s Law: x = c2b
Perceptual Threshold, TP (0.5 dB):
|20 log10(255/x) - 20 log10(255/b)| ≤ TP ⇒ x = b · 10^(TP/20)
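One consistent reading of the threshold slide, matching the |p - q| < 0.5 dB tolerance on the previous slide, is that a pixel value x is distinguishable from the background value b when their PSNR-style levels 20·log10(255/·) differ by more than TP. A sketch under that assumption:

```python
import math

def is_foreground(x, b, tp_db=0.5):
    """A pixel x is foreground if its level 20*log10(255/x) differs
    from the background level 20*log10(255/b) by more than tp_db.
    Equivalently, the tolerated deviation is b*(10**(tp_db/20) - 1),
    i.e. linear in b, matching Weber's law."""
    if x == b:
        return False
    p = 20 * math.log10(255 / max(x, 1))  # guard against log(inf) at 0
    q = 20 * math.log10(255 / max(b, 1))
    return abs(p - q) > tp_db

# On a background of 100, a deviation of ~6 grey levels is just
# noticeable; a deviation of 5 is not.
```

With TP = 0.5 dB the tolerated deviation is b·(10^(0.5/20) - 1) ≈ 0.059·b, linear in b, which is exactly the Weber-style relationship x = c2b.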
Impact of Perceptual Threshold, TP
Human Vision: TP = 0.5 dB
Machine Vision: TP = 1.0 dB (minimal impact of shadow, reflection, noise, etc.)
Linear Relationship
Error Sensitivity in Darker Background
Rod and Cone Cells of Human Eye
• Rods and Cones are two different types of photoreceptor cells in the retina of the human eye
• Rods
– Operate in less intense light
– Responsible for scotopic vision (night vision)
• Cones
– Operate in relatively bright light
– Responsible for photopic vision (colour vision)
Piece-wise Linear Relationship
Scotopic Vision (R) Photopic Vision (C)
Perceptual Model Similarity in PMOG
Model redundancy in MOG
Experiments
Test Sequences: 50 in total, from 8 different sources
Scenario distribution: Indoor, Outdoor, Multimodal, Shadow and Reflection, Low background-foreground contrast
Evaluation: qualitative and quantitative
Compared against: Lee (PAMI, 2005); Stauffer and Grimson (PAMI, 2000)
Metrics: False Positive (FP), False Negative (FN), False Classification
Test Sequences
PETS (9) Wallflower (7) UCF (7) IBM (11) CAVIAR (7) VSSN06 (7) Other (2)
PMOG: Summary
• Realistic background value prediction: high model
agility and superior detection quality at fast learning
rate
• No context related information: high stability across
changing scenarios
• Perception based detection threshold: superior
detection quality in terms of shadow, noise, and
reflection
• Perceptual model similarity: optimal number of models
throughout the system life cycle
• Parameter-less background subtraction: ideal for real-time video analytics
Panic-driven Event Detection
Event Detection
Specific types of events vs. abnormality
An event persists for a certain duration of time, and that duration is variable.
The characteristics of the same event vary within the same environment and from one scene to another.
How can the generic characteristics of an event be identified?
The Proposed Event Detection Approach: Architecture

Real-time Execution:
Incoming frames → Background Subtraction → Foreground blobs → Selective Frame-level Feature Extraction → Selective Temporal Feature Extraction → Trained Event Models → Detection Results

Model Training:
Labelled frames → Background Subtraction → Foreground blobs → Frame-level Feature Extraction (30 features) → Temporal Feature Extraction (270 features) → Feature Ranking and Selection → Event Model Training → Event Models

Components: Foreground Detector, Frame-level Feature Extractor, Temporal Feature Extractor, Event Models
The Proposed Event Detection Approach
Event detection as a temporal data classification problem
A distinct set of temporal features can characterise an event
Which frame-level features are extracted, and how?
How are the observed frame-level features transformed into temporal features?
[Diagram: time → frame-level features f1 … fn → temporal features → classifier → event model]
The Proposed Event Detection Approach
Motion based approaches: key-point detection; point matching in successive frames; flow vectors (position, direction, speed)
Tracking based approaches: object detection; object matching in successive frames; trajectories (object paths)
Common characteristics: inter-frame association; context-specific information; event models are not generic
Proposed approach: foreground blob detection; a global frame-level descriptor based on blob statistical analysis, independent of scene characteristics; no inter-frame association; independent frame-level features transformed into temporal features considering speed and temporal order
(cf. Hu et al., ICPR 2008; Xiang et al., IJCV 2006)
The Proposed Event Detection Approach: Summary
Background subtraction for foreground blob detection
Independent frame-level features extracted using blob statistical analysis; no object- or position-specific information, no spatial association
Frame-level features are transformed into temporal features considering speed and temporal order
Intended to be more context invariant
[Diagram: time → frame-level features f1 … fn → temporal features → classifier → event model]
Blob Statistical Analysis
Frame-level features
Blob Area (BA)
Filling Ratio (FR)
Aspect Ratio (AR)
Bounding Box Area (BBA)
Bounding box Width (BBW)
Bounding box Height (BBH)
Blob Count (BC)
Blob Distance (BD)
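Most of the single-blob features listed above can be computed directly from a binary mask with NumPy. This sketch assumes one blob per mask; Blob Count (BC) and Blob Distance (BD) would additionally need connected-component labelling:

```python
import numpy as np

def blob_features(mask):
    """Frame-level features of a single foreground blob given as a
    boolean mask: Blob Area (BA), Bounding Box Width/Height (BBW/BBH),
    Bounding Box Area (BBA), Filling Ratio (FR = BA/BBA) and
    Aspect Ratio (AR = BBW/BBH)."""
    ys, xs = np.nonzero(mask)
    ba = len(ys)
    bbw = int(xs.max() - xs.min() + 1)
    bbh = int(ys.max() - ys.min() + 1)
    bba = bbw * bbh
    return {"BA": ba, "BBW": bbw, "BBH": bbh,
            "BBA": bba, "FR": ba / bba, "AR": bbw / bbh}

# A 2x3 rectangular blob fills its bounding box completely (FR = 1.0).
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 1:4] = True
feats = blob_features(mask)
```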
[Example frames 1-6 with detected foreground blobs]
Blob Statistical Analysis
Temporal features
Overlapping sliding window
Temporal order
Speed of variation
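A plausible sketch of this transformation: overlapping sliding windows over a frame-level feature series, each window contributing its mean level, range, and mean absolute first difference (the speed of variation). The window length, step, and the particular statistics are illustrative assumptions, not the talk's exact 270-feature design:

```python
import numpy as np

def temporal_features(series, window=5, step=2):
    """Transform a frame-level feature series into temporal features
    using overlapping sliding windows. Per window: mean level, range,
    and mean absolute first difference (speed of variation). Window
    order preserves temporal order."""
    feats = []
    for start in range(0, len(series) - window + 1, step):
        w = np.asarray(series[start:start + window], dtype=float)
        feats.extend([w.mean(), w.max() - w.min(),
                      float(np.abs(np.diff(w)).mean())])
    return feats

# Blob-count series: a sudden split into many blobs shows up as large
# range and speed-of-variation values in the later windows.
bc = [1, 1, 1, 1, 1, 1, 5, 6, 6, 7]
f = temporal_features(bc)
```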
Blob Statistical Analysis
Blob Count (BC), Blob Area (BA)
Blob Statistical Analysis
Blob Distance (BD)
Blob Statistical Analysis
Aspect Ratio (AR)
Blob Statistical Analysis
Top five features for four different events
Feature ranking using the absolute value criterion of a two-sample t-test, based on a pooled variance estimate.
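The ranking criterion named above can be sketched directly: the two-sample t statistic with a pooled variance estimate, with features ranked by its absolute value. The data below is synthetic, for illustration only:

```python
import numpy as np

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance estimate:
    t = (mean_a - mean_b) / sqrt(sp2 * (1/n_a + 1/n_b))."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

def rank_features(X_pos, X_neg):
    """Rank feature columns by |t| between event (positive) and
    non-event (negative) frames; larger |t| = more discriminative."""
    scores = [abs(t_statistic(X_pos[:, j], X_neg[:, j]))
              for j in range(X_pos.shape[1])]
    return np.argsort(scores)[::-1]

# Synthetic data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X_pos = np.column_stack([rng.normal(5, 1, 50), rng.normal(0, 1, 50)])
X_neg = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 1, 50)])
order = rank_features(X_pos, X_neg)
```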
Experimental Results
Specific Event Detection
• Four different events: meet, split, runaway, and fight
• CAVIAR dataset with labelled frames
• 80% of the frames for model training; the remaining 20% for testing
• 100 iterations of 10-fold cross-validation
• SVM classifiers as event models; a separate model for each event
Experimental Results
Specific Event Detection
[Confusion matrix: predicted vs. actual events, with severity]
Experimental Results
Abnormal Event Detection
• University of Minnesota crowd dataset (UMN dataset)
• The Runaway event model
• No additional training or tuning
• Three different sites
Experimental Results
Abnormal Event Detection (UMN-9)
Experimental Results
Abnormal Event Detection (UMN-10)
Experimental Results
Abnormal Event Detection (UMN-01)
Experimental Results
Abnormal Event Detection (UMN-07)
Experimental Results
Performance Comparison
Method | AUC
Proposed Method | 0.89
Pure Optical Flow [1] | 0.84
[1] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, pp. 935-942.
URLs of the images used in this presentation
• http://www.fotosearch.com/DGV464/766029/
• http://www.cyprus-trader.com/images/alert.gif
• http://security.polito.it/~lioy/img/einstein8ci.jpg
• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg
• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg
• http://www.airports-worldwide.com/img/uk/heathrow00.jpg
• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibit-hall-crowd-2.jpg
• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg
• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg
• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg
• http://www.cityofsound.com/photos/centre_poin/crowd.jpg
• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg
• http://paulaoffutt.com/pics/images/crowd-surfing.jpg
• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg
• http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg
Thanks!