talk 2011-buet-perception-event
TRANSCRIPT
Human Visual Perception Inspired
Background Subtraction
Mahfuzul Haque and Manzur Murshed
Research Goal
[Pipeline: Stage 1 → Stage 2 → … → Stage N]
Intelligent Video Surveillance
Automated Alert
Smart Monitoring
Context-aware Environments
Event Detection
Action / Activity Recognition
Behaviour Recognition
Behaviour Profiling
Video Stream Analytics Real-time Processing
Real-time Video Analytics
Unexpected Behaviors
Mob violence
Unusual Crowding
Sudden group
formation/deformation
Shooting
Public panic
Increasing number of surveillance cameras
Deployment of large number of surveillance cameras in recent years
Modern airports now have several thousand cameras!
Dependence on human monitors has increased.
Reliability of the surveillance system has decreased.
Decreasing reliability: are we really protected?
Surveillance cameras
Typical Video Analytics Framework
Surveillance video stream →
1. Background Subtraction → Foreground Objects
2. Feature Extraction, Foreground Blob Classification → Classified Foreground Blobs
3. Tracking, Occlusion Handling → Tracked trajectories
4. Event/Behaviour Recognition (Event/Behaviour models) → High-level description of unusual events/actions → Alarm!
Background Subtraction
Input
Output
Background Subtraction: How?
Basic Background Subtraction (BBS):
Current frame - Background = Foreground Blob
Challenges with BBS:
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
→ Not a practical approach; instead: Dynamic Background Modelling
Current frame + Background Model → Foreground Blob
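The BBS step above (subtract a static background, then threshold) can be sketched in a few lines of NumPy; the threshold of 30 grey levels is an illustrative assumption, not a value from the talk:

```python
import numpy as np

def basic_background_subtraction(frame, background, threshold=30):
    """Basic Background Subtraction (BBS): mark as foreground every
    pixel whose absolute difference from a static background image
    exceeds a fixed threshold (assumed value, for illustration)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a flat grey background and one bright "object" pixel.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1, 2] = 200  # a moving object appears
mask = basic_background_subtraction(frame, background)
```

This is exactly why BBS breaks down in practice: any illumination change or camera displacement also pushes `diff` past the fixed threshold, which motivates dynamic background modelling.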
MOG-based Background Subtraction
[Example scenes: Sky, Cloud, Leaf, Moving Person | Road, Shadow, Moving Car | Floor, Shadow, Walking People]
[Per-pixel intensity distributions: one Gaussian P(x), with mean µ and variance σ², per background process (Sky, Cloud, Person, Leaf) along the pixel-intensity axis x]
MOG-based Background Subtraction
Frame 1 … Frame N: road, shadow, car
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1²
Model 2 (shadow): ω2 = 20%, µ2, σ2²
Model 3 (car): ω3 = 15%, µ3, σ3²
Current frame → Detected object
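The ω/σ ordering and the resulting background/foreground split can be sketched for a single pixel. The 65/20/15% weights are from the slide; the means, variances, and the background proportion T = 0.8 are illustrative assumptions:

```python
import numpy as np

# One pixel's mixture of K = 3 Gaussians; the 65/20/15% weights match
# the slide, the means and variances are illustrative assumptions.
weights = np.array([0.20, 0.65, 0.15])      # shadow, road, car (unsorted)
means = np.array([60.0, 120.0, 200.0])
variances = np.array([16.0, 25.0, 100.0])

# Order models by fitness w/sigma: stable, frequently observed
# (high weight, low variance) models come first.
order = np.argsort(-(weights / np.sqrt(variances)))
weights, means, variances = weights[order], means[order], variances[order]

# The smallest prefix of B models whose cumulative weight exceeds the
# background proportion T is taken as the background.
T = 0.8
B = int(np.searchsorted(np.cumsum(weights), T)) + 1
```

Sorting by ω/σ puts frequently seen, low-variance (i.e. stable) models first; here the road and shadow models form the background and the car model stays foreground.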
Typical Surveillance Setup
Video Stream → Background Subtraction → Feature Extraction → Event Detection
• Frame-size reduction
• Frame-rate reduction
• Parameter tuning based on the operating environment
Scenario 1
Test Sequence: PETS2001_D1TeC2
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
α = Learning rate
T = Background data proportion
Scenario 2
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
Test Sequence: VSSN06_camera1
α = Learning rate
T = Background data proportion
Scenario 3
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
T = 0.4 T = 0.6 T = 0.8
α = 0.1
α = 0.01
α = 0.001
First Frame
Test Frame
Ground Truth
α = Learning rate
T = Background data proportion
Observations
• A slow learning rate (α) is not preferable (ghosting or black-out).
• Simple post-processing does not improve detection quality at a fast learning rate (α).
• The context behaviour needs to be known in advance.
How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
Research Goals
• A new background subtraction technique for
unconstrained environments, i.e., no context
related information
• Operational at fast learning rate (α)
• Acceptable detection quality
• High stability across changing operating
environments
The New Technique, PMOG
• Perceptual Mixture of Gaussians
• Incorporating perceptual characteristics of
human visual system (HVS) in statistical
background subtraction
– Realistic background value prediction
– Perception based detection threshold
– Perceptual model similarity measure
Realistic Background Value Prediction
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1²
Model 2 (shadow): ω2 = 20%, µ2, σ2²
Model 3 (car): ω3 = 15%, µ3, σ3²
New: each model predicts the background with its most recent observation, b, instead of the mean µ.
Realistic Background Value Prediction
[Distribution P(x) centred on the most recent observation, b]
Higher agility than using the mean
Not tied to the learning rate
Realistic: an actual observed intensity value, not an artificial value produced by averaging
Most recent observation b over time vs. running mean μ = (1-α)μ + αXt
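The agility argument can be checked numerically: after a sudden background change, the most recent observation b is correct immediately, while the running mean μ = (1-α)μ + αXt lags for many frames at a slow learning rate. The intensity values below are illustrative:

```python
# After a sudden background change (e.g. lights switched on), the most
# recent observation b tracks the new value immediately, while the
# running mean mu = (1 - alpha) * mu + alpha * x_t lags at a slow
# learning rate. Intensities are illustrative.
alpha = 0.01
mu = 100.0                    # long-term mean before the change
b = mu
for x_t in [180.0] * 10:      # background suddenly brightens to 180
    mu = (1 - alpha) * mu + alpha * x_t
    b = x_t                   # most recent observation

# b is already 180; mu has barely moved after 10 frames.
```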
Realistic Background Value Prediction
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1², b1
Model 2 (shadow): ω2 = 20%, µ2, σ2², b2
Model 3 (car): ω3 = 15%, µ3, σ3², b3
[Each model's distribution P(x) now carries its most recent observation b]
Perception Based Detection Threshold
With the mean µ, the detection threshold is x = c1σ; with the most recent observation b, the threshold x = ?
Background Model (models are ordered by ω/σ):
Model 1 (road): ω1 = 65%, µ1, σ1², b1
Model 2 (shadow): ω2 = 20%, µ2, σ2², b2
Model 3 (car): ω3 = 15%, µ3, σ3², b3
Our Problem: How is x related to b?
[Distribution P(x) centred on b; the threshold x may range from low to high]
x = ?
Weber’s Law
How does the human visual system perceive a noticeable intensity deviation from the background?
Ernst Weber, an experimental psychologist in the 19th century, observed that the just-noticeable increment ΔI is linearly proportional to the background intensity I:
ΔI = c2I
Weber’s Law
ΔI = c2I (another perceptual characteristic of the HVS)
[Distribution P(x) centred on the background value b: how far may x deviate from b before the deviation is noticeable?]
What is the perceptual tolerance level in distinguishing distorted intensity measures?
A reference image is distorted by Method 1 (p dB) and Method 2 (q dB).
If |p - q| < 0.5 dB, the difference is not perceivable by the human visual system.
Our Problem: How is x related to b?
Weber’s Law: x = c2b
Perceptual Threshold, TP (0.5 dB):
|20 log10(255/x) - 20 log10(255/b)| ≤ TP ⇒ x = b · 10^(TP/20)
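One consistent reading of the threshold slide, matching the |p - q| < 0.5 dB tolerance on the previous slide, is that a pixel value x is distinguishable from the background value b when their PSNR-style levels 20·log10(255/·) differ by more than TP. A sketch under that assumption:

```python
import math

def is_foreground(x, b, tp_db=0.5):
    """A pixel x is foreground if its level 20*log10(255/x) differs
    from the background level 20*log10(255/b) by more than tp_db.
    Equivalently, the tolerated deviation is b*(10**(tp_db/20) - 1),
    i.e. linear in b, matching Weber's law."""
    if x == b:
        return False
    p = 20 * math.log10(255 / max(x, 1))  # guard against log(inf) at 0
    q = 20 * math.log10(255 / max(b, 1))
    return abs(p - q) > tp_db

# On a background of 100, a deviation of ~6 grey levels is just
# noticeable; a deviation of 5 is not.
```

With TP = 0.5 dB the tolerated deviation is b·(10^(0.5/20) - 1) ≈ 0.059·b, linear in b, which is exactly the Weber-style relationship x = c2b.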
Impact of Perceptual Threshold, TP
Human Vision: TP = 0.5 dB
Machine Vision: TP = 1.0 dB (minimal impact of shadow, reflection, noise, etc.)
Linear Relationship
Error Sensitivity in Darker Background
Rod and Cone Cells of Human Eye
• Rods and Cones are two different types of photoreceptor cells in the retina of the human eye
• Rods
– Operate in less intense light
– Responsible for scotopic vision (night vision)
• Cones
– Operate in relatively bright light
– Responsible for photopic vision (colour vision)
Piece-wise Linear Relationship
Scotopic Vision (R) Photopic Vision (C)
Perceptual Model Similarity in PMOG
Model redundancy in MOG
Experiments
Test Sequences: 50 in total, from 8 different sources
Scenario distribution: Indoor, Outdoor, Multimodal, Shadow and Reflection, Low background-foreground contrast
Evaluation: qualitative and quantitative
Compared against: Lee (PAMI, 2005); Stauffer and Grimson (PAMI, 2000)
Metrics: False Positive (FP), False Negative (FN), False Classification
Test Sequences
PETS (9) Wallflower (7) UCF (7) IBM (11) CAVIAR (7) VSSN06 (7) Other (2)
PMOG: Summary
• Realistic background value prediction: high model
agility and superior detection quality at fast learning
rate
• No context related information: high stability across
changing scenarios
• Perception based detection threshold: superior
detection quality in terms of shadow, noise, and
reflection
• Perceptual model similarity: optimal number of models
throughout the system life cycle
• Parameter-less background subtraction: ideal for real-time video analytics
Panic-driven Event Detection
Event Detection
Specific types of events vs. abnormality
An event persists for a certain duration of time, and that duration is variable.
The characteristics of the same event vary within the same environment and from one scene to another.
How can the generic characteristics of an event be identified?
The Proposed Event Detection Approach: Architecture

Real-time Execution:
Incoming frames → Background Subtraction → Foreground blobs → Selective Frame-level Feature Extraction → Selective Temporal Feature Extraction → Trained Event Models → Detection Results

Model Training:
Labelled frames → Background Subtraction → Foreground blobs → Frame-level Feature Extraction (30 features) → Temporal Feature Extraction (270 features) → Feature Ranking and Selection → Event Model Training → Event Models

Components: Foreground Detector, Frame-level Feature Extractor, Temporal Feature Extractor, Event Models
The Proposed Event Detection Approach
Event detection as a temporal data classification problem
A distinct set of temporal features can characterise an event
Which frame-level features are extracted, and how?
How are the observed frame-level features transformed into temporal features?
[Diagram: time → frame-level features f1 … fn → temporal features → classifier → event model]
The Proposed Event Detection Approach
Motion based approaches: key-point detection; point matching in successive frames; flow vectors (position, direction, speed)
Tracking based approaches: object detection; object matching in successive frames; trajectories (object paths)
Common characteristics: inter-frame association; context-specific information; event models are not generic
Proposed approach: foreground blob detection; a global frame-level descriptor based on blob statistical analysis, independent of scene characteristics; no inter-frame association; independent frame-level features transformed into temporal features considering speed and temporal order
(cf. Hu et al., ICPR 2008; Xiang et al., IJCV 2006)
The Proposed Event Detection Approach: Summary
Background subtraction for foreground blob detection
Independent frame-level features extracted using blob statistical analysis; no object- or position-specific information, no spatial association
Frame-level features are transformed into temporal features considering speed and temporal order
Intended to be more context invariant
[Diagram: time → frame-level features f1 … fn → temporal features → classifier → event model]
Blob Statistical Analysis
Frame-level features
Blob Area (BA)
Filling Ratio (FR)
Aspect Ratio (AR)
Bounding Box Area (BBA)
Bounding box Width (BBW)
Bounding box Height (BBH)
Blob Count (BC)
Blob Distance (BD)
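Most of the single-blob features listed above can be computed directly from a binary mask with NumPy. This sketch assumes one blob per mask; Blob Count (BC) and Blob Distance (BD) would additionally need connected-component labelling:

```python
import numpy as np

def blob_features(mask):
    """Frame-level features of a single foreground blob given as a
    boolean mask: Blob Area (BA), Bounding Box Width/Height (BBW/BBH),
    Bounding Box Area (BBA), Filling Ratio (FR = BA/BBA) and
    Aspect Ratio (AR = BBW/BBH)."""
    ys, xs = np.nonzero(mask)
    ba = len(ys)
    bbw = int(xs.max() - xs.min() + 1)
    bbh = int(ys.max() - ys.min() + 1)
    bba = bbw * bbh
    return {"BA": ba, "BBW": bbw, "BBH": bbh,
            "BBA": bba, "FR": ba / bba, "AR": bbw / bbh}

# A 2x3 rectangular blob fills its bounding box completely (FR = 1.0).
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 1:4] = True
feats = blob_features(mask)
```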
[Example frames 1-6 with detected foreground blobs]
Blob Statistical Analysis
Temporal features
Overlapping sliding window
Temporal order
Speed of variation
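A plausible sketch of this transformation: overlapping sliding windows over a frame-level feature series, each window contributing its mean level, range, and mean absolute first difference (the speed of variation). The window length, step, and the particular statistics are illustrative assumptions, not the talk's exact 270-feature design:

```python
import numpy as np

def temporal_features(series, window=5, step=2):
    """Transform a frame-level feature series into temporal features
    using overlapping sliding windows. Per window: mean level, range,
    and mean absolute first difference (speed of variation). Window
    order preserves temporal order."""
    feats = []
    for start in range(0, len(series) - window + 1, step):
        w = np.asarray(series[start:start + window], dtype=float)
        feats.extend([w.mean(), w.max() - w.min(),
                      float(np.abs(np.diff(w)).mean())])
    return feats

# Blob-count series: a sudden split into many blobs shows up as large
# range and speed-of-variation values in the later windows.
bc = [1, 1, 1, 1, 1, 1, 5, 6, 6, 7]
f = temporal_features(bc)
```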
Blob Statistical Analysis
Blob Count (BC), Blob Area (BA)
Blob Statistical Analysis
Blob Distance (BD)
Blob Statistical Analysis
Aspect Ratio (AR)
Blob Statistical Analysis
Top five features for four different events
Feature ranking using the absolute value criterion of a two-sample t-test, based on a pooled variance estimate.
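The ranking criterion named above can be sketched directly: the two-sample t statistic with a pooled variance estimate, with features ranked by its absolute value. The data below is synthetic, for illustration only:

```python
import numpy as np

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance estimate:
    t = (mean_a - mean_b) / sqrt(sp2 * (1/n_a + 1/n_b))."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

def rank_features(X_pos, X_neg):
    """Rank feature columns by |t| between event (positive) and
    non-event (negative) frames; larger |t| = more discriminative."""
    scores = [abs(t_statistic(X_pos[:, j], X_neg[:, j]))
              for j in range(X_pos.shape[1])]
    return np.argsort(scores)[::-1]

# Synthetic data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X_pos = np.column_stack([rng.normal(5, 1, 50), rng.normal(0, 1, 50)])
X_neg = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 1, 50)])
order = rank_features(X_pos, X_neg)
```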
Experimental Results
Specific Event Detection
• Four different events: meet, split, runaway, and fight
• CAVIAR dataset with labelled frames
• 80% of the frames for model training; the remaining 20% for testing
• 100 iterations of 10-fold cross-validation
• SVM classifiers as event models; a separate model for each event
Experimental Results
Specific Event Detection
[Confusion matrix: predicted vs. actual events, with severity]
Experimental Results
Abnormal Event Detection
• University of Minnesota crowd dataset (UMN dataset)
• The Runaway event model
• No additional training or tuning
• Three different sites
Experimental Results
Abnormal Event Detection (UMN-9)
Experimental Results
Abnormal Event Detection (UMN-10)
Experimental Results
Abnormal Event Detection (UMN-01)
Experimental Results
Abnormal Event Detection (UMN-07)
Experimental Results
Performance Comparison
Method | AUC
Proposed Method | 0.89
Pure Optical Flow [1] | 0.84
[1] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, pp. 935-942.
URLs of the images used in this presentation
• http://www.fotosearch.com/DGV464/766029/
• http://www.cyprus-trader.com/images/alert.gif
• http://security.polito.it/~lioy/img/einstein8ci.jpg
• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg
• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg
• http://www.airports-worldwide.com/img/uk/heathrow00.jpg
• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibit-hall-crowd-2.jpg
• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg
• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg
• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg
• http://www.cityofsound.com/photos/centre_poin/crowd.jpg
• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg
• http://paulaoffutt.com/pics/images/crowd-surfing.jpg
• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg
• http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg
Thanks!