human detection under partial occlusions using markov logic networks
Post on 10-Feb-2016
1
Human Detection under Partial Occlusions using Markov Logic Networks
Raghuraman Gopalan and William Schwartz
Center for Automation Research, University of Maryland, College Park
2
Human Detection
3
Human Detection
Holistic window-based:
• Dalal and Triggs CVPR (2005)
• Tuzel et al CVPR (2007)
Part-based:
• Wu and Nevatia ICCV (2005)
• Mikolajczyk et al ECCV (2004)
Scene-related cues:
• Torralba et al IJCV (2006)
4
The occlusion challenge
Examples: body parts occluded by objects; person occluded by another person
Probabilities shown for the example windows: 0.0059*, 0.0816, 0.1452, 0.1272
* Probability of presence of a human obtained from Schwartz et al ICCV (2009)
5
Related work
Bilattice-based logical reasoning: Shet et al CVPR (2007)
Integrating probability of human parts using first-order logic (FOL): Schwartz et al ICB (2009)
6
Our approach: Motivation
A data-driven, part-based method
1. Probabilistic logical inference using Markov logic networks (MLN) [Domingos et al, Machine Learning (2006)]
2. Representing `semantic context’ between the detection probabilities of parts:
• Within-window, and between-windows
• With and without occlusions
7
Our approach: An overview
Multiple detection windows
Part detector’s outputs
Face detector outputs
Instantiation of the MLN
Inference
Final Result
Queries:
- person(d1)?
- occluded(d1)?
- occludedby(d1,d2)?
Learning contextual rules
8
Main questions
How to integrate detector’s outputs to detect people under occlusion?
• Enforce consistency according to spatial location of detectors → removal of false alarms.
• Exploit relations between persons to solve inconsistencies → explain occlusions.
• Both using MLN, which combines FOL and graphical models in a single representation → avoids contradictions.
10
Part-based detectors
To handle human detection under occlusion, our original detector is split into parts, then MLN is used to integrate their outputs.
original
top
torso
legs
top-torso
torso-legs
top-legs
11
Detector – An overview
Use more representative features (edges, textures, and color) to provide a richer set of descriptors and improve detection results.
Consequences of the feature augmentation:
• extremely high dimensional feature space (>170,000)
• number of samples in the training dataset is smaller than the dimensionality
These characteristics prevent the use of classical machine learning techniques such as SVM, but make an ideal setting for Partial Least Squares (PLS)*.
* H. Wold, Partial Least Squares, Encyclopedia of statistical sciences, 6:581-591 (1985)
13
Detection using PLS
PLS models relations between predictor variables in matrix X (n x p) and response variables in vector y (n x 1), where n denotes the number of samples and p the number of features:
$$X = T P^{T} + E$$
$$y = U q^{T} + f$$
T, U are (n x h) matrices of h extracted latent vectors. P (p x h) and q (1 x h) represent the matrices of loadings, and E (n x p) and f (n x 1) are the residuals of X and y, respectively.
The PLS method NIPALS (nonlinear iterative partial least squares) finds the set of weight vectors W (p x h) = {w1, w2, ..., wh} such that
$$[\operatorname{cov}(t_i, u_i)]^2 = \max_{|w_i| = 1} [\operatorname{cov}(X w_i, y)]^2$$
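The weight-vector criterion above can be illustrated with a minimal NumPy sketch of NIPALS for a single response (often called PLS1). This is an assumption-laden illustration, not the authors' implementation: the function name `pls1_nipals`, the centering step, and the choice to regress y directly on the scores t are all choices made here. For one response, the unit vector proportional to X^T y maximizes the squared covariance between X w and y.

```python
import numpy as np

def pls1_nipals(X, y, h):
    """Extract h latent vectors from X (n x p) and y (n,) with NIPALS (PLS1 variant)."""
    X = X - X.mean(axis=0)          # center predictors (copy, input is untouched)
    y = y - y.mean()                # center response
    n, p = X.shape
    W = np.zeros((p, h))            # weight vectors w_i
    T = np.zeros((n, h))            # latent vectors (scores) t_i
    P = np.zeros((p, h))            # loadings of X
    q = np.zeros(h)                 # loadings of y
    for i in range(h):
        w = X.T @ y
        w /= np.linalg.norm(w)      # unit norm: maximizes cov(Xw, y)^2
        t = X @ w
        tt = t @ t
        p_i = X.T @ t / tt
        q_i = y @ t / tt
        X = X - np.outer(t, p_i)    # deflate X before extracting the next vector
        y = y - q_i * t             # deflate y
        W[:, i], T[:, i], P[:, i], q[i] = w, t, p_i, q_i
    return W, T, P, q

# Hypothetical data with more features than samples, as in the slides' setting.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
y_demo = rng.normal(size=20)
W, T, P, q = pls1_nipals(X_demo, y_demo, h=3)
```

The returned T has one low-dimensional row per sample, which is what a classifier would consume in place of the original p features.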
15
Context: Consistency between the detector outputs
Each detector acts in a specific region of the body. One can look at the output of detectors acting in the same spatial location to check for consistency: similar responses are expected.
Example (top-torso, top, and torso detectors):
Given that the top-torso detector outputs high probability, the top and torso detectors need to output high probability as well, since they intersect the region covered by top-torso.
First order logic rules:
topTorso(d1) ^ top(d1) ^ torso(d1) → person(d1) (consistent)
topTorso(d1) ^ (¬top(d1) v ¬torso(d1)) → ¬person(d1) (false alarm)
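The two consistency rules on this slide can be sketched as hard Boolean logic. This is a simplification for illustration only: the slides attach soft weights to such rules inside an MLN rather than applying a threshold, and the function name `consistent_person` and the threshold of 0.5 are hypothetical.

```python
def consistent_person(p_top_torso, p_top, p_torso, thresh=0.5):
    """Apply the two consistency rules to detector probabilities.

    Returns True (consistent: person), False (false alarm), or None when
    the top-torso detector is low and neither rule fires.
    The threshold is a hypothetical stand-in for "high probability".
    """
    top_torso = p_top_torso >= thresh
    top = p_top >= thresh
    torso = p_torso >= thresh
    if not top_torso:
        return None                 # both rules require topTorso(d1)
    return top and torso            # True: person(d1); False: not person(d1)

# Agreement between overlapping detectors supports a person hypothesis.
agree = consistent_person(0.9, 0.8, 0.7)
# A high top-torso response with a low top response is treated as a false alarm.
disagree = consistent_person(0.9, 0.2, 0.8)
```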
16
Context: Understanding relationship between different windows
Low response given by a detector might be caused by a second detection window (a person may be occluding another and causing low response of the detectors).
First order logic rule, for detection windows d1 and d2:
intersect(d1,d2) ^ person(d1) ^ matching(d1,d2) → person(d2) ^ occluded(d2) ^ occludedby(d2,d1)
matching(d1,d2) is true if:
- Detectors at visible parts of d2 have high response.
- Detectors at occluded parts of d2 have low response while detectors located at the corresponding positions of d1 have high response.
In the illustrated example:
- d1 and d2 are persons
- d1 and d2 intersect
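A rough sketch of the matching(d1,d2) test and the between-window rule follows. Several things here are assumptions rather than details from the slides: the part names in `PARTS`, the single threshold separating high from low response, and the simplification that "corresponding positions of d1" means the same-named part of d1. The rule is also applied as a hard implication, whereas the MLN treats it as a weighted soft constraint.

```python
PARTS = ("top", "torso", "legs")    # hypothetical part set

def matching(resp_d1, resp_d2, occluded_parts, thresh=0.5):
    """Check the two matching(d1,d2) conditions on per-part responses."""
    for part in PARTS:
        d2_high = resp_d2[part] >= thresh
        d1_high = resp_d1[part] >= thresh
        if part in occluded_parts:
            # Occluded part of d2: d2 must be low while d1 is high there.
            if d2_high or not d1_high:
                return False
        elif not d2_high:
            # Visible part of d2 must respond high.
            return False
    return True

def occlusion_rule(intersect, person_d1, resp_d1, resp_d2, occluded_parts):
    """Return the facts concluded by the rule, or an empty set if it does not fire."""
    if intersect and person_d1 and matching(resp_d1, resp_d2, occluded_parts):
        return {"person(d2)", "occluded(d2)", "occludedby(d2,d1)"}
    return set()

# Hypothetical responses: the legs of d2 are hidden behind d1.
d1 = {"top": 0.9, "torso": 0.8, "legs": 0.9}
d2 = {"top": 0.9, "torso": 0.85, "legs": 0.1}
facts = occlusion_rule(True, True, d1, d2, {"legs"})
```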
18
3. Inference using MLN* - The basic idea
A logical knowledge base (KB) is a set of hard constraints (Fi) on the set of possible worlds.
Let’s make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (wi) (higher weight → stronger constraint):
$$P(\text{world}) \propto \exp\Big(\sum \text{weights of formulas it satisfies}\Big)$$
* Contents of the next three slides are partially adapted from Markov Logic Networks tutorial by Domingos et al, ICML (2007)
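The soft-constraint idea can be made concrete with a tiny propositional sketch: every world gets a score equal to the exponential of the summed weights of the formulas it satisfies, and the scores are normalized. The atoms, the single formula person → top, and its weight of 2.0 are hypothetical values chosen only for illustration.

```python
import math
from itertools import product

def world_probabilities(atoms, weighted_formulas):
    """Enumerate all truth assignments and normalize exp(sum of satisfied weights)."""
    worlds = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(w for f, w in weighted_formulas if f(world)))
              for world in worlds]
    z = sum(scores)                 # normalizing constant
    return [(world, s / z) for world, s in zip(worlds, scores)]

atoms = ["person", "top"]
# One soft rule, person -> top, with a hypothetical weight of 2.0.
formulas = [(lambda w: (not w["person"]) or w["top"], 2.0)]
probs = world_probabilities(atoms, formulas)
# The world that violates the rule (person true, top false) stays possible,
# but is less probable than the three worlds that satisfy it.
violating = next(p for w, p in probs if w["person"] and not w["top"])
```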
19
MLN – At a Glance
Logical language: First-order logic
Probabilistic language: Markov networks
• Syntax: First-order formulas with weights
• Semantics: Templates for Markov net features
Learning:
• Parameters: Generative or discriminative
• Structure: ILP with arbitrary clauses and MAP score
Inference:
• MAP: Weighted satisfiability
• Marginal: MCMC with moves proposed by SAT solver
• Partial grounding + Lazy inference / Lifted inference
20
MLN- Definition
A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where
• Fi is a formula in first-order logic
• wi is a real number
21
Example: Humans & Occlusions
1) Presence of a human implies presence of parts.
2) When two persons occlude, analyze the context between their windows to reason about the detector output probability patterns.
24
Example: Humans & Occlusions
Two constants: Detection window 1 (D1) and Detection window 2 (D2)
1.5  ∀x Human(x) → Parts(x)
1.1  ∀x,y Occlusion(x,y) → (Human(x) ↔ Human(y))
25
Example: Humans & Occlusions
Two constants: Detection window 1 (D1) and Detection window 2 (D2)
1.5  ∀x Human(x) → Parts(x)
1.1  ∀x,y Occlusion(x,y) → (Human(x) ↔ Human(y))
One node for each grounding of each predicate in the MLN
Ground atoms (nodes): Parts(D1), Human(D1), Human(D2), Parts(D2)
26
Example: Humans & Occlusions
Two constants: Detection window 1 (D1) and Detection window 2 (D2)
1.5  ∀x Human(x) → Parts(x)
1.1  ∀x,y Occlusion(x,y) → (Human(x) ↔ Human(y))
Ground atoms (nodes): Parts(D1), Human(D1), Human(D2), Parts(D2), Occlusion(D1,D1), Occlusion(D1,D2), Occlusion(D2,D1), Occlusion(D2,D2)
27
Example: Humans & Occlusions
Two constants: Detection window 1 (D1) and Detection window 2 (D2)
1.5  ∀x Human(x) → Parts(x)
1.1  ∀x,y Occlusion(x,y) → (Human(x) ↔ Human(y))
Ground atoms (nodes): Parts(D1), Human(D1), Human(D2), Parts(D2), Occlusion(D1,D1), Occlusion(D1,D2), Occlusion(D2,D1), Occlusion(D2,D2)
One feature for each grounding of each formula Fi in the MLN, with the corresponding weight wi
30
Instantiation
MLN is a template for ground Markov nets.
Probability of a world x:
$$P(x) = \frac{1}{Z} \exp\Big(\sum_i w_i \, n_i(x)\Big)$$
where w_i is the weight of formula Fi, n_i(x) is the number of true groundings of formula Fi in x, and Z is the normalizing constant.
Learning of weights and inference are performed using the open-source Alchemy system [Domingos et al (2006)].
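The probability of a world can be checked by brute force on the two-constant example. The sketch below assumes the two example formulas read as Human(x) → Parts(x) with weight 1.5 and Occlusion(x,y) → (Human(x) ↔ Human(y)) with weight 1.1; the connectives are a reading of the slides, not a confirmed transcription. It enumerates all 256 worlds exactly, which is only feasible for toy cases and is not how Alchemy performs inference.

```python
import math
from itertools import product

CONSTANTS = ["D1", "D2"]
WEIGHTS = (1.5, 1.1)                # assumed weights of the two formulas

# One ground atom per grounding of each predicate: 2 + 2 + 4 = 8 atoms.
ATOMS = ([("Human", c) for c in CONSTANTS]
         + [("Parts", c) for c in CONSTANTS]
         + [("Occlusion", a, b) for a in CONSTANTS for b in CONSTANTS])

def n_true_groundings(world):
    """Count true groundings n_i(x) of each formula in a world."""
    n1 = sum((not world[("Human", x)]) or world[("Parts", x)]
             for x in CONSTANTS)
    n2 = sum((not world[("Occlusion", x, y)])
             or (world[("Human", x)] == world[("Human", y)])
             for x in CONSTANTS for y in CONSTANTS)
    return n1, n2

def unnormalized(world):
    return math.exp(sum(w * n for w, n in zip(WEIGHTS, n_true_groundings(world))))

WORLDS = [dict(zip(ATOMS, values))
          for values in product([False, True], repeat=len(ATOMS))]
Z = sum(unnormalized(w) for w in WORLDS)    # normalizing constant

def probability(world):
    """P(x) = exp(sum_i w_i * n_i(x)) / Z."""
    return unnormalized(world) / Z

def marginal(atom):
    """Marginal probability that a ground atom is true."""
    return sum(probability(w) for w in WORLDS if w[atom])

p_parts_d1 = marginal(("Parts", "D1"))
```

Queries such as person(d1) in the overview correspond to marginals of this kind, computed approximately when the ground network is too large to enumerate.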
Results
35
Comparisons
Dataset details:
• 200 images
• 5 to 15 humans per image
• Occluded humans ~ 35%
38
Conclusions
A data-driven approach to detect humans under occlusions
Modeling semantic context of detector probabilities across spatial locations
Probabilistic contextual inference using Markov logic networks
Question of interest: Integrating analytical models for occlusions and context with this data-driven method
39
Questions?