Spatially Coherent Latent Topic Model for Concurrent Object Segmentation and Classification
A paper review
Authors: Liangliang Cao, Li Fei-Fei
Presenter: Shao-Chuan Wang
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Motivation: Real-world problems are often full of “noise”
• Bag of words (local features)
  – Spatial relationships of objects are ignored (the representation has its limits)
• When classifying a test image, what is its “subject”?
Flag?
Banner?
People?
Sports field?
From Prof. Fei-Fei’s ICCV09 tutorial slide
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Generative vs Discriminative
• Generative model: models p(x, y), or p(x|y)p(y) (see the sketch below)
• Discriminative model: models p(y|x) directly
[Figure: plots of a generative model of the data x vs. a discriminative model p(y|x). From Prof. Antonio Torralba’s course slides]
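To make the contrast concrete, here is a minimal sketch (not from the slides; all data and numbers are made up): the generative route fits p(x|y)p(y) with one Gaussian per class and applies Bayes’ rule, while the discriminative route fits p(y|x) directly with a few logistic-regression gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D data: two classes with different means (all numbers made up)
x = np.concatenate([rng.normal(20, 5, 200), rng.normal(45, 8, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Generative route: model p(x|y) with one Gaussian per class, plus the prior p(y)
mu = np.array([x[y == c].mean() for c in (0, 1)])
sd = np.array([x[y == c].std() for c in (0, 1)])
prior = np.array([np.mean(y == c) for c in (0, 1)])

def gauss(v, m, s):
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(xq):
    joint = gauss(xq, mu, sd) * prior        # p(x|y) p(y) for both classes
    return joint / joint.sum()               # Bayes' rule gives p(y|x)

# Discriminative route: model p(y=1|x) = sigmoid(w*x + b) directly
xs = (x - x.mean()) / x.std()                # standardize for stable gradient steps
w, b = 0.0, 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * xs + b)))
    w -= 0.1 * np.mean((p - y) * xs)         # gradient of the logistic loss
    b -= 0.1 * np.mean(p - y)

xq = 30.0
print(generative_posterior(xq))              # p(y|x=30) via p(x|y)p(y)
zq = (xq - x.mean()) / x.std()
print(1.0 / (1.0 + np.exp(-(w * zq + b))))   # p(y=1|x=30) modeled directly
```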
Generative model: An example
• Naïve Bayesian model (c: class, w: visual words)

$$p(c, \mathbf{w}) = p(c)\,p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$

• Once we have learnt the distribution, for a query image:

$$c^* = \arg\max_{c} p(c \mid \mathbf{w}_q) = \arg\max_{c} p(c, \mathbf{w}_q) = \arg\max_{c} p(\mathbf{w}_q \mid c)\,p(c)$$

[Figure: Bayesian network, c → w1 … wn]
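A minimal sketch of the classification rule above, with made-up word counts: learn p(w|c) from per-class counts (Laplace smoothing is an added assumption), then score a query bag of visual words in log space.

```python
import numpy as np

# counts[c][w] = how often visual word w appears in training images of class c
# (made-up numbers; rows are classes, columns are visual words)
counts = np.array([[30, 5, 10, 2],
                   [4, 25, 3, 20]], dtype=float)
class_counts = np.array([50.0, 50.0])        # training images per class

p_w_given_c = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)  # Laplace smoothing
log_prior = np.log(class_counts / class_counts.sum())

def classify(query_words):
    # c* = argmax_c [ log p(c) + sum_n log p(w_n|c) ]
    log_post = log_prior + np.log(p_w_given_c[:, query_words]).sum(axis=1)
    return int(np.argmax(log_post))

print(classify([0, 0, 2]))   # -> 0: words typical of class 0
print(classify([1, 3, 3]))   # -> 1
```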
Generative model: Another example
• Mixture of Gaussians model
• How can we infer from unlabeled data, even if we know the underlying probability distribution structure?

$$p(c, \mu, \gamma, \mathbf{x}) = p(c)\,p(\mu \mid c)\,p(\gamma \mid c)\,p(\mathbf{x} \mid \mu, \gamma)$$

A graphical model
• Directed graph
• Nodes represent variables
• Links show dependencies
• Conditional distributions at each node
[Figure: Bayesian network for the mixture model. Hidden object class c generates the mean μ and the inverse variance γ, which generate the observed data x; factors P(c), P(μ|c), P(γ|c), P(x|μ,γ)]
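A minimal sketch of ancestral sampling from this graphical model, following the factorization p(c, μ, γ, x) = p(c) p(μ|c) p(γ|c) p(x|μ, γ); the specific distributions and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_one():
    c = rng.choice(2, p=[0.3, 0.7])            # c     ~ P(c)
    mu = rng.normal([0.0, 5.0][c], 1.0)        # mu    ~ P(mu|c)
    gamma = rng.gamma([2.0, 4.0][c], 1.0)      # gamma ~ P(gamma|c), the inverse variance
    x = rng.normal(mu, 1.0 / np.sqrt(gamma))   # x     ~ P(x|mu, gamma)
    return c, mu, gamma, x

samples = [sample_one() for _ in range(5)]     # only x would be observed
```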
Inference of latent variables
• Expectation maximization (EM) (see the sketch after this list)
  – “Soft guess” the latent variables first (E-step)
  – Based on the latent variables (assumed correct), solve an optimization problem (M-step)
• Markov chain Monte Carlo (MCMC)
  – Use Gibbs sampling from the posterior
  – Slow to converge
• Variational methods / Variational Message Passing (VMP)
  – Algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003)
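A minimal EM sketch for a two-component 1-D Gaussian mixture, matching the bullets above: the E-step “soft-guesses” the latent assignments, and the M-step re-solves the weighted maximum-likelihood problem as if those guesses were correct. Data and initial values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])  # unlabeled data

pi, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: soft responsibility of each component for each point
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)          # shape (N, 2)
    # M-step: weighted maximum-likelihood updates, treating resp as correct
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sd)   # should approach the true mixture: (0.5, 0.5), (0, 5), (1, 1)
```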
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Back to the topic: the paper
• Key ideas:
  – Latent topics are spatially coherent
    • Generate the topic distribution at the region level
  – Over-segmentation, then merge regions by shared topic
    • Avoids obtaining regions larger than the objects
    • One topic per region
    • Can recognize objects with occlusion
  – Describe a region by:
    • Homogeneous appearance a_r: the average of color or texture features
    • SIFT-based visual words: w_r
  – Concurrent segmentation and classification

[Figure: bag of words vs. over-segmentation]
Spatial Latent Topic Model
• Notation:
  – Image $I_d$
  – Regions $r \in \{1, 2, \ldots, R_d\}$
  – Latent topic $z_r \in \{1, 2, \ldots, K\}$
  – Appearance $a_r \in \{1, 2, \ldots, A\}$
  – Visual words $\mathbf{w}_r = (w_r^1, w_r^2, \ldots, w_r^{M_r})$; each $w_r^m \in \{1, 2, \ldots, W\}$
  – $P(z_r \mid \theta_d)$: topic probability (multinomial distribution) parameterized by $\theta_d$
  – $P(\theta_d \mid \lambda)$: Dirichlet prior on $\theta_d$, parameterized by $\lambda$
  – $\alpha, \beta$: parameters describing the probability of generating the appearance and visual words given the topic
Spatial Latent Topic Model (Unsupervised)
• Maximize the log-likelihood
  – an optimization problem: a closed-form solution is intractable

$$P(\theta_d, \{z_r, a_r, \mathbf{w}_r\} \mid \lambda, \alpha, \beta) = \underbrace{P(\theta_d \mid \lambda)}_{\text{Dirichlet prior}} \prod_{r} \underbrace{P(z_r \mid \theta_d)}_{\text{Multinomial}}\, P(a_r \mid z_r, \alpha)\, P(\mathbf{w}_r \mid z_r, \beta)$$

$$L = \sum_d L_d, \qquad L_d = \log P(\{a_r, \mathbf{w}_r\}_{r \in I_d} \mid \lambda, \alpha, \beta)$$
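A minimal sketch of ancestral sampling from this joint distribution, using the notation of the previous slide; all sizes and parameter values are illustrative stand-ins, not the paper’s learned values.

```python
import numpy as np

rng = np.random.default_rng(3)
K, A, W, R_d = 3, 4, 10, 5                   # topics, appearance bins, vocab, regions

lam = np.ones(K)                             # Dirichlet prior lambda
alpha = rng.dirichlet(np.ones(A), size=K)    # alpha[k] = P(a | z=k)  (made up)
beta = rng.dirichlet(np.ones(W), size=K)     # beta[k]  = P(w | z=k)  (made up)

theta_d = rng.dirichlet(lam)                 # theta_d ~ Dir(lambda)
image = []
for r in range(R_d):
    z_r = rng.choice(K, p=theta_d)           # z_r ~ Mult(theta_d), one topic per region
    a_r = rng.choice(A, p=alpha[z_r])        # region appearance ~ P(a | z_r, alpha)
    M_r = rng.integers(3, 8)                 # number of visual words in region r
    w_r = rng.choice(W, size=M_r, p=beta[z_r])  # visual words ~ P(w | z_r, beta)
    image.append((z_r, a_r, w_r))
```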
Variational Message Passing (Winn 2005)
• Coupling of the hidden variables θ, α, β makes the maximization intractable
• Instead, maximize the lower bound $\mathcal{L}(Q)$ of the log-likelihood
• Goal: find a tractable $Q(H)$ that closely approximates the true posterior distribution $P(H \mid V)$ (the decomposition below holds for any distribution $Q$)

$$\ln P(V) = \mathcal{L}(Q) + \mathrm{KL}(Q \,\|\, P)$$

$$\mathcal{L}(Q) = \sum_{H} Q(H) \ln \frac{P(H, V)}{Q(H)}, \qquad \mathrm{KL}(Q \,\|\, P) = -\sum_{H} Q(H) \ln \frac{P(H \mid V)}{Q(H)}$$

• Or equivalently, minimize $\mathrm{KL}(Q \,\|\, P)$
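A minimal numeric check of this decomposition on a made-up discrete model: for an arbitrary Q(H), ln P(V) = L(Q) + KL(Q||P) holds exactly, so L(Q) lower-bounds ln P(V) and is tight when Q equals the true posterior.

```python
import numpy as np

rng = np.random.default_rng(4)
# Made-up joint P(H, V=v) over 4 hidden states at a fixed observation v
joint = rng.random(4) / 10.0                   # unnormalized is fine: P(H, V=v)
p_v = joint.sum()                              # P(V=v) = sum_H P(H, V=v)
post = joint / p_v                             # P(H | V=v)

q = rng.random(4); q /= q.sum()                # an arbitrary distribution Q(H)
lower_bound = np.sum(q * np.log(joint / q))    # L(Q)     = sum_H Q ln[P(H,V)/Q]
kl = -np.sum(q * np.log(post / q))             # KL(Q||P) = -sum_H Q ln[P(H|V)/Q]

assert np.isclose(np.log(p_v), lower_bound + kl)     # ln P(V) = L(Q) + KL(Q||P)
assert np.isclose(np.sum(post * np.log(joint / post)), np.log(p_v))  # tight at Q = P(H|V)
```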
Variational Message Passing (Winn 2005)
• Further factorization assumptions (Jordan et al., 1999; Jaakkola, 2001; Parisi, 1988) restrict the family of distributions Q:

$$Q(H) = \prod_i Q_i(H_i)$$

$$\mathcal{L}(Q) = \sum_H Q(H) \ln P(H, V) - \underbrace{\sum_i \sum_{H_i} Q_i(H_i) \ln Q_i(H_i)}_{\text{entropy term}}$$

$$= \sum_{H_j} Q_j(H_j) \big\langle \ln P(H, V) \big\rangle_{\prod_{i \neq j} Q_i(H_i)} - \sum_{H_j} Q_j(H_j) \ln Q_j(H_j) + \text{terms not in } Q_j$$

$$= -\mathrm{KL}(Q_j \,\|\, Q_j^*) + \text{terms not in } Q_j$$

where $\ln Q_j^*(H_j) = \big\langle \ln P(H, V) \big\rangle_{\sim Q_j} + \text{const.}$, and $\langle \cdot \rangle_{\sim Q_j}$ denotes the expectation over all factors except $Q_j$.
Variational Message Passing (Winn 2005)
• Markov blanket: for a Bayesian network, $P(\mathbf{X}) = \prod_i P(X_i \mid \mathrm{pa}_i)$, so the update for $Q_j$ involves only the Markov blanket of $H_j$:

$$\ln Q_j^*(H_j) = \big\langle \ln P(H, V) \big\rangle_{\sim Q_j} + \text{const.} = \big\langle \ln P(H_j \mid \mathrm{pa}_j) \big\rangle_{\sim Q_j} + \sum_{k \in \mathrm{ch}_j} \big\langle \ln P(X_k \mid \mathrm{pa}_k) \big\rangle_{\sim Q_j} + \text{const.}$$

(Eqn. (6) in the paper)

[Figure: Bayesian network representation of the model]
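A minimal sketch of the resulting coordinate-ascent scheme on a made-up two-variable discrete model: with Q(H) = Q1(H1) Q2(H2), each factor is set in turn to ln Q_j* = ⟨ln P(H,V)⟩ over the other factor, plus a constant, and each update can only increase L(Q).

```python
import numpy as np

rng = np.random.default_rng(5)
# Made-up unnormalized joint P(H1, H2, V=v) over two binary hidden variables
log_joint = np.log(rng.random((2, 2)))

q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])
for _ in range(20):
    # ln Q1*(h1) = <ln P(H,V)>_{Q2} + const
    log_q1 = log_joint @ q2
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    # ln Q2*(h2) = <ln P(H,V)>_{Q1} + const
    log_q2 = q1 @ log_joint
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()

# Lower bound L(Q) = <ln P(H,V)> - <ln Q> under the factorized Q
Q = np.outer(q1, q2)
L = np.sum(Q * (log_joint - np.log(Q)))
print(q1, q2, L)
```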
Spatial Latent Topic Model (Supervised)
• For a query image $I_d$, find its most probable category $c$:

$$c^* = \arg\max_c \prod_{r \in I_d} P(\mathbf{w}_r, a_r \mid c)$$

• $\theta$ now becomes a $C \times K$ matrix, i.e. $\theta$ depends on the observed class $c$
Process
• Training step
  – Maximize the total likelihood of the training images with respect to λ, α, θ and $z_r$
  – The learned λ, α are then fixed
• Testing phase, for a query image $I_d$
  – Estimate its $\theta_d$ and $z_r$:

$$\hat{z}_r = \arg\max_{z_r} P(a_r, \mathbf{w}_r \mid z_r)$$

$$\hat{\theta}_d = \arg\max_{\theta_d} \prod_{r \in I_d} \sum_{z_r} P(a_r, \mathbf{w}_r \mid z_r)\, P(z_r \mid \theta_d) \qquad \text{(Eqn. (3))}$$

  – For the classification task, take the most probable latent topic as the category:

$$k^* = \arg\max_{1 \le k \le K} \theta_d(k)$$

  – For the segmentation task, merge regions that share the same $\hat{z}_r$ (see the sketch below)
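A minimal sketch of this testing phase with made-up learned tables α and β: pick ẑ_r per region, use the topic frequencies as a crude stand-in for the θ̂_d optimization of Eqn. (3), classify by the most probable topic, and merge same-topic regions for segmentation.

```python
import numpy as np

rng = np.random.default_rng(6)
K, A, W = 4, 5, 20                               # topics, appearance bins, vocabulary
alpha = rng.dirichlet(np.ones(A), size=K)        # learned P(a|z)  (made up)
beta = rng.dirichlet(np.ones(W), size=K)         # learned P(w|z)  (made up)

def log_p_region(a_r, w_r):
    # log P(a_r, w_r | z) for every topic z
    return np.log(alpha[:, a_r]) + np.log(beta[:, w_r]).sum(axis=1)

# Query image: (appearance, visual words) per over-segmented region (made up)
regions = [(rng.integers(A), rng.choice(W, size=rng.integers(3, 8))) for _ in range(6)]

z_hat = [int(np.argmax(log_p_region(a, w))) for a, w in regions]  # per-region topics

# Classification: most probable topic across the image as the category
theta_hat = np.bincount(z_hat, minlength=K) / len(z_hat)  # crude stand-in for theta_d
category = int(np.argmax(theta_hat))

# Segmentation: merge regions that share the same topic label
segments = {z: [r for r, zr in enumerate(z_hat) if zr == z] for z in set(z_hat)}
print(category, segments)
```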
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Experimental Results
• Unsupervised segmentation
Occlusion case:
Experimental Results
• Supervised segmentation
Dataset: 13 classes of natural scenes
# of training images: 100
# of topics: 60
# of categories: 13
Experimental Results
• Supervised classification
Dataset: 28 classes from Caltech 101
# of training images: 30
# of test images: 30
# of topics in categories: 28
# of topics in clutter: 34
6 background classes are left unlabeled
~ Thank you ~
Variational Message Passing
• Following this framework, and using the graphical model provided by this paper, the update equations for each node can be derived.