Spatially Coherent Latent Topic Model for Concurrent Object Segmentation and Classification
A paper review
Authors: Liangliang Cao, Li Fei-Fei
Presenter: Shao-Chuan Wang
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Motivation: Real-world problems are often full of “noise”
• Bag of words (local features)
  – Spatial relationships of objects are ignored (the representation has its limits)
• When classifying a test image, what is its “subject”?
Flag?
Banner?
People?
Sports field?
From Prof. Fei-Fei’s ICCV09 tutorial slide
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Generative vs Discriminative
• Generative model: models p(x, y), or p(x|y)p(y) (see the sketch below)
• Discriminative model: models p(y|x) directly
[Figure: plots of a generative model of the data x vs. a discriminative model p(y|x). From Prof. Antonio Torralba’s course slides]
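To make the contrast concrete, here is a minimal sketch (not from the slides; all data and numbers are made up): the generative route fits p(x|y)p(y) with one Gaussian per class and applies Bayes’ rule, while the discriminative route fits p(y|x) directly with a few logistic-regression gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D data: two classes with different means (all numbers made up)
x = np.concatenate([rng.normal(20, 5, 200), rng.normal(45, 8, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Generative route: model p(x|y) with one Gaussian per class, plus the prior p(y)
mu = np.array([x[y == c].mean() for c in (0, 1)])
sd = np.array([x[y == c].std() for c in (0, 1)])
prior = np.array([np.mean(y == c) for c in (0, 1)])

def gauss(v, m, s):
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(xq):
    joint = gauss(xq, mu, sd) * prior        # p(x|y) p(y) for both classes
    return joint / joint.sum()               # Bayes' rule gives p(y|x)

# Discriminative route: model p(y=1|x) = sigmoid(w*x + b) directly
xs = (x - x.mean()) / x.std()                # standardize for stable gradient steps
w, b = 0.0, 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * xs + b)))
    w -= 0.1 * np.mean((p - y) * xs)         # gradient of the logistic loss
    b -= 0.1 * np.mean(p - y)

xq = 30.0
print(generative_posterior(xq))              # p(y|x=30) via p(x|y)p(y)
zq = (xq - x.mean()) / x.std()
print(1.0 / (1.0 + np.exp(-(w * zq + b))))   # p(y=1|x=30) modeled directly
```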
Generative model: An example
• Naïve Bayesian model (c: class, w: visual words)

$$p(c, \mathbf{w}) = p(c)\,p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$

• Once we have learnt the distribution, for a query image:

$$c^* = \arg\max_{c} p(c \mid \mathbf{w}_q) = \arg\max_{c} p(c, \mathbf{w}_q) = \arg\max_{c} p(\mathbf{w}_q \mid c)\,p(c)$$

[Figure: Bayesian network, c → w1 … wn]
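A minimal sketch of the classification rule above, with made-up word counts: learn p(w|c) from per-class counts (Laplace smoothing is an added assumption), then score a query bag of visual words in log space.

```python
import numpy as np

# counts[c][w] = how often visual word w appears in training images of class c
# (made-up numbers; rows are classes, columns are visual words)
counts = np.array([[30, 5, 10, 2],
                   [4, 25, 3, 20]], dtype=float)
class_counts = np.array([50.0, 50.0])        # training images per class

p_w_given_c = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)  # Laplace smoothing
log_prior = np.log(class_counts / class_counts.sum())

def classify(query_words):
    # c* = argmax_c [ log p(c) + sum_n log p(w_n|c) ]
    log_post = log_prior + np.log(p_w_given_c[:, query_words]).sum(axis=1)
    return int(np.argmax(log_post))

print(classify([0, 0, 2]))   # -> 0: words typical of class 0
print(classify([1, 3, 3]))   # -> 1
```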
Generative model: Another example
• Mixture of Gaussians model
• How can we infer from unlabeled data, even if we know the underlying probability distribution structure?

$$p(c, \mu, \gamma, \mathbf{x}) = p(c)\,p(\mu \mid c)\,p(\gamma \mid c)\,p(\mathbf{x} \mid \mu, \gamma)$$

A graphical model
• Directed graph
• Nodes represent variables
• Links show dependencies
• Conditional distributions at each node
[Figure: Bayesian network for the mixture model. Hidden object class c generates the mean μ and the inverse variance γ, which generate the observed data x; factors P(c), P(μ|c), P(γ|c), P(x|μ,γ)]
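A minimal sketch of ancestral sampling from this graphical model, following the factorization p(c, μ, γ, x) = p(c) p(μ|c) p(γ|c) p(x|μ, γ); the specific distributions and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_one():
    c = rng.choice(2, p=[0.3, 0.7])            # c     ~ P(c)
    mu = rng.normal([0.0, 5.0][c], 1.0)        # mu    ~ P(mu|c)
    gamma = rng.gamma([2.0, 4.0][c], 1.0)      # gamma ~ P(gamma|c), the inverse variance
    x = rng.normal(mu, 1.0 / np.sqrt(gamma))   # x     ~ P(x|mu, gamma)
    return c, mu, gamma, x

samples = [sample_one() for _ in range(5)]     # only x would be observed
```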
Inference of latent variables
• Expectation maximization (EM) (see the sketch after this list)
  – “Soft guess” the latent variables first (E-step)
  – Based on the latent variables (assumed correct), solve an optimization problem (M-step)
• Markov chain Monte Carlo (MCMC)
  – Use Gibbs sampling from the posterior
  – Slow to converge
• Variational methods / Variational Message Passing (VMP)
  – Algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003)
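A minimal EM sketch for a two-component 1-D Gaussian mixture, matching the bullets above: the E-step “soft-guesses” the latent assignments, and the M-step re-solves the weighted maximum-likelihood problem as if those guesses were correct. Data and initial values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])  # unlabeled data

pi, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: soft responsibility of each component for each point
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)          # shape (N, 2)
    # M-step: weighted maximum-likelihood updates, treating resp as correct
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sd)   # should approach the true mixture: (0.5, 0.5), (0, 5), (1, 1)
```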
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Back to the topic: the paper
• Key ideas:
  – Latent topics are spatially coherent
    • Generate the topic distribution at the region level
  – Over-segmentation, then merge regions by shared topic
    • Avoids obtaining regions larger than the objects
    • One topic per region
    • Can recognize objects with occlusion
  – Describe a region by:
    • Homogeneous appearance a_r: the average of color or texture features
    • SIFT-based visual words: w_r
  – Concurrent segmentation and classification

[Figure: bag of words vs. over-segmentation]
Spatial Latent Topic Model
• Notation:
  – Image $I_d$
  – Regions $r \in \{1, 2, \ldots, R_d\}$
  – Latent topic $z_r \in \{1, 2, \ldots, K\}$
  – Appearance $a_r \in \{1, 2, \ldots, A\}$
  – Visual words $\mathbf{w}_r = (w_r^1, w_r^2, \ldots, w_r^{M_r})$; each $w_r^m \in \{1, 2, \ldots, W\}$
  – $P(z_r \mid \theta_d)$: topic probability (multinomial distribution) parameterized by $\theta_d$
  – $P(\theta_d \mid \lambda)$: Dirichlet prior on $\theta_d$, parameterized by $\lambda$
  – $\alpha, \beta$: parameters describing the probability of generating the appearance and visual words given the topic
Spatial Latent Topic Model (Unsupervised)
• Maximize the log-likelihood
  – an optimization problem: a closed-form solution is intractable

$$P(\theta_d, \{z_r, a_r, \mathbf{w}_r\} \mid \lambda, \alpha, \beta) = \underbrace{P(\theta_d \mid \lambda)}_{\text{Dirichlet prior}} \prod_{r} \underbrace{P(z_r \mid \theta_d)}_{\text{Multinomial}}\, P(a_r \mid z_r, \alpha)\, P(\mathbf{w}_r \mid z_r, \beta)$$

$$L = \sum_d L_d, \qquad L_d = \log P(\{a_r, \mathbf{w}_r\}_{r \in I_d} \mid \lambda, \alpha, \beta)$$
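A minimal sketch of ancestral sampling from this joint distribution, using the notation of the previous slide; all sizes and parameter values are illustrative stand-ins, not the paper’s learned values.

```python
import numpy as np

rng = np.random.default_rng(3)
K, A, W, R_d = 3, 4, 10, 5                   # topics, appearance bins, vocab, regions

lam = np.ones(K)                             # Dirichlet prior lambda
alpha = rng.dirichlet(np.ones(A), size=K)    # alpha[k] = P(a | z=k)  (made up)
beta = rng.dirichlet(np.ones(W), size=K)     # beta[k]  = P(w | z=k)  (made up)

theta_d = rng.dirichlet(lam)                 # theta_d ~ Dir(lambda)
image = []
for r in range(R_d):
    z_r = rng.choice(K, p=theta_d)           # z_r ~ Mult(theta_d), one topic per region
    a_r = rng.choice(A, p=alpha[z_r])        # region appearance ~ P(a | z_r, alpha)
    M_r = rng.integers(3, 8)                 # number of visual words in region r
    w_r = rng.choice(W, size=M_r, p=beta[z_r])  # visual words ~ P(w | z_r, beta)
    image.append((z_r, a_r, w_r))
```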
Variational Message Passing (Winn 2005)
• Coupling of the hidden variables θ, α, β makes the maximization intractable
• Instead, maximize the lower bound $\mathcal{L}(Q)$ of the log-likelihood
• Goal: find a tractable $Q(H)$ that closely approximates the true posterior distribution $P(H \mid V)$ (the decomposition below holds for any distribution $Q$)

$$\ln P(V) = \mathcal{L}(Q) + \mathrm{KL}(Q \,\|\, P)$$

$$\mathcal{L}(Q) = \sum_{H} Q(H) \ln \frac{P(H, V)}{Q(H)}, \qquad \mathrm{KL}(Q \,\|\, P) = -\sum_{H} Q(H) \ln \frac{P(H \mid V)}{Q(H)}$$

• Or equivalently, minimize $\mathrm{KL}(Q \,\|\, P)$
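A minimal numeric check of this decomposition on a made-up discrete model: for an arbitrary Q(H), ln P(V) = L(Q) + KL(Q||P) holds exactly, so L(Q) lower-bounds ln P(V) and is tight when Q equals the true posterior.

```python
import numpy as np

rng = np.random.default_rng(4)
# Made-up joint P(H, V=v) over 4 hidden states at a fixed observation v
joint = rng.random(4) / 10.0                   # unnormalized is fine: P(H, V=v)
p_v = joint.sum()                              # P(V=v) = sum_H P(H, V=v)
post = joint / p_v                             # P(H | V=v)

q = rng.random(4); q /= q.sum()                # an arbitrary distribution Q(H)
lower_bound = np.sum(q * np.log(joint / q))    # L(Q)     = sum_H Q ln[P(H,V)/Q]
kl = -np.sum(q * np.log(post / q))             # KL(Q||P) = -sum_H Q ln[P(H|V)/Q]

assert np.isclose(np.log(p_v), lower_bound + kl)     # ln P(V) = L(Q) + KL(Q||P)
assert np.isclose(np.sum(post * np.log(joint / post)), np.log(p_v))  # tight at Q = P(H|V)
```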
Variational Message Passing (Winn 2005)
• Further factorization assumptions (Jordan et al., 1999; Jaakkola, 2001; Parisi, 1988) restrict the family of distributions Q:

$$Q(H) = \prod_i Q_i(H_i)$$

$$\mathcal{L}(Q) = \sum_H Q(H) \ln P(H, V) - \underbrace{\sum_i \sum_{H_i} Q_i(H_i) \ln Q_i(H_i)}_{\text{entropy term}}$$

$$= \sum_{H_j} Q_j(H_j) \big\langle \ln P(H, V) \big\rangle_{\prod_{i \neq j} Q_i(H_i)} - \sum_{H_j} Q_j(H_j) \ln Q_j(H_j) + \text{terms not in } Q_j$$

$$= -\mathrm{KL}(Q_j \,\|\, Q_j^*) + \text{terms not in } Q_j$$

where $\ln Q_j^*(H_j) = \big\langle \ln P(H, V) \big\rangle_{\sim Q_j} + \text{const.}$, and $\langle \cdot \rangle_{\sim Q_j}$ denotes the expectation over all factors except $Q_j$.
Variational Message Passing (Winn 2005)
• Markov blanket: for a Bayesian network, $P(\mathbf{X}) = \prod_i P(X_i \mid \mathrm{pa}_i)$, so the update for $Q_j$ involves only the Markov blanket of $H_j$:

$$\ln Q_j^*(H_j) = \big\langle \ln P(H, V) \big\rangle_{\sim Q_j} + \text{const.} = \big\langle \ln P(H_j \mid \mathrm{pa}_j) \big\rangle_{\sim Q_j} + \sum_{k \in \mathrm{ch}_j} \big\langle \ln P(X_k \mid \mathrm{pa}_k) \big\rangle_{\sim Q_j} + \text{const.}$$

(Eqn. (6) in the paper)

[Figure: Bayesian network representation of the model]
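A minimal sketch of the resulting coordinate-ascent scheme on a made-up two-variable discrete model: with Q(H) = Q1(H1) Q2(H2), each factor is set in turn to ln Q_j* = ⟨ln P(H,V)⟩ over the other factor, plus a constant, and each update can only increase L(Q).

```python
import numpy as np

rng = np.random.default_rng(5)
# Made-up unnormalized joint P(H1, H2, V=v) over two binary hidden variables
log_joint = np.log(rng.random((2, 2)))

q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])
for _ in range(20):
    # ln Q1*(h1) = <ln P(H,V)>_{Q2} + const
    log_q1 = log_joint @ q2
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    # ln Q2*(h2) = <ln P(H,V)>_{Q1} + const
    log_q2 = q1 @ log_joint
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()

# Lower bound L(Q) = <ln P(H,V)> - <ln Q> under the factorized Q
Q = np.outer(q1, q2)
L = np.sum(Q * (log_joint - np.log(Q)))
print(q1, q2, L)
```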
Spatial Latent Topic Model (Supervised)
• For a query image $I_d$, find its most probable category $c$:

$$c^* = \arg\max_c \prod_{r \in I_d} P(\mathbf{w}_r, a_r \mid c)$$

• $\theta$ now becomes a $C \times K$ matrix, i.e. $\theta$ depends on the observed class $c$
Process
• Training step
  – Maximize the total likelihood of the training images with respect to λ, α, θ and $z_r$
  – The learned λ, α are then fixed
• Testing phase, for a query image $I_d$
  – Estimate its $\theta_d$ and $z_r$:

$$\hat{z}_r = \arg\max_{z_r} P(a_r, \mathbf{w}_r \mid z_r)$$

$$\hat{\theta}_d = \arg\max_{\theta_d} \prod_{r \in I_d} \sum_{z_r} P(a_r, \mathbf{w}_r \mid z_r)\, P(z_r \mid \theta_d) \qquad \text{(Eqn. (3))}$$

  – For the classification task, take the most probable latent topic as the category:

$$k^* = \arg\max_{1 \le k \le K} \theta_d(k)$$

  – For the segmentation task, merge regions that share the same $\hat{z}_r$ (see the sketch below)
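A minimal sketch of this testing phase with made-up learned tables α and β: pick ẑ_r per region, use the topic frequencies as a crude stand-in for the θ̂_d optimization of Eqn. (3), classify by the most probable topic, and merge same-topic regions for segmentation.

```python
import numpy as np

rng = np.random.default_rng(6)
K, A, W = 4, 5, 20                               # topics, appearance bins, vocabulary
alpha = rng.dirichlet(np.ones(A), size=K)        # learned P(a|z)  (made up)
beta = rng.dirichlet(np.ones(W), size=K)         # learned P(w|z)  (made up)

def log_p_region(a_r, w_r):
    # log P(a_r, w_r | z) for every topic z
    return np.log(alpha[:, a_r]) + np.log(beta[:, w_r]).sum(axis=1)

# Query image: (appearance, visual words) per over-segmented region (made up)
regions = [(rng.integers(A), rng.choice(W, size=rng.integers(3, 8))) for _ in range(6)]

z_hat = [int(np.argmax(log_p_region(a, w))) for a, w in regions]  # per-region topics

# Classification: most probable topic across the image as the category
theta_hat = np.bincount(z_hat, minlength=K) / len(z_hat)  # crude stand-in for theta_d
category = int(np.argmax(theta_hat))

# Segmentation: merge regions that share the same topic label
segments = {z: [r for r, zr in enumerate(z_hat) if zr == z] for z in set(z_hat)}
print(category, segments)
```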
Outline
• Motivation
• A Review on Graphical Models
• Today’s topic: the paper
• Their Results
Experimental Results
• Unsupervised segmentation
Occlusion case:
Experimental Results
• Supervised segmentation
Dataset: 13 classes of natural scenes
# of training images: 100
# of topics: 60
# of categories: 13
Experimental Results
• Supervised classification
Dataset: 28 classes from Caltech 101
# of training images: 30
# of test images: 30
# of topics in categories: 28
# of topics in clutter: 34
6 background classes are left unlabeled
~ Thank you ~
Variational Message Passing
• Following this framework, and using the graphical model provided by this paper, the update equations for each node can be derived.