Post on 21-Dec-2015
Generative learning methods for bags of features
• Model the probability of a bag of features given a class
Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Generative methods
• We will cover two models, both inspired by text document analysis:
– Naïve Bayes
– Probabilistic Latent Semantic Analysis
The Naïve Bayes model
Csurka et al. 2004
• Assume that each feature is conditionally independent given the class
f_i: ith feature in the image
N: number of features in the image

$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c)$
The Naïve Bayes model
Csurka et al. 2004
$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$
• Assume that each feature is conditionally independent given the class
w_j: jth visual word in the vocabulary
M: size of visual vocabulary
n(w_j): number of features of type w_j in the image
f_i: ith feature in the image
N: number of features in the image
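With the independence assumption, the bag-of-features likelihood factors over visual-word counts, so it can be evaluated as a single dot product between the count vector and the log-probability vector. A minimal sketch (the array layout and names are illustrative, not from Csurka et al.):

```python
import numpy as np

def log_likelihood(word_counts, log_p_word_given_class):
    """log p(f_1, ..., f_N | c) = sum_j n(w_j) * log p(w_j | c).

    word_counts:             (M,) counts n(w_j) for one image
    log_p_word_given_class:  (M,) values log p(w_j | c)
    """
    return float(word_counts @ log_p_word_given_class)

# Toy vocabulary of M = 3 visual words
counts = np.array([2.0, 0.0, 1.0])          # n(w_1)=2, n(w_2)=0, n(w_3)=1
log_pw = np.log(np.array([0.5, 0.3, 0.2]))  # p(w_j | c) for one class c
ll = log_likelihood(counts, log_pw)          # = 2*log(0.5) + log(0.2)
```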
The Naïve Bayes model
Csurka et al. 2004
• Assume that each feature is conditionally independent given the class
$p(w_j \mid c) = \dfrac{\text{no. of features of type } w_j \text{ in training images of class } c}{\text{total no. of features in training images of class } c}$

$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$
The Naïve Bayes model
Csurka et al. 2004
• Assume that each feature is conditionally independent given the class
$p(w_j \mid c) = \dfrac{(\text{no. of features of type } w_j \text{ in training images of class } c) + 1}{(\text{total no. of features in training images of class } c) + M}$

(Laplace smoothing to avoid zero counts)

$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$
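The smoothed estimate is just "add one to every count, then normalize over the vocabulary". A short sketch, assuming a (classes × words) count layout chosen for illustration:

```python
import numpy as np

def estimate_word_probs(count_matrix):
    """Laplace-smoothed estimate of p(w_j | c).

    count_matrix: (C, M) array; row c holds, for each of M visual
    words, the number of features of type w_j in training images
    of class c.
    """
    smoothed = count_matrix + 1.0                          # the "+ 1" in the numerator
    return smoothed / smoothed.sum(axis=1, keepdims=True)  # adding 1 to M words puts "+ M" in the denominator

counts = np.array([[3., 0., 1.],
                   [0., 2., 2.]])
probs = estimate_word_probs(counts)
# Rows sum to 1, and words unseen in a class still get nonzero probability.
```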
The Naïve Bayes model
Csurka et al. 2004
• Maximum A Posteriori decision:

$c^* = \arg\max_c\, p(c) \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)} = \arg\max_c \Big[ \log p(c) + \sum_{j=1}^{M} n(w_j) \log p(w_j \mid c) \Big]$

(you should compute the log of the likelihood instead of the likelihood itself in order to avoid underflow)
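In log space the MAP rule is one prior term plus one dot product per class. A hedged sketch (shapes and names are my own, for illustration):

```python
import numpy as np

def map_classify(word_counts, log_prior, log_p_w_c):
    """c* = argmax_c [ log p(c) + sum_j n(w_j) log p(w_j | c) ].

    word_counts: (M,) counts for the test image
    log_prior:   (C,) log p(c)
    log_p_w_c:   (C, M) log p(w_j | c)
    """
    scores = log_prior + log_p_w_c @ word_counts  # (C,) log-posteriors up to a constant
    return int(np.argmax(scores))

# Toy problem: 2 classes, 3 visual words, uniform prior
log_prior = np.log(np.array([0.5, 0.5]))
log_p_w_c = np.log(np.array([[0.6, 0.2, 0.2],
                             [0.1, 0.1, 0.8]]))
c_star = map_classify(np.array([5.0, 1.0, 0.0]), log_prior, log_p_w_c)
```

Working in log space keeps the N-fold product of small probabilities from underflowing to zero.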
The Naïve Bayes model
Csurka et al. 2004
• "Graphical model": c → w, with the word node w inside a plate replicated N times
Probabilistic Latent Semantic Analysis
T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
Image = p1 · "zebra" + p2 · "grass" + p3 · "tree"
"visual topics"
Probabilistic Latent Semantic Analysis
• Unsupervised technique
• Two-level generative model: a document is a mixture of topics, and each topic has its own characteristic word distribution
d → z → w
T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
document --P(z|d)--> topic --P(w|z)--> word
Probabilistic Latent Semantic Analysis
• Unsupervised technique
• Two-level generative model: a document is a mixture of topics, and each topic has its own characteristic word distribution
d → z → w
T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$
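Stacking the factors as matrices, the mixture above becomes a single matrix product. A small sketch, assuming (words × topics) and (topics × documents) layouts:

```python
import numpy as np

def word_given_doc(p_w_z, p_z_d):
    """p(w_i | d_j) = sum_k p(w_i | z_k) p(z_k | d_j).

    p_w_z: (M, K), columns are word distributions per topic
    p_z_d: (K, N), columns are topic distributions per document
    returns (M, N) word distributions per document
    """
    return p_w_z @ p_z_d

p_w_z = np.array([[0.7, 0.1],
                  [0.2, 0.1],
                  [0.1, 0.8]])   # M=3 words, K=2 topics
p_z_d = np.array([[1.0, 0.5],
                  [0.0, 0.5]])   # N=2 documents
p_w_d = word_given_doc(p_w_z, p_z_d)
# Each column of p_w_d is a valid distribution over words.
```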
The pLSA model

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

p(w_i | d_j): probability of word i in document j (known)
p(w_i | z_k): probability of word i given topic k (unknown)
p(z_k | d_j): probability of topic k given document j (unknown)
The pLSA model

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

Viewed as a matrix decomposition:
observed codeword distributions p(w_i | d_j) (words × documents, M × N)
= codeword distributions per topic (class) p(w_i | z_k) (words × topics, M × K)
× class distributions per image p(z_k | d_j) (topics × documents, K × N)

Slide credit: Josef Sivic

Learning pLSA parameters
Maximize the likelihood of the data, given the observed counts of word i in document j
(M … number of codewords, N … number of images)
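Hofmann fits the factors with Expectation-Maximization, alternating a posterior computation over topics with re-estimation of both conditional distributions. The slide's update equations are not reproduced in this transcript, so the following is only a generic pLSA EM sketch with dense posteriors and illustrative names, not the authors' implementation:

```python
import numpy as np

def plsa_em(counts, K, iters=50, seed=0):
    """Minimal pLSA EM sketch (illustrative, not the authors' code).

    counts: (M, N) observed co-occurrence table n(w_i, d_j)
    Returns p_w_z (M, K) and p_z_d (K, N), columns normalized.
    """
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    p_w_z = rng.random((M, K))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((K, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(iters):
        # E-step: posterior p(z_k | w_i, d_j), shape (M, N, K)
        post = p_w_z[:, None, :] * p_z_d.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from n(w_i, d_j) * p(z_k | w_i, d_j)
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=0).T
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_d

counts = np.array([[4., 0.],
                   [0., 3.],
                   [1., 1.]])   # M=3 codewords, N=2 images
p_w_z, p_z_d = plsa_em(counts, K=2)
```

The dense (M, N, K) posterior is fine for a toy example; real codebooks would use the sparsity of the count table.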
Inference
• Finding the most likely topic (class) for an image:

$z^* = \arg\max_z p(z \mid d)$
Inference
• Finding the most likely topic (class) for an image:

$z^* = \arg\max_z p(z \mid d)$

• Finding the most likely topic (class) for a visual word in a given image:

$z^* = \arg\max_z p(z \mid w, d) = \arg\max_z \dfrac{p(w \mid z)\, p(z \mid d)}{\sum_{z'} p(w \mid z')\, p(z' \mid d)}$
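Both argmax rules are one-liners once the fitted factors are available. A sketch, assuming (M, K)-shaped p(w|z) and (K, N)-shaped p(z|d) arrays (layouts chosen for illustration):

```python
import numpy as np

def most_likely_topic(p_z_d, j):
    """z* = argmax_z p(z | d_j): most likely topic for image j."""
    return int(np.argmax(p_z_d[:, j]))

def topic_posterior(p_w_z, p_z_d, i, j):
    """p(z | w_i, d_j) = p(w_i | z) p(z | d_j) / sum_z' p(w_i | z') p(z' | d_j)."""
    unnorm = p_w_z[i, :] * p_z_d[:, j]
    return unnorm / unnorm.sum()

p_w_z = np.array([[0.7, 0.1],
                  [0.3, 0.9]])   # M=2 words, K=2 topics
p_z_d = np.array([[0.8, 0.2],
                  [0.2, 0.8]])   # N=2 documents
z_img = most_likely_topic(p_z_d, 0)         # topic 0 dominates document 0
post = topic_posterior(p_w_z, p_z_d, 1, 0)  # p(z | w_2, d_1)
```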
Topic discovery in images
J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location in Images, ICCV 2005
Application of pLSA: Action recognition
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
Space-time interest points
Application of pLSA: Action recognition
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
pLSA model
– wi = spatial-temporal word
– dj = video
– n(wi, dj) = co-occurrence table (# of occurrences of word wi in video dj)
– z = topic, corresponding to an action
$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

p(w_i | d_j): probability of word i in video j (known)
p(w_i | z_k): probability of word i given topic k (unknown)
p(z_k | d_j): probability of topic k given video j (unknown)
Action recognition example
Multiple Actions
Summary: Generative models
• Naïve Bayes
– Unigram model from document analysis
– Assumes conditional independence of words given class
– Parameter estimation: frequency counting
• Probabilistic Latent Semantic Analysis
– Unsupervised technique
– Each document is a mixture of topics (an image is a mixture of classes)
– Can be thought of as a matrix decomposition
– Parameter estimation: Expectation-Maximization