
Page 1: Expectation-Propagation for the Generative Aspect Model

Expectation-Propagation for the Generative Aspect Model

Tom Minka and John Lafferty

Carnegie Mellon University

Page 2: Main points

Main points

• Aspect model is important and difficult
  – PCA for discrete data

• How variational methods can fail

• Extensions of Expectation-Propagation

• Using EP for maximum likelihood

Page 3: Generative aspect model

Generative aspect model

Each document mixes aspects in different proportions

[Figure: three documents mixing Aspect 1 and Aspect 2 in proportions λ1 : 1−λ1, λ2 : 1−λ2, and λ3 : 1−λ3]

(Hofmann 1999; Blei, Ng, & Jordan 2001)

Page 4: First interpretation

First interpretation

[Figure: a document mixes Aspect 1 (weight λ) and Aspect 2 (weight 1−λ) to give its word distribution p(word w)]

p(λ) ~ Dirichlet(α_1, ..., α_J)

Words are then drawn from p(w) by multinomial sampling, as in the sketch below.
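
A minimal sketch of this first interpretation in Python; the aspect matrix and Dirichlet hyperparameters are made-up illustrative values, not taken from the talk:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative values: 2 aspects over a 3-word vocabulary.
    alpha = np.array([1.0, 1.0])                # Dirichlet hyperparameters
    p_w_given_a = np.array([[0.7, 0.2, 0.1],    # p(w | Aspect 1)
                            [0.1, 0.3, 0.6]])   # p(w | Aspect 2)

    def sample_document(n_words):
        lam = rng.dirichlet(alpha)              # per-document mixing proportions
        p_w = lam @ p_w_given_a                 # mixed word distribution p(w)
        return rng.multinomial(n_words, p_w)    # word counts via multinomial sampling

    print(sample_document(10))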

Page 5: Second interpretation

Second interpretation

[Figure: Doc i generates Word 1, Word 2, Word 3 through per-word aspect assignments z_i1 = “Aspect 1”, z_i2 = “Aspect 2”, z_i3 = “Aspect 2”; each word is sampled from its assigned aspect]
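
The same illustrative model under this second interpretation, with explicit per-word assignments z_ij; marginalizing out z recovers the first interpretation:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([1.0, 1.0])                # illustrative values, as before
    p_w_given_a = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.3, 0.6]])

    def sample_document_latent(n_words):
        lam = rng.dirichlet(alpha)
        z = rng.choice(len(alpha), size=n_words, p=lam)   # per-word aspect z_ij
        # each word is sampled from its assigned aspect's distribution
        return np.array([rng.choice(3, p=p_w_given_a[a]) for a in z])

    print(sample_document_latent(3))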

Page 6: Two tasks

Two tasks

Inference:

• Given the aspects and document i, what is the posterior for λ?

Learning:

• Given some documents, what are the (maximum likelihood) aspects?

Page 7: Variational method

Variational method

• Blei, Ng, & Jordan, NIPS’01

• Uses the (λ, z) interpretation

• Inference: factored variational distribution q(λ) q(z)

• Learning:
  – E-step: fit q(λ) q(z)
  – M-step: update aspects p(w|a) and aspect weights α

Page 8: Expectation Propagation

Expectation Propagation

• Uses the λ-only interpretation

• Inference: EP of a Dirichlet posterior q(λ)

• Learning:
  – E-step: fit q(λ)
  – M-step: update aspects p(w|a) and aspect weights α

• Fewer latent variables is better

Page 9: Geometric interpretation

Geometric interpretation

• Likelihood is composed of terms of the form

  t_w(λ) = p(w)^n_w = ( Σ_a λ_a p(w|a) )^n_w

• Want a Dirichlet approximation:

  t̃_w(λ) = Π_a λ_a^β_wa
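
Because the terms multiply, the exact posterior over λ can be evaluated on a grid for a small two-aspect model; a sketch with made-up aspects and counts:

    import numpy as np

    p_w_given_a = np.array([[0.7, 0.2, 0.1],    # illustrative aspects
                            [0.1, 0.3, 0.6]])
    n_w = np.array([4, 3, 3])                   # word counts for one document
    alpha = np.array([1.0, 1.0])                # Dirichlet prior on (λ, 1−λ)

    lam = np.linspace(1e-3, 1 - 1e-3, 999)      # grid over λ = λ_1
    prior = lam**(alpha[0] - 1) * (1 - lam)**(alpha[1] - 1)
    # each term t_w(λ) = (λ p(w|1) + (1−λ) p(w|2))^n_w
    terms = [(lam*p_w_given_a[0, w] + (1 - lam)*p_w_given_a[1, w])**n_w[w]
             for w in range(len(n_w))]
    post = prior * np.prod(terms, axis=0)       # unnormalized posterior
    post /= post.sum() * (lam[1] - lam[0])
    print("exact posterior mean of λ:", (lam*post).sum() * (lam[1] - lam[0]))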

Page 10: Variational method

Variational method

• Bound each term via Jensen’s inequality:

  Σ_a λ_a p(w|a) ≥ const × Π_a λ_a^β_wa

• Coupled equations (iterated in the sketch below):

  β_wa ∝ p(w|a) exp(⟨log λ_a⟩)
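
A minimal sketch of iterating the coupled equations for one document, assuming a Dirichlet q(λ) with parameters γ, so that ⟨log λ_a⟩ = ψ(γ_a) − ψ(Σ_a γ_a); the model values are the same illustrative ones as above:

    import numpy as np
    from scipy.special import digamma

    p_w_given_a = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.3, 0.6]])
    n_w = np.array([4, 3, 3])
    alpha = np.array([1.0, 1.0])

    gamma = alpha + n_w.sum() / len(alpha)      # initialize q(λ) parameters
    for _ in range(100):
        # β_wa ∝ p(w|a) exp(⟨log λ_a⟩)
        log_lam = digamma(gamma) - digamma(gamma.sum())
        beta = p_w_given_a.T * np.exp(log_lam)  # rows = words, cols = aspects
        beta /= beta.sum(axis=1, keepdims=True)
        gamma = alpha + (n_w[:, None] * beta).sum(axis=0)

    print("q(λ) mean:", gamma / gamma.sum())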

Page 11: Moment matching

Moment matching

• Context function: all but one occurrence:

  q^\w(λ) = t_w(λ)^(n_w − 1) Π_{w'≠w} t_w'(λ)^n_w'

• Moment match:

  t_w(λ) q^\w(λ) ↔ t̃_w(λ) q^\w(λ)

• Fixed point equations for β

Page 12: One term

One term

t_w(λ) = 0.4 λ + 0.3 (1 − λ)
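
A quick numeric illustration of the moment-matching step on this single term, assuming a uniform Beta(1, 1) context q^\w(λ): compute the mean and variance of t_w(λ) q^\w(λ) on a grid, then solve for the Beta with those moments:

    import numpy as np

    lam = np.linspace(1e-4, 1 - 1e-4, 9999)
    dlam = lam[1] - lam[0]
    q_context = np.ones_like(lam)               # uniform Beta(1,1) context (assumed)
    p = (0.4*lam + 0.3*(1 - lam)) * q_context   # t_w(λ) q^\w(λ)
    p /= p.sum() * dlam                         # normalize on the grid
    mean = (lam * p).sum() * dlam
    var = ((lam - mean)**2 * p).sum() * dlam
    s = mean*(1 - mean)/var - 1                 # Beta(a,b) with a+b = s has these moments
    print("matched Beta parameters:", mean*s, (1 - mean)*s)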

Page 13: Ten word document

Ten word document

Page 14: Moments for ten word doc

Moments for ten word doc

              Mean    Variance

  Variational 0.365   0.0178
  EP          0.393   0.0612
  Exact       0.393   0.0613

Page 15: 1000 word document

1000 word document

Page 16: Moments for 1000 word doc

Moments for 1000 word doc

              Mean    Variance

  Variational 0.252   0.0002
  EP          0.252   0.0015
  Exact       0.250   0.0015

Page 17: General behavior

General behavior

• For long documents, VB recovers the correct mean, but not the correct variance

• It optimizes a global criterion, but is accurate only locally

• What about learning?

Page 18: Learning

Learning

• Want the MLE for the aspects

• Requires marginalizing λ

• p(doc) is the probability of generating the document from a random λ

• Consider the extreme approximation (contrasted in the sketch below):
  – probability of generating from the most likely λ
  – ignores uncertainty in λ (the “Occam factor”)
  – used by Hofmann (1999)
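
A small numeric contrast of the two objectives for one hypothetical document: the marginal p(doc) integrates p(doc|λ) over the uniform prior, while the Max approximation keeps only the best λ and so drops the Occam factor:

    import numpy as np

    lam = np.linspace(1e-4, 1 - 1e-4, 9999)
    dlam = lam[1] - lam[0]
    p1, p2 = 0.3, 0.9                 # hypothetical p(w=1|a=1), p(w=1|a=2)
    n1, n0 = 7, 3                     # hypothetical counts of w=1 and w=0
    pw1 = lam*p1 + (1 - lam)*p2
    lik = pw1**n1 * (1 - pw1)**n0     # p(doc | λ)
    print("marginal p(doc):", lik.sum() * dlam)   # keeps the Occam factor
    print("max_λ p(doc|λ): ", lik.max())          # ignores uncertainty in λ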

Page 19: Toy problem

Toy problem

• Two aspects over two words
  – p(w=1|a) describes each aspect

• One aspect fixed to p(w=1|a=2) = 1

• p(λ) is uniform, i.e. α = 1

  p(w=1) = λ p(w=1|a=1) + (1−λ) p(w=1|a=2)

[Figure: the aspects as points on the p(w=1) axis from 0 to 1; λ = 1 selects Aspect 1, λ = 0 selects Aspect 2]
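
A sketch of this toy problem’s exact likelihood versus the Max approximation, as a function of p(w=1|a=1); the per-document counts are hypothetical:

    import numpy as np

    lam = np.linspace(1e-4, 1 - 1e-4, 9999)
    dlam = lam[1] - lam[0]
    docs = [(3, 7), (8, 2)]           # hypothetical (count of w=1, count of w=0)

    def log_lik(p1, use_max=False):
        pw1 = lam*p1 + (1 - lam)*1.0  # p(w=1), with p(w=1|a=2) fixed at 1
        total = 0.0
        for c1, c0 in docs:
            per_lam = pw1**c1 * (1 - pw1)**c0        # p(doc | λ)
            total += np.log(per_lam.max() if use_max
                            else per_lam.sum() * dlam)
        return total

    grid = np.linspace(0.01, 0.99, 99)
    exact = [log_lik(p) for p in grid]
    maxap = [log_lik(p, use_max=True) for p in grid]
    print("exact MLE:", grid[np.argmax(exact)],
          " Max-approx MLE:", grid[np.argmax(maxap)])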

Page 20: Exact likelihood vs. Max

Exact likelihood vs. Max

[Figure: exact likelihood and Max-approximation likelihood as functions of p(w=1|a=1), for 10 docs; dotted lines mark the observed frequencies n_i1 / n_i]

Page 21: EP likelihood vs. VB

EP likelihood vs. VB

[Figure: EP and VB likelihoods as functions of p(w=1|a=1)]

VB not much different from Max

Page 22: 100 documents

100 documents

[Figure: likelihoods as functions of p(w=1|a=1), 100 docs]

EP better, VB worse

Page 23: Longer documents

Longer documents

[Figure: likelihoods as functions of p(w=1|a=1), longer documents]

VB not helped

Page 24: Why VB fails

Why VB fails

• Doesn’t capture the posterior variance of λ
  – no Occam factor

• Gets worse with more documents

• Doesn’t matter if the posterior is sharp!
  – longer documents
  – more distinct aspects

  Both sharpen the posterior, but do not help VB

Page 25: Larger vocabulary

Larger vocabulary

10 docs, length 10

Page 26: Larger vocabulary

Larger vocabulary

10 docs, length 10

Page 27: Larger vocabulary

Larger vocabulary

10 docs, length 10

Page 28: More documents

More documents

100 docs, length 10

Page 29: TREC documents

TREC documents

Page 30: Perplexity

Perplexity

• EP and VB models have nearly same perplexity (!)

• Perplexity is document probability, normalized by length (see the sketch below)
  – normalization dilutes the penalty of extreme aspects

• Better measure: classification error
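
For concreteness, a minimal sketch of per-word perplexity from document log-probabilities; the numbers are hypothetical:

    import numpy as np

    def perplexity(log_prob_docs, doc_lengths):
        # exp of the negative average per-word log-likelihood;
        # normalizing by length dilutes the penalty of extreme aspects
        return np.exp(-np.sum(log_prob_docs) / np.sum(doc_lengths))

    print(perplexity(np.array([-35.2, -28.9, -41.0]), np.array([10, 8, 12])))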

Page 31: Summary & Future

Summary & Future

• Approximate inference can lead to biased learning
  – must preserve the “Occam factor” terms

• Moment matching does well

• Simpler methods may also work
  – e.g. area of the aspect simplex

• Make visualizations!