
Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

James Foulds, Shachi Kumar, Lise Getoor
Jack Baskin School of Engineering, University of California, Santa Cruz

Probabilistic latent variable modeling

Complicated, noisy, high-dimensional data → latent variable model → low-dimensional, semantically meaningful representations → understand, explore, predict

Topic models

• Topic models are foundational building blocks for powerful latent variable models

– Authorship (Rosen-Zvi et al., 2004)

– Conversational Influence (Nguyen et al., 2014)

– Knowledge base construction

(Movshovitz-Attias and Cohen, 2015)

– Machine translation (Mimno et al., 2009)

– Political analysis (Grimmer, 2010), (Gerrish and Blei, 2011, 2012)

– Recommender systems (Wang and Blei, 2011), (Diao et al., 2014)

– Scientific impact (Dietz et al. 2007), (Foulds and Smyth, 2013)

– Social network analysis (Chang et al., 2009)

– Word-sense disambiguation (Boyd-Graber et al., 2007)

– …


Custom topic models

• Custom latent variable topic models are useful for data mining and computational social science

• Scalability was once the challenge, but sparse, stochastic, collapsed, and distributed algorithms have largely addressed it. As Max Welling put it: "There's no end to speeding up LDA!"

• The real bottleneck is human effort and expertise: design time >> run time

Custom topic models

• Today's workflow: complicated, noisy, high-dimensional data → an (algorithm, model) pair carefully co-designed for tractability → low-dimensional, semantically meaningful representations → understand, explore, predict → evaluate, iterate

• The goal: replace the hand-crafted (algorithm, model) pair with a general-purpose modeling framework

Our contribution

• We introduce latent topic networks

– A versatile, general-purpose framework for specifying custom topic models

– Models and domain knowledge specified using a simple logical probabilistic programming language

– A highly parallelizable EM training algorithm


Latent topic networks

[Figure: the LDA likelihood relates words W, topic assignments Z, topics 𝚽, and document-topic distributions 𝛉. Latent topic networks extend it with networks of dependencies between topics and between distributions over topics, observed covariates X, labeled data Y, and additional latent variables Z.]
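For reference, the LDA likelihood that anchors every latent topic network, written in standard notation (the slide showed this as a graphical model rather than an equation):

```latex
p(\mathbf{w}, \mathbf{z} \mid \boldsymbol\theta, \boldsymbol\Phi)
  = \prod_{d=1}^{D} \prod_{i=1}^{N_d} \theta_{d, z_{di}} \, \phi_{z_{di}, w_{di}}
```

where d indexes documents, i indexes word positions, 𝛉_d is document d's distribution over topics, and 𝛗_k is topic k's distribution over words.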

Previously…

Grad student + ≈6 months = topic modeling research paper

With latent topic networks:

Shachi Kumar (Master's student, UCSC) + 1 weekend = new custom topic model

Related work

Systems are compared along five dimensions: correlations/dependencies, observed covariates, additional latent variables, constraints, and probabilistic programming.

Systems for encoding domain knowledge, covariates, and correlations:

• CTM (Blei and Lafferty, 2007)
• DMR (Mimno and McCallum, 2008)
• Dirichlet Forests (Andrzejewski et al., 2009)
• xLDA (Wahabzada et al., 2010)
• SAGE (Eisenstein et al., 2011)
• STM (Roberts et al., 2013)

Graphical modeling and probabilistic programming systems:

• CTRF (Zhu and Xing, 2010)
• Fold.all (Andrzejewski et al., 2011)
• Logic LDA (Mei et al., 2014)
• Latent topic networks (this work)

Example: modeling influence in citation networks

Foulds and Smyth (2013), EMNLP

• Which are the most important articles?
• What are the influence relationships between articles?

Topical influence regression

Foulds and Smyth (2013), EMNLP

• Latent variables for document influence and citation edge influence
• Probabilistic dependencies along the citation graph

Encoding dependencies via logical rules

• Restrict dependencies to the citation graph
• If influence and topic value are both high, the citing document also has the topic
• Entire model with just 5 rules! (A sketch of one such rule follows.)
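The rules themselves appeared on the slides as logical syntax that the transcript did not preserve; a minimal sketch of what one might look like, with hypothetical predicate names (Theta for soft document-topic values, Cites and Influence for the citation structure) rather than the paper's exact formulation:

```
// Illustrative only: if cited document D1 exhibits topic T strongly, and D1's
// influence on citing document D2 is high, then D2 should also exhibit topic T.
5.0 : Theta(D1, T) & Cites(D2, D1) & Influence(D1, D2) -> Theta(D2, T)
```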

Statistical relational learning

• An "interface layer for AI"

– Programming languages for specifying models and encoding domain knowledge

– Typically based on first-order logic

Probabilistic soft logic (PSL)

• A first-order logic-based SRL language

• Specifies a class of highly scalable continuous graphical models called hinge-loss MRFs

• Each rule has a weight (e.g. 5.0) and combines predicates with logical operators; the predicates ground out to continuous random variables!
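A weighted rule, annotated with the anatomy the slide highlighted (the rule itself is a generic illustration, not one from the talk):

```
5.0 : Friends(A, B) & Votes(A, P) -> Votes(B, P)
// 5.0             = rule weight
// Friends, Votes  = predicates, grounding to continuous random variables in [0, 1]
// &, ->           = logical operators, relaxed to continuous (Lukasiewicz) semantics
```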

Hinge-loss MRFs

• A conditional random field over continuous random variables between 0 and 1

• Feature functions are hinge losses: a linear function inside a max with zero, optionally squared

• The hinge losses encode the distance to satisfaction of each instantiated rule
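The density appeared as an equation on the slide; the standard hinge-loss MRF form from the PSL literature, which is presumably what was shown (notation assumed):

```latex
P(\mathbf{y} \mid \mathbf{x}) \;\propto\;
  \exp\!\Big( -\sum_{j=1}^{m} \lambda_j
    \big( \max\{ \ell_j(\mathbf{y}, \mathbf{x}),\, 0 \} \big)^{p_j} \Big),
  \qquad p_j \in \{1, 2\}
```

where each ℓ_j is a linear function of the variables, λ_j ≥ 0 is the weight of rule j, and p_j = 2 gives the squared hinge.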

Latent Dirichlet allocation

• Priors: Dirichlet distributions over the document-topic vectors 𝛉 and topic-word vectors 𝚽
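The prior equations were rendered as images in the transcript; the standard LDA priors they would have shown:

```latex
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
\phi_k \sim \mathrm{Dirichlet}(\beta)
```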

Latent topic networks

• Priors: Hinge-loss MRFs


Log posterior objective function

• The LDA log posterior plus hinge loss terms

• Tractability from convexity, instead of conjugacy!
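A sketch of the objective, reconstructed from the two labeled components on the slide (exact notation assumed; estimation is over 𝛉 and 𝚽 with topic assignments marginalized):

```latex
f(\boldsymbol\theta, \boldsymbol\Phi) =
  \underbrace{\sum_{d,i} \log \sum_k \theta_{dk}\,\phi_{k, w_{di}}
    + \log p_{\mathrm{Dir}}(\boldsymbol\theta, \boldsymbol\Phi)}_{\text{LDA log posterior}}
  \;-\;
  \underbrace{\sum_j \lambda_j \big(\max\{\ell_j(\boldsymbol\theta, \boldsymbol\Phi), 0\}\big)^{p_j}}_{\text{hinge loss terms}}
```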

Training algorithm

• Expectation Maximization

– E-step: the same as for LDA

– M-step: maximize the LDA EM lower bound minus the hinge loss terms. Convex optimization! Solve in parallel using consensus ADMM
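A sketch of the two steps under the usual EM formulation for MAP LDA (notation assumed to match the objective above; γ_dik is the posterior responsibility of topic k for word i in document d, normalized over k, and Dirichlet log-prior terms are omitted for brevity):

```latex
\text{E-step:}\quad
  \gamma_{dik} \;\propto\; \theta_{dk}\,\phi_{k, w_{di}}
\qquad
\text{M-step:}\quad
  \max_{\boldsymbol\theta, \boldsymbol\Phi}\;
  \sum_{d,i,k} \gamma_{dik} \log\big(\theta_{dk}\,\phi_{k, w_{di}}\big)
  \;-\; \sum_j \lambda_j \big(\max\{\ell_j(\boldsymbol\theta, \boldsymbol\Phi), 0\}\big)^{p_j}
```

The M-step objective is concave: the weighted log terms are concave in (𝛉, 𝚽), and each hinge term is convex, so its negation is concave. Consensus ADMM then splits the problem across the hinge terms for parallel solving.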

Weight learning

• Optimize a pseudo-likelihood approximation via its gradient with respect to the rule weights

• Importance sample from the implied Dirichlet prior
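The gradient expression on the slide was not preserved in the transcript. For exponential-family models with hinge-loss features, the log-likelihood gradient takes the standard form below; the pseudo-likelihood version replaces the expectation with one under each variable's conditional, and those expectations are what the importance sampling estimates:

```latex
\frac{\partial}{\partial \lambda_j} \log P_{\boldsymbol\lambda}(\mathbf{y}_{\text{obs}})
  \;=\; \mathbb{E}_{\boldsymbol\lambda}\!\left[ f_j(\mathbf{y}) \right]
        - f_j(\mathbf{y}_{\text{obs}}),
\qquad
f_j(\mathbf{y}) = \big(\max\{\ell_j(\mathbf{y}), 0\}\big)^{p_j}
```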

Case study: Exploring influence in citation networks

[Results presented as figures; not preserved in the transcript.]

Case study: Modeling US Presidential State of the Union addresses

• The US President updates Congress on the state of the Union, roughly annually

• Do these addresses depict the true, underlying state of the Union?

• Are they biased by political agendas?

[Model diagram: the state of the Union evolves over time (years); each address is drawn from a topic model 𝛉 that combines the underlying state with Republican or Democratic party bias.]

[Figure: top words for an inferred Democratic topic and an inferred Republican topic.]

[Figure: inferred topic proportions over time, with visible shifts around WWI, WWII, and Vietnam.]

Perplexity results (lower is better):

                          Document completion   Fully held-out
  Latent topic networks   2.33 × 10³            2.43 × 10³
  LDA topic model         2.36 × 10³            2.59 × 10³
  Dynamic topic model     2.43 × 10³            2.55 × 10³

Conclusion

• We introduce latent topic networks, a versatile, general-purpose framework for building and performing inference in custom topic models.

• Our experimental results show that models specified in our framework, with just a few lines of code in a logical language, can be competitive with state-of-the-art special-purpose models.

• Future directions:

– Using our framework to answer substantive questions in social science

– New language primitives, non-parametric Bayesian models, algorithmic advances, …

Thanks to my collaborators at UC Santa Cruz

• Lise Getoor

• Shachi Kumar

Code: coming soon at psl.cs.umd.edu!

Thank you for your attention.