
Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

James Foulds, Shachi Kumar, Lise Getoor
Jack Baskin School of Engineering, University of California, Santa Cruz

Probabilistic latent variable modeling

Complicated, noisy, high-dimensional data → latent variable model → low-dimensional, semantically meaningful representations → understand, explore, predict

Topic models

• Topic models are foundational building blocks for powerful latent variable models

– Authorship (Rosen-Zvi et al., 2004)

– Conversational Influence (Nguyen et al., 2014)

– Knowledge base construction

(Movshovitz-Attias and Cohen, 2015)

– Machine translation (Mimno et al., 2009)

– Political analysis (Grimmer, 2010), (Gerrish and Blei, 2011, 2012)

– Recommender systems (Wang and Blei, 2011), (Diao et al., 2014)

– Scientific impact (Dietz et al. 2007), (Foulds and Smyth, 2013)

– Social network analysis (Chang et al., 2009)

– Word-sense disambiguation (Boyd-Graber et al., 2007)

– …


Custom topic models

• Custom latent variable topic models are useful for data mining and computational social science

• Scalability was once the challenge, but sparse, stochastic, collapsed, and distributed algorithms have largely addressed it. As Max Welling put it: "There's no end to speeding up LDA!"

• The real bottleneck is human effort and expertise: design time >> run time

Custom topic models

• Today's workflow: complicated, noisy, high-dimensional data → an (algorithm, model) pair carefully co-designed for tractability → low-dimensional, semantically meaningful representations → understand, explore, predict → evaluate, iterate

• The goal: replace the hand-crafted (algorithm, model) pair with a general-purpose modeling framework

Our contribution

• We introduce latent topic networks

– A versatile, general-purpose framework for specifying custom topic models

– Models and domain knowledge specified using a simple logical probabilistic programming language

– A highly parallelizable EM training algorithm


Latent topic networks

[Figure: the LDA likelihood relates words W, topic assignments Z, topics 𝚽, and document-topic distributions 𝛉. Latent topic networks extend it with networks of dependencies between topics and between distributions over topics, observed covariates X, labeled data Y, and additional latent variables Z.]
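For reference, the LDA likelihood that anchors every latent topic network, written in standard notation (the slide showed this as a graphical model rather than an equation):

```latex
p(\mathbf{w}, \mathbf{z} \mid \boldsymbol\theta, \boldsymbol\Phi)
  = \prod_{d=1}^{D} \prod_{i=1}^{N_d} \theta_{d, z_{di}} \, \phi_{z_{di}, w_{di}}
```

where d indexes documents, i indexes word positions, 𝛉_d is document d's distribution over topics, and 𝛗_k is topic k's distribution over words.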

Previously…

Grad student + ≈6 months = topic modeling research paper

With latent topic networks:

Shachi Kumar (Master's student, UCSC) + 1 weekend = new custom topic model

Related work

Systems are compared along five dimensions: correlations/dependencies, observed covariates, additional latent variables, constraints, and probabilistic programming.

Systems for encoding domain knowledge, covariates, and correlations:

• CTM (Blei and Lafferty, 2007)
• DMR (Mimno and McCallum, 2008)
• Dirichlet Forests (Andrzejewski et al., 2009)
• xLDA (Wahabzada et al., 2010)
• SAGE (Eisenstein et al., 2011)
• STM (Roberts et al., 2013)

Graphical modeling and probabilistic programming systems:

• CTRF (Zhu and Xing, 2010)
• Fold.all (Andrzejewski et al., 2011)
• Logic LDA (Mei et al., 2014)
• Latent topic networks (this work)

Example: modeling influence in citation networks

Foulds and Smyth (2013), EMNLP

• Which are the most important articles?
• What are the influence relationships between articles?

Topical influence regression

Foulds and Smyth (2013), EMNLP

• Latent variables for document influence and citation edge influence
• Probabilistic dependencies along the citation graph

Encoding dependencies via logical rules

• Restrict dependencies to the citation graph
• If influence and topic value are both high, the citing document also has the topic
• Entire model with just 5 rules! (A sketch of one such rule follows.)
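The rules themselves appeared on the slides as logical syntax that the transcript did not preserve; a minimal sketch of what one might look like, with hypothetical predicate names (Theta for soft document-topic values, Cites and Influence for the citation structure) rather than the paper's exact formulation:

```
// Illustrative only: if cited document D1 exhibits topic T strongly, and D1's
// influence on citing document D2 is high, then D2 should also exhibit topic T.
5.0 : Theta(D1, T) & Cites(D2, D1) & Influence(D1, D2) -> Theta(D2, T)
```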

Statistical relational learning

• An "interface layer for AI"

– Programming languages for specifying models and encoding domain knowledge

– Typically based on first-order logic

Probabilistic soft logic (PSL)

• A first-order logic-based SRL language

• Specifies a class of highly scalable continuous graphical models called hinge-loss MRFs

• Each rule has a weight (e.g. 5.0) and combines predicates with logical operators; the predicates ground out to continuous random variables!
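A weighted rule, annotated with the anatomy the slide highlighted (the rule itself is a generic illustration, not one from the talk):

```
5.0 : Friends(A, B) & Votes(A, P) -> Votes(B, P)
// 5.0             = rule weight
// Friends, Votes  = predicates, grounding to continuous random variables in [0, 1]
// &, ->           = logical operators, relaxed to continuous (Lukasiewicz) semantics
```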

Hinge-loss MRFs

• A conditional random field over continuous random variables between 0 and 1

• Feature functions are hinge losses: a linear function inside a max with zero, optionally squared

• The hinge losses encode the distance to satisfaction of each instantiated rule
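The density appeared as an equation on the slide; the standard hinge-loss MRF form from the PSL literature, which is presumably what was shown (notation assumed):

```latex
P(\mathbf{y} \mid \mathbf{x}) \;\propto\;
  \exp\!\Big( -\sum_{j=1}^{m} \lambda_j
    \big( \max\{ \ell_j(\mathbf{y}, \mathbf{x}),\, 0 \} \big)^{p_j} \Big),
  \qquad p_j \in \{1, 2\}
```

where each ℓ_j is a linear function of the variables, λ_j ≥ 0 is the weight of rule j, and p_j = 2 gives the squared hinge.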

Latent Dirichlet allocation

• Priors: Dirichlet distributions over the document-topic vectors 𝛉 and topic-word vectors 𝚽
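The prior equations were rendered as images in the transcript; the standard LDA priors they would have shown:

```latex
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
\phi_k \sim \mathrm{Dirichlet}(\beta)
```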

Latent topic networks

• Priors: Hinge-loss MRFs


Log posterior objective function

• The LDA log posterior plus hinge loss terms

• Tractability from convexity, instead of conjugacy!
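A sketch of the objective, reconstructed from the two labeled components on the slide (exact notation assumed; estimation is over 𝛉 and 𝚽 with topic assignments marginalized):

```latex
f(\boldsymbol\theta, \boldsymbol\Phi) =
  \underbrace{\sum_{d,i} \log \sum_k \theta_{dk}\,\phi_{k, w_{di}}
    + \log p_{\mathrm{Dir}}(\boldsymbol\theta, \boldsymbol\Phi)}_{\text{LDA log posterior}}
  \;-\;
  \underbrace{\sum_j \lambda_j \big(\max\{\ell_j(\boldsymbol\theta, \boldsymbol\Phi), 0\}\big)^{p_j}}_{\text{hinge loss terms}}
```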

Training algorithm

• Expectation Maximization

– E-step: the same as for LDA

– M-step: maximize the LDA EM lower bound minus the hinge loss terms. Convex optimization! Solve in parallel using consensus ADMM
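A sketch of the two steps under the usual EM formulation for MAP LDA (notation assumed to match the objective above; γ_dik is the posterior responsibility of topic k for word i in document d, normalized over k, and Dirichlet log-prior terms are omitted for brevity):

```latex
\text{E-step:}\quad
  \gamma_{dik} \;\propto\; \theta_{dk}\,\phi_{k, w_{di}}
\qquad
\text{M-step:}\quad
  \max_{\boldsymbol\theta, \boldsymbol\Phi}\;
  \sum_{d,i,k} \gamma_{dik} \log\big(\theta_{dk}\,\phi_{k, w_{di}}\big)
  \;-\; \sum_j \lambda_j \big(\max\{\ell_j(\boldsymbol\theta, \boldsymbol\Phi), 0\}\big)^{p_j}
```

The M-step objective is concave: the weighted log terms are concave in (𝛉, 𝚽), and each hinge term is convex, so its negation is concave. Consensus ADMM then splits the problem across the hinge terms for parallel solving.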

Weight learning

• Optimize a pseudo-likelihood approximation via its gradient with respect to the rule weights

• Importance sample from the implied Dirichlet prior
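The gradient expression on the slide was not preserved in the transcript. For exponential-family models with hinge-loss features, the log-likelihood gradient takes the standard form below; the pseudo-likelihood version replaces the expectation with one under each variable's conditional, and those expectations are what the importance sampling estimates:

```latex
\frac{\partial}{\partial \lambda_j} \log P_{\boldsymbol\lambda}(\mathbf{y}_{\text{obs}})
  \;=\; \mathbb{E}_{\boldsymbol\lambda}\!\left[ f_j(\mathbf{y}) \right]
        - f_j(\mathbf{y}_{\text{obs}}),
\qquad
f_j(\mathbf{y}) = \big(\max\{\ell_j(\mathbf{y}), 0\}\big)^{p_j}
```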

Case study: Exploring influence in citation networks

[Results presented as figures; not preserved in the transcript.]

Case study: Modeling US Presidential State of the Union addresses

• The US President updates Congress on the state of the Union, roughly annually

• Do these addresses depict the true, underlying state of the Union?

• Are they biased by political agendas?

[Model diagram: the state of the Union evolves over time (years); each address is drawn from a topic model 𝛉 that combines the underlying state with Republican or Democratic party bias.]

[Figure: top words for an inferred Democratic topic and an inferred Republican topic.]

[Figure: inferred topic proportions over time, with visible shifts around WWI, WWII, and Vietnam.]

Perplexity results (lower is better):

                          Document completion   Fully held-out
  Latent topic networks   2.33 × 10³            2.43 × 10³
  LDA topic model         2.36 × 10³            2.59 × 10³
  Dynamic topic model     2.43 × 10³            2.55 × 10³

Conclusion

• We introduce latent topic networks, a versatile, general-purpose framework for building and performing inference in custom topic models.

• Our experimental results show that models specified in our framework, with just a few lines of code in a logical language, can be competitive with state-of-the-art special-purpose models.

• Future directions:

– Using our framework to answer substantive questions in social science

– New language primitives, non-parametric Bayesian models, algorithmic advances, …

Thanks to my collaborators at UC Santa Cruz

• Lise Getoor

• Shachi Kumar

Code: coming soon at psl.cs.umd.edu!

Thank you for your attention.