Conditional Topic Random Fields
Jun Zhu and Eric P. Xing (@cs.cmu.edu)
ICML 2010
Presentation and Discussion by Eric Wang, January 12, 2011
Overview
• Introduction – nontrivial input features for text
• Conditional Random Fields
• CdTM and CTRF
• Model Inference
• Experimental Results
Introduction
• Topic models such as LDA are not “feature-based”: they cannot efficiently incorporate nontrivial features (contextual or summary features).
• Further, they assume a bag-of-words construction, discarding order information that may be important.
• The authors propose a model that addresses both the feature and the independence limitations by using a conditional random field (CRF) rather than a fully generative model.
Conditional Random Fields
• A conditional random field (CRF) is a way to label and segment structured data that removes independence assumptions imposed by HMMs.
• The underlying idea of CRFs is that a sequence of random variables Y is globally conditioned on a sequence of observations X.
Image source: Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report, Department of Computer and Information Science, University of Pennsylvania, 2004.
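For reference, a linear-chain CRF (as in Wallach's tutorial) defines the conditional distribution

\[
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\]

where the f_k are feature functions, the λ_k are learned weights, and Z(x) is the partition function normalizing over all label sequences.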
Conditional Topic Model
• Assume a set of features a denoting arbitrary local and global features.
• The topic weight vector is defined as a function of these features, where f is a vector of feature functions defined on the features a and the word's topic assignment.
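A plausible reconstruction of the missing equation, assuming a softmax over feature scores with a learned weight vector λ (both symbol names are assumptions here, not necessarily the paper's notation):

\[
\theta_k = \frac{\exp\big(\boldsymbol{\lambda}^\top \mathbf{f}(\mathbf{a}, k)\big)}{\sum_{j=1}^{K} \exp\big(\boldsymbol{\lambda}^\top \mathbf{f}(\mathbf{a}, j)\big)},
\]

so the features a raise or lower the weight of each topic k through the block of f that topic k activates (see the Feature Functions slide).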
Conditional Topic Model
• The inclusion of Y follows sLDA, where the topic model regresses a continuous or discrete response on the topics.
• β_{1:K} are the standard topic distributions over words.
• This model does not impose word-order dependence.
Feature Functions
• Consider, for example, the set of word features “positive adjective”, “negative adjective”, “positive adjective with an inverting word”, and “negative adjective with an inverting word”, so M = 4.
• The word “good” then yields the feature function vector [1 0 0 0]’, while “not bad” yields [0 0 0 1]’.
• These features are concatenated depending on the topic assignment of the word. Suppose the word is assigned topic h; then the full feature vector f for “good” is a length-MK vector:
[ 0 0 0 0 | 0 0 0 0 | … | 1 0 0 0 | … | 0 0 0 0 | 0 0 0 0 ]’
   k=1       k=2           k=h          k=K−1      k=K
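A minimal numpy sketch of this construction; the word lists, the topic index h, and the helper names are illustrative assumptions, and indexing is 0-based rather than the slide's k = 1 … K:

```python
import numpy as np

M, K = 4, 10  # M word features, K topics (K = 10 is an arbitrary choice)

def word_features(word, inverted=False):
    """Toy stand-in for the slide's four features:
    [pos-adj, neg-adj, pos-adj + inverter, neg-adj + inverter]."""
    positive, negative = {"good", "great"}, {"bad", "poor"}
    v = np.zeros(M)
    if word in positive:
        v[2 if inverted else 0] = 1.0
    elif word in negative:
        v[3 if inverted else 1] = 1.0
    return v

def concat_by_topic(v, h):
    """Place the length-M feature vector in block h of a length-M*K vector."""
    f = np.zeros(M * K)
    f[h * M:(h + 1) * M] = v
    return f

f_good = concat_by_topic(word_features("good"), h=2)                   # block 2 holds [1 0 0 0]
f_not_bad = concat_by_topic(word_features("bad", inverted=True), h=2)  # block 2 holds [0 0 0 1]
```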
Conditional Topic Random Fields
• The generative process of CTRF for a single document is sketched below.
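A plausible outline, assuming the sLDA-style setup described on the surrounding slides:
1. Given the document features a, draw the topic assignments z of each sentence jointly from the conditional topic random field p(z | a).
2. For each word position n, draw the word w_n from the topic's word distribution β_{z_n}.
3. In the supervised variant, draw the response Y from a GLM on the empirical topic frequencies, as in sLDA.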
Conditional Topic Random Fields
• The term p(z | a) is a conditional topic random field over the topic assignments of all the words in one sentence; its form is sketched below.
• In the linear-chain CTRF, the authors consider both singleton and pairwise feature functions.
• The cumulative feature function value on a sentence is the sum of the feature function values over word positions.
• The pairwise feature function is assumed to be zero if …
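A minimal sketch of that linear-chain form, assuming singleton weights λ, pairwise weights μ, and a per-sentence partition function Z(a) (the symbol names are assumptions, not necessarily the paper's notation):

\[
p(\mathbf{z} \mid \mathbf{a}) = \frac{1}{Z(\mathbf{a})} \exp\Big( \sum_{n} \boldsymbol{\lambda}^\top \mathbf{f}(z_n, \mathbf{a}) + \sum_{n} \boldsymbol{\mu}^\top \mathbf{g}(z_n, z_{n+1}, \mathbf{a}) \Big),
\]

where f collects the singleton feature functions and g the pairwise ones.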
Model Inference
• Inference is performed in a variational fashion similar to that of Correlated Topic Models (CTM).
• The authors introduce a relaxation of the lower bound due to the introduction of the CRF, although for the univariate CdTM the variational posterior can be computed exactly.
• A closed-form solution is not available for the feature weights, so an efficient gradient descent approach is used instead; a sketch follows.
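A minimal sketch of that kind of update, assuming we minimize the negative lower bound; the function name, learning rate, and stopping rule are illustrative choices, not the paper's:

```python
import numpy as np

def optimize_weights(lam, neg_bound_grad, lr=0.05, iters=500, tol=1e-6):
    """Plain gradient descent for parameters with no closed-form update.
    `neg_bound_grad(lam)` returns the gradient of the negative variational
    lower bound at `lam`."""
    for _ in range(iters):
        g = neg_bound_grad(lam)
        lam = lam - lr * g           # step downhill on the negative bound
        if np.linalg.norm(g) < tol:  # stop once the gradient is negligible
            break
    return lam
```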
Empirical Results
• The authors use a dataset of hotel reviews built by crawling TripAdvisor.
• The dataset consists of 5,000 reviews with lengths between 1,500 and 6,000 words, and includes an integer (1–5) rating for each review; each rating is represented by 1,000 documents.
• POS tags were employed to find adjectives.
• Noun-phrase chunking was used to associate words with good or bad connotations. The authors also extracted whether an inverting word appears within 4 words of each adjective (see the sketch after this list).
• After rare words and stop words were removed, the lexicon size was 12,000.
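A rough sketch of this preprocessing using NLTK's tagger; the inverting-word list and the look-behind window are assumptions (the slide only says “within 4 words”):

```python
import nltk  # assumes the punkt and averaged_perceptron_tagger models are installed

INVERTERS = {"not", "no", "never", "hardly"}  # illustrative inverting words

def adjective_features(text, window=4):
    """Find adjectives (JJ, JJR, JJS) and flag those with an inverting
    word among the `window` preceding tokens."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    feats = []
    for i, (tok, tag) in enumerate(tagged):
        if tag.startswith("JJ"):
            inverted = any(t in INVERTERS for t, _ in tagged[max(0, i - window):i])
            feats.append((tok, inverted))
    return feats

# adjective_features("the room was not clean") -> [('clean', True)]
```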
Comparison of Rating Prediction Accuracy
Equation source: Blei, D. & McAuliffe, J. Supervised topic models. NIPS, 2007.
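The sLDA paper measures rating prediction with predictive R²; assuming that is the equation the slide cites, it is

\[
\mathrm{pR}^2 = 1 - \frac{\sum_d (y_d - \hat{y}_d)^2}{\sum_d (y_d - \bar{y})^2},
\]

where ŷ_d is the model's predicted rating for document d and ȳ is the mean rating.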
Topics
Ratings and Topics
• Here, the authors show that supervised CTRF (sCTRF) achieves good separation of rating scores among the topics (top row) compared to MedLDA (bottom row).
Feature Weights
• Five features were considered: Default – equal to one for any word; Pos-JJ – positive adjective; Neg-JJ – negative adjective; Re-Pos-JJ – positive adjective with an inverting word before it; and Re-Neg-JJ – negative adjective with an inverting word before it.
• The default feature dominates when the model is truncated to 5 topics, but becomes less important at higher truncation levels.