Conditional Topic Random Fields
Jun Zhu and Eric P. Xing (@cs.cmu.edu)
ICML 2010
Presentation and Discussion by Eric Wang, January 12, 2011
Overview
• Introduction – nontrivial input features for text
• Conditional Random Fields
• CdTM and CTRF
• Model Inference
• Experimental Results
Introduction
• Topic models such as LDA are not “feature-based”: they cannot efficiently incorporate nontrivial features (contextual or summary features).
• Further, they assume a bag-of-words construction, discarding order information that may be important.
• The authors propose a model that addresses both the feature and the independence limitations by using a conditional random field (CRF) rather than a fully generative model.
Conditional Random Fields
• A conditional random field (CRF) is a way to label and segment structured data that removes independence assumptions imposed by HMMs.
• The underlying idea of CRFs is that a sequence of random variables Y is globally conditioned on a sequence of observations X.
Image source: Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report, Department of Computer and Information Science, University of Pennsylvania, 2004.
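For reference, a linear-chain CRF (as in Wallach's tutorial) defines the conditional distribution

\[
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\]

where the f_k are feature functions, the λ_k are learned weights, and Z(x) is the partition function normalizing over all label sequences.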
Conditional Topic Model
• Assume a set of features a denoting arbitrary local and global features.
• The topic weight vector is defined as a function of these features, where f is a vector of feature functions defined on the features a and the word's topic assignment.
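A plausible reconstruction of the missing equation, assuming a softmax over feature scores with a learned weight vector λ (both symbol names are assumptions here, not necessarily the paper's notation):

\[
\theta_k = \frac{\exp\big(\boldsymbol{\lambda}^\top \mathbf{f}(\mathbf{a}, k)\big)}{\sum_{j=1}^{K} \exp\big(\boldsymbol{\lambda}^\top \mathbf{f}(\mathbf{a}, j)\big)},
\]

so the features a raise or lower the weight of each topic k through the block of f that topic k activates (see the Feature Functions slide).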
Conditional Topic Model
• The inclusion of Y follows sLDA, where the topic model regresses a continuous or discrete response on the topics.
• β_{1:K} are the standard topic distributions over words.
• This model does not impose word-order dependence.
Feature Functions
• Consider, for example, the set of word features “positive adjective”, “negative adjective”, “positive adjective with an inverting word”, and “negative adjective with an inverting word”, so M = 4.
• The word “good” then yields the feature function vector [1 0 0 0]’, while “not bad” yields [0 0 0 1]’.
• These features are concatenated depending on the topic assignment of the word. Suppose the word is assigned topic h; then the full feature vector f for “good” is a length-MK vector:
[ 0 0 0 0 | 0 0 0 0 | … | 1 0 0 0 | … | 0 0 0 0 | 0 0 0 0 ]’
   k=1       k=2           k=h          k=K−1      k=K
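A minimal numpy sketch of this construction; the word lists, the topic index h, and the helper names are illustrative assumptions, and indexing is 0-based rather than the slide's k = 1 … K:

```python
import numpy as np

M, K = 4, 10  # M word features, K topics (K = 10 is an arbitrary choice)

def word_features(word, inverted=False):
    """Toy stand-in for the slide's four features:
    [pos-adj, neg-adj, pos-adj + inverter, neg-adj + inverter]."""
    positive, negative = {"good", "great"}, {"bad", "poor"}
    v = np.zeros(M)
    if word in positive:
        v[2 if inverted else 0] = 1.0
    elif word in negative:
        v[3 if inverted else 1] = 1.0
    return v

def concat_by_topic(v, h):
    """Place the length-M feature vector in block h of a length-M*K vector."""
    f = np.zeros(M * K)
    f[h * M:(h + 1) * M] = v
    return f

f_good = concat_by_topic(word_features("good"), h=2)                   # block 2 holds [1 0 0 0]
f_not_bad = concat_by_topic(word_features("bad", inverted=True), h=2)  # block 2 holds [0 0 0 1]
```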
Conditional Topic Random Fields
• The generative process of CTRF for a single document is sketched below.
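A plausible outline, assuming the sLDA-style setup described on the surrounding slides:
1. Given the document features a, draw the topic assignments z of each sentence jointly from the conditional topic random field p(z | a).
2. For each word position n, draw the word w_n from the topic's word distribution β_{z_n}.
3. In the supervised variant, draw the response Y from a GLM on the empirical topic frequencies, as in sLDA.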
Conditional Topic Random Fields
• The term p(z | a) is a conditional topic random field over the topic assignments of all the words in one sentence; its form is sketched below.
• In the linear-chain CTRF, the authors consider both singleton and pairwise feature functions.
• The cumulative feature function value on a sentence is the sum of the feature function values over word positions.
• The pairwise feature function is assumed to be zero if …
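A minimal sketch of that linear-chain form, assuming singleton weights λ, pairwise weights μ, and a per-sentence partition function Z(a) (the symbol names are assumptions, not necessarily the paper's notation):

\[
p(\mathbf{z} \mid \mathbf{a}) = \frac{1}{Z(\mathbf{a})} \exp\Big( \sum_{n} \boldsymbol{\lambda}^\top \mathbf{f}(z_n, \mathbf{a}) + \sum_{n} \boldsymbol{\mu}^\top \mathbf{g}(z_n, z_{n+1}, \mathbf{a}) \Big),
\]

where f collects the singleton feature functions and g the pairwise ones.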
Model Inference
• Inference is performed in a variational fashion similar to that of Correlated Topic Models (CTM).
• The authors introduce a relaxation of the lower bound due to the introduction of the CRF, although for the univariate CdTM the variational posterior can be computed exactly.
• A closed-form solution is not available for the feature weights, so an efficient gradient descent approach is used instead; a sketch follows.
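A minimal sketch of that kind of update, assuming we minimize the negative lower bound; the function name, learning rate, and stopping rule are illustrative choices, not the paper's:

```python
import numpy as np

def optimize_weights(lam, neg_bound_grad, lr=0.05, iters=500, tol=1e-6):
    """Plain gradient descent for parameters with no closed-form update.
    `neg_bound_grad(lam)` returns the gradient of the negative variational
    lower bound at `lam`."""
    for _ in range(iters):
        g = neg_bound_grad(lam)
        lam = lam - lr * g           # step downhill on the negative bound
        if np.linalg.norm(g) < tol:  # stop once the gradient is negligible
            break
    return lam
```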
Empirical Results
• The authors use a dataset of hotel reviews built by crawling TripAdvisor.
• The dataset consists of 5,000 reviews with lengths between 1,500 and 6,000 words, and includes an integer (1–5) rating for each review; each rating is represented by 1,000 documents.
• POS tags were employed to find adjectives.
• Noun-phrase chunking was used to associate words with good or bad connotations. The authors also extracted whether an inverting word appears within 4 words of each adjective (see the sketch after this list).
• After rare words and stop words were removed, the lexicon size was 12,000.
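A rough sketch of this preprocessing using NLTK's tagger; the inverting-word list and the look-behind window are assumptions (the slide only says “within 4 words”):

```python
import nltk  # assumes the punkt and averaged_perceptron_tagger models are installed

INVERTERS = {"not", "no", "never", "hardly"}  # illustrative inverting words

def adjective_features(text, window=4):
    """Find adjectives (JJ, JJR, JJS) and flag those with an inverting
    word among the `window` preceding tokens."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    feats = []
    for i, (tok, tag) in enumerate(tagged):
        if tag.startswith("JJ"):
            inverted = any(t in INVERTERS for t, _ in tagged[max(0, i - window):i])
            feats.append((tok, inverted))
    return feats

# adjective_features("the room was not clean") -> [('clean', True)]
```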
Comparison of Rating Prediction Accuracy
Equation source: Blei, D. & McAuliffe, J. Supervised topic models. NIPS, 2007.
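The sLDA paper measures rating prediction with predictive R²; assuming that is the equation the slide cites, it is

\[
\mathrm{pR}^2 = 1 - \frac{\sum_d (y_d - \hat{y}_d)^2}{\sum_d (y_d - \bar{y})^2},
\]

where ŷ_d is the model's predicted rating for document d and ȳ is the mean rating.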
Topics
Ratings and Topics
• Here, the authors show that supervised CTRF (sCTRF) achieves good separation of rating scores among the topics (top row) compared to MedLDA (bottom row).
Feature Weights
• Five features were considered: Default – equal to one for any word; Pos-JJ – positive adjective; Neg-JJ – negative adjective; Re-Pos-JJ – positive adjective with an inverting word before it; and Re-Neg-JJ – negative adjective with an inverting word before it.
• The default feature dominates when the model is truncated to 5 topics, but becomes less important at higher truncation levels.