Unsupervised Approaches
Aditya M Joshi
Center for Indian Language Technologies (CFILT), IIT Bombay
20th June, 2016
adityaj@cse.iitb.ac.in | ajos17@student.monash.edu
Images from wikimedia commons
Unsupervised Approaches
• Techniques that infer a function describing hidden structure from unlabelled data
• Use unlabelled data for prediction tasks
Popular Approaches
• Clustering
• Latent Dirichlet Allocation (LDA) Model
clustering
Clustering
• Find clusters in a set of data points
Clustering
• Find clusters in a set of data points
k-means Clustering
• Dataset: {x1, x2, …, xn}
• Goal: partition the n observations into k clusters
• Membership of point xn in cluster k is indicated by rnk
• Goal, redefined: minimize J = Σn Σk rnk ||xn - μk||^2, where μk is the mean of cluster k
Algorithm
• Initialisation: pick K of the data points as the initial means.
• Assignment: go over each data point and assign it to the closest mean, e.g. if data point xn is closest to the second mean, assign it to that mean.
• Update: recompute each mean as the average of all the points assigned to it.
• Repeat the assignment and update steps until the assignments stop changing.
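The loop above can be sketched in a few lines of Python (a minimal illustration, not the lecture's own code; the `init` argument and the toy points are hypothetical):

```python
import math
import random

def kmeans(points, k, iters=20, init=None, seed=0):
    """Minimal k-means: assign each point to its closest mean,
    recompute each mean as the average of its points, repeat."""
    rng = random.Random(seed)
    means = list(init) if init else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: attach every point to the closest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, means[i]))
            clusters[j].append(p)
        # update step: each mean becomes the average of its points
        for j, c in enumerate(clusters):
            if c:
                means[j] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return means, clusters
```

With two well-separated groups of 2-D points, the two means converge to the group averages within a few iterations.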
Illustration
Latent Dirichlet Allocation models
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
• Experimentation
Revisiting classifiers

What did Prof. Pushpak Bhattacharyya talk about in the "Topics in NLP" lecture today?

Lecture transcript → Classifier → one of: NLP, Databases, Compilers

Topic models can do much more than this with an unlabeled corpus: e.g. uncover finer themes such as SA, MT, Wordnet.
"Topic-document distribution"

Lectures from 2008 to 2013 → Topic Modeler

[Word clouds for three discovered topics (hypothetical example): NLP (strong AI, parser, alignment, ACL, thwarting, co-reference resolution), Academic (demo, RPC, MTP), Cultural (Krishna, Raag, Mahabharat, Swar-sandhya)]

For one document: NLP = 0.7, Academic = 0.2, Cultural = 0.1
Proportion of each topic in a document: "multiple membership"

And in the context of sentiment analysis?
"Word-topic distribution"

"Aaditya, you are not making sense." / "Let's study word sense disambiguation."

Lectures from 2008 to 2013 → Topic Modeler

[Word clouds (hypothetical example): one topic with sense, logic, explanation, confused, iterative; another with sense, wordnet, polysemy, word]

"Relevance of each word to a topic": words across "topics" actually indicate the different senses in which a word occurs.
Definition
• Topic models are a suite of algorithms that discover thematic structures in a data collection (Blei (2011))
• What is a thematic structure? A topic: a collection of words
• Used for a wide variety of tasks such as: author recognition, aspect extraction, sentiment modelling
![Page 19: Unsupervised Approaches...Unsupervised Approaches Aditya M Joshi Center for Indian Language Technologies (CFILT) IIT Bombay 20th June, 2016 adityaj@cse.iitb.ac.in ajos17@student.monash.edu](https://reader034.vdocuments.site/reader034/viewer/2022050503/5f951f2d7126ba486d230c6c/html5/thumbnails/19.jpg)
Black box

Unlabeled corpus → Topic Modeler → document-topic distribution + overall word-topic distribution

FAQs:
• Can you predict a test document directly? Not directly.
• Is there only one way to construct a topic model? No. By structuring the model intelligently, you can derive useful information.
LDA Model
• Latent Dirichlet Allocation (LDA) model is a basic probabilistic topic model
• This presentation focuses on LDA and its adaptations with sentiment as the goal.
![Page 21: Unsupervised Approaches...Unsupervised Approaches Aditya M Joshi Center for Indian Language Technologies (CFILT) IIT Bombay 20th June, 2016 adityaj@cse.iitb.ac.in ajos17@student.monash.edu](https://reader034.vdocuments.site/reader034/viewer/2022050503/5f951f2d7126ba486d230c6c/html5/thumbnails/21.jpg)
Plate Notation (1/2)

[Plate diagrams: w = word (observed), z = topic (latent). Unlabeled corpus: w inside a plate of size Nd (words per document), nested in a plate of size D (documents). Labeled corpus: the same, with an additional label plate of size L.]
Plate Notation (2/2)

[Plate diagrams contrasting where z sits: word-level topics (one z per word, inside plate Nd), sentence-level topics (one z per sentence, inside a plate Ns), and document-level topics (one z per document).]
Growing LDA further

[Plate diagram: each document draws a topic distribution θ; each of the Z topics has a word distribution ϕ]

θ(Z): NLP = 0.7, culture = 0.2, motivation = 0.1
ϕ(Z, word): (NLP, sense) = 0.7, (culture, sense) = 0.1, (motivation, sense) = 0.2

Let us now focus on these two multinomial distributions.
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
• Experimentation
![Page 25: Unsupervised Approaches...Unsupervised Approaches Aditya M Joshi Center for Indian Language Technologies (CFILT) IIT Bombay 20th June, 2016 adityaj@cse.iitb.ac.in ajos17@student.monash.edu](https://reader034.vdocuments.site/reader034/viewer/2022050503/5f951f2d7126ba486d230c6c/html5/thumbnails/25.jpg)
Multinomial distribution

• Training an LDA model implies learning the parameters θ of its multinomial distributions
• We now focus on a multinomial distribution and the way it is modelled in the case of LDA
Parameter estimation (Heinrich (2004))

Goal: estimate θd and ϕwz as accurately as possible, given the data (documents). The two are categorical distributions. This means estimating the posterior P(θ|x):

P(θ|x) = P(x|θ) P(θ) / P(x)   (posterior = likelihood × prior / marginal likelihood)

where the marginal likelihood is P(x) = ∫ P(x|θ) P(θ) dθ, so that

P(θ|x) ∝ P(x|θ) P(θ)
Binomial distribution & MLE

• Toss of a biased coin: P(X=1) = q, P(X=0) = 1-q
• Observations X = {x1, x2, ..., xN}, with P(xi|q) = q^xi (1-q)^(1-xi)

MLE = argmax P(X|q)
    = argmax P(x1|q) · P(x2|q) ... P(xN|q)
    = argmax q^x1 (1-q)^(1-x1) · ... · q^xN (1-q)^(1-xN)
    = argmax q^(x1+x2+...+xN) (1-q)^(N-(x1+x2+...+xN))
    = argmax q^m (1-q)^(n-m)        (m = number of heads, n = N)
    = argmax (m log q + (n-m) log(1-q))

Equating the derivative to zero: m/q = (n-m)/(1-q), giving q = m/n.
m and n are "sufficient statistics" of a binomial distribution.
MAP of Binomial distribution (1/2)

MAP = argmax P(q|X) = argmax P(X|q) P(q) = argmax q^m (1-q)^(n-m) P(q)

Problem! P(q) can be any distribution, strictly speaking. Computationally difficult!

Assume P(q) is a beta distribution: P(q) = q^(α-1) (1-q)^(β-1) / B(α, β)
MAP of Binomial distribution (2/2)

MAP = argmax q^m (1-q)^(n-m) P(q)
    ∝ argmax q^m (1-q)^(n-m) q^(α-1) (1-q)^(β-1)
    ∝ argmax q^(m+α-1) (1-q)^(n-m+β-1)
    ∝ argmax ((m+α-1) log q + (n-m+β-1) log(1-q))

Equating the derivative to zero:
(m+α-1)/q = (n-m+β-1)/(1-q)
(m+α-1) - q(m+α-1) = q(n-m+β-1)
(m+α-1) = q(m+α-1+n-m+β-1)
(m+α-1) = q(n+α+β-2)
q = (m+α-1)/(n+α+β-2)

The beta distribution is the conjugate prior of the binomial distribution.
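As a sanity check on the closed form, a brute-force grid search over the unnormalised posterior q^(m+α-1) (1-q)^(n-m+β-1) should land on the same q (a throwaway sketch; the numbers in the usage example are hypothetical):

```python
def map_closed_form(m, n, alpha, beta):
    # the result derived above
    return (m + alpha - 1) / (n + alpha + beta - 2)

def map_grid_search(m, n, alpha, beta, steps=10000):
    # maximise the unnormalised posterior q^(m+a-1) * (1-q)^(n-m+b-1)
    best_q, best_val = 0.0, -1.0
    for i in range(1, steps):
        q = i / steps
        val = q ** (m + alpha - 1) * (1 - q) ** (n - m + beta - 1)
        if val > best_val:
            best_val, best_q = val, q
    return best_q
```

For m = 7 heads in n = 10 tosses with a Beta(2, 2) prior, both give q = 8/12 ≈ 0.667.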
Conjugate prior

• A prior distribution is conjugate to a likelihood if the resulting posterior has the same form as the prior
• "Algebraic convenience"

The beta distribution is a conjugate prior of the binomial distribution. What is the conjugate prior for the categorical distribution?
Categorical distribution

• Roll of a die: P(X=1) = q1, P(X=2) = q2, ..., P(X=6) = q6, i.e. P(xi=k|q) = qk
• Observations X = {x1, ..., xN} ~ Cat(q), with likelihood P(X|q) = Π qj^cj, where cj is the count of outcome j

MAP = argmax P(X|q) P(q) = argmax Π qj^cj P(q)

Taking a Dirichlet prior, P(q) ∝ Π qj^(αj-1):

MAP ∝ argmax Π qj^cj qj^(αj-1) ∝ argmax Π qj^(αj+cj-1)

The Dirichlet distribution is the conjugate prior of the categorical distribution.
Binomial & Categorical distribution

Hyper-parameters → distribution → random variable assignments:
• α, β → q ~ Beta(α, β) → x ~ Binomial(q)
• α → θ ~ Dir(α) → z ~ Categorical(θ), with P(z|θ) = θz

Does the name Latent Dirichlet Allocation seem justifiable now?
Our first LDA model

[Plate diagram: α → θ → z → w, with β → ϕ feeding w; z and w inside plate Nd, nested in plate D; ϕ inside plate Z]
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
• Experimentation
Estimation of LDA model

P(θ, ϕ | w) = P(w | θ, ϕ) P(θ, ϕ) / P(w)

The denominator is computationally intractable; hence, Gibbs sampling is used. We now describe the generative story.

Every LDA paper has:
• Plate notation
• Generative story
• Gibbs sampling formulas
Generative story

[Plate diagram, as before: α → θ → z → w, with β → ϕ; plates Nd, D, and Z]

Sample ϕz ~ Dir(β) for each topic
For each document:
    Generate θ ~ Dir(α)
    For each word:
        Sample z ~ Multinomial(θ)
        Sample w ~ ϕ(z)
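The story can be run forward to generate a toy corpus. Dirichlet draws come from normalised Gamma samples (a stdlib-only sketch, not the lecture's code; all the sizes in the usage example are hypothetical):

```python
import random

def sample_dirichlet(alphas, rng):
    # standard construction: normalised independent Gamma(alpha_i, 1) draws
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def sample_categorical(probs, rng):
    # inverse-CDF draw from a discrete distribution
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

def generate_corpus(D, Nd, Z, V, alpha, beta, seed=0):
    """phi_z ~ Dir(beta) per topic; per document theta ~ Dir(alpha);
    per word a topic z ~ theta and then a word w ~ phi_z."""
    rng = random.Random(seed)
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(Z)]
    corpus = []
    for _ in range(D):
        theta = sample_dirichlet([alpha] * Z, rng)
        doc = [sample_categorical(phi[sample_categorical(theta, rng)], rng)
               for _ in range(Nd)]
        corpus.append(doc)
    return corpus
```

Small β makes each topic's word distribution peaky; small α makes each document concentrate on few topics.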
Implementing topic models

(The plate diagram and generative story above, revisited: an implementation follows the same sampling steps.)
Sampling from multinomial

Input θ: P(z=0) = 0.1, P(z=1) = 0.3, P(z=2) = 0.6
Goal: sample a z given this distribution

Draw u uniformly from [0, 1) and pick the cumulative interval it falls in:
z=0: [0, 0.1), z=1: [0.1, 0.4), z=2: [0.4, 1)
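In code, the trick is a single uniform draw located among the cumulative sums (a sketch of the idea, not the lecture's code):

```python
import random
from itertools import accumulate

def sample_z(theta, rng):
    """theta = [0.1, 0.3, 0.6] gives intervals [0, 0.1), [0.1, 0.4), [0.4, 1)."""
    u = rng.random()
    for z, cum in enumerate(accumulate(theta)):
        if u < cum:
            return z
    return len(theta) - 1  # guard against floating-point round-off
```

Over many draws, z = 2 should come up about 60% of the time.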
Implementing topic models

(The same plate diagram and generative story; the sampling steps are now assembled into the full algorithm.)
Gibbs sampling

Initialize all word positions to random z's; compute θ & ϕ accordingly.
For each iteration:
    For each document:
        For each word:
            Generate a z based on θ
            Generate a w based on ϕ(w|z)
    Compute θ & ϕ
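The loop on this slide can be made concrete. The usual implementation (following Heinrich (2004)) is collapsed Gibbs sampling: rather than regenerating w, each token's topic z is resampled from its conditional given all other assignments. A compact stdlib-only sketch under that standard formulation, not the lecture's own code:

```python
import random

def gibbs_lda(docs, Z, V, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA: resample each token's topic from
    P(z=k | rest) proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta)."""
    rng = random.Random(seed)
    ndz = [[0] * Z for _ in docs]        # document-topic counts
    nzw = [[0] * V for _ in range(Z)]    # topic-word counts
    nz = [0] * Z                         # topic totals
    assign = []
    for d, doc in enumerate(docs):       # random initialisation
        zs = []
        for w in doc:
            z = rng.randrange(Z)
            ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
            zs.append(z)
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]         # take the token out of the counts
                ndz[d][z] -= 1; nzw[z][w] -= 1; nz[z] -= 1
                weights = [(ndz[d][k] + alpha) * (nzw[k][w] + beta)
                           / (nz[k] + V * beta) for k in range(Z)]
                u, acc = rng.random() * sum(weights), 0.0
                z = Z - 1                # inverse-CDF draw over the weights
                for k in range(Z):
                    acc += weights[k]
                    if u <= acc:
                        z = k
                        break
                ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
                assign[d][i] = z
    # point estimates of theta and phi from the final counts
    theta = [[(ndz[d][k] + alpha) / (len(docs[d]) + Z * alpha)
              for k in range(Z)] for d in range(len(docs))]
    phi = [[(nzw[k][w] + beta) / (nz[k] + V * beta)
            for w in range(V)] for k in range(Z)]
    return theta, phi
```

On a toy corpus whose documents use two disjoint vocabularies, the sampler pulls the two document groups toward different topics.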
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
• Experimentation
Evaluation

• Qualitative evaluation (understanding topic cohesion) (Mukherjee et al. (2012))
• Classification accuracy based on topics uncovered
• Held-out likelihood (likelihood of data given parameters) (Wallach et al. (2009))

A naïve addition:
• Measuring sentiment cohesion: count of positive and negative words in each topic
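The naïve cohesion measure lends itself to a few lines of Python (a sketch; the word lists here are tiny hypothetical stand-ins for a real sentiment lexicon):

```python
# hypothetical stand-ins for a real positive/negative lexicon
POSITIVE = {"amazing", "hilarious", "great", "wonderful"}
NEGATIVE = {"disappoint", "boring", "problem", "awful"}

def sentiment_cohesion(top_words):
    """Count positive vs. negative words among a topic's top words;
    a sentiment-coherent topic leans heavily one way."""
    pos = sum(w in POSITIVE for w in top_words)
    neg = sum(w in NEGATIVE for w in top_words)
    return pos, neg
```

Applied to a topic's top-5 word list, the (pos, neg) pair shows which way, if any, the topic leans.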
Outline
• Motivation and Introduction (Blei (2011))
• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))
• Estimation using LDA (Heinrich (2004))
• Evaluation of LDA (Wallach (2009))
• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))
• Experimentation
Experiments with LDA

• Goal: understand topic models & obtain sentiment-coherent topics from an LDA model
• Implementation:
  – Topic model implementation using Gibbs sampling
  – Hyper-parameter estimation as given in Heinrich (2009)
  – "Left to right" likelihood algorithm by Wallach (2009)
Data set

• Movie review data set from Amazon by McAuley & Leskovec (2013)
  – Training data set: 11,000 movie reviews
  – Test data set: 2,000 movie reviews
• Average length of a review: ~140 words
Effect of hyper-parameter estimation
Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:
1) Bootstrapping sentiment priors with word lists
2) Modifying the structure of the topic model
Existing topic models

• Lin & He (2009) present a Joint Sentiment-Topic model with sentiment as a latent variable.
• Jo & Oh (2011) extract senti-aspects: (sentiment, feature) pairs.
• Titov & McDonald (2008) use a sliding-window model to incorporate the discourse nature of reviews.
• Mukherjee & Liu (2012b) identify words belonging to six types of review comment expressions from an unlabeled corpus.
Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:
1) Bootstrapping sentiment priors with word lists
2) Modifying the structure of the topic model
Discovering sentiment: Use of priors

• Induce positive and negative words to belong to certain topics (based on Lin & He (2009)):
  – For negative words, set β(word, z) = 2β for topics z in [0, Z/2) and β(word, z) = 0 for z in [Z/2, Z)
  – Correspondingly for positive words, with the two halves swapped
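One reading of this prior scheme as code (my interpretation of the slide: negative seed words keep prior mass only in the first half of the topics, positive ones only in the second half; the word lists in the usage example are hypothetical):

```python
def seeded_beta(vocab, Z, base_beta, neg_words, pos_words):
    """Per-(word, topic) beta prior with sentiment seed words pinned
    to one half of the topic range each."""
    half = Z // 2
    prior = {w: [base_beta] * Z for w in vocab}
    for w in neg_words:
        # negative seeds: doubled prior in the first half, zero in the second
        prior[w] = [2 * base_beta] * half + [0.0] * (Z - half)
    for w in pos_words:
        # positive seeds: the mirror image
        prior[w] = [0.0] * half + [2 * base_beta] * (Z - half)
    return prior
```

The resulting matrix replaces the symmetric β when computing the Gibbs sampling weights, so seed words can only ever be assigned to their designated half of the topics.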
Use of Priors: Results (1/2)

• "Basic": imposing priors on only 12 sentiment words
• Leads to more sentiment words being identified in the correct topics
Use of Priors: Results (2/2)

Qualitative evaluation: some topics are positive while others are negative, depending on the priors.

Topic 38: 7.330 horror, 2.392 killer, 2.248 scary, 2.147 house, 2.072 gore
Topic 13: 6.931 michael, 3.929 fans, 3.423 live, 2.379 amazing, 2.354 concert
Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:
1) Bootstrapping sentiment priors with word lists
2) Modifying the structure of the topic model
Discovering sentiment: Modifying structure

• Sentiment is explicitly modelled as a latent variable (based on the joint sentiment-topic model by Lin & He (2009)). Two variants: SLDA and SLDA-Split.
Sentiment as a Variable: Results (1/2)
Parameters: Z = 70; S = 2
Sentiment as a Variable: Results (2/2)

• SLDA:
  – Topic 13, s = 0: 9.551 show, 9.254 humor, 7.166 comedy, 4.846 watch, 4.680 hilarious
  – Topic 13, s = 2: 6.964 rock, 5.547 children, 5.38 school, 4.636 remember, 3.432 learn
  – No equivalence between topic 13 for s = 0 and s = 2.

• SLDA-Split (for S = 3):
  – Topic 31, s = 0: 8.006 product, 6.277 received, 5.244 amazon, 4.119 condition, 4.043 seller
  – Topic 31, s = 1: 5.206 return, 4.661 problem, 4.412 disappoint, 3.654 case, 3.616 copy
  – Topic 31, s = 2: 10.358 amazon, 9.213 play, 7.068 player, 3.651 dvds, 3.594 purchased
  – A topic essentially implies "different polarities" in the same 'context'.
Conclusion
• Unsupervised approaches rely on unlabelled data
• We looked at k-means clustering
• Also at unsupervised/semi-supervised approaches like LDA
References (1/2)
• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2011). Harnessing wordnet senses for supervised sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, (pp. 1081–1091). Association for Computational Linguistics.
• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2012). Cross-lingual sentiment analysis for Indian languages using linked wordnets. In COLING (Posters), (pp. 73–82).
• Balamurali, A., Khapra, M. M., & Bhattacharyya, P. (2013). Lost in translation: viability of machine translation for cross language sentiment analysis. In Computational Linguistics and Intelligent Text Processing (pp. 38–49). Springer.
• Banea, C., Mihalcea, R., Wiebe, J., & Hassan, S. (2008). Multilingual subjectivity analysis using machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, (pp. 127–135). Association for Computational Linguistics.
• Blei, D. M. (2011). Introduction to probabilistic topic models.
• Blei, D. M., Ng, A. Y., Jordan, M. I., & Lafferty, J. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3 (2003).
• Boyd-Graber, J., Chang, J., Gerrish, S., Wang, C., & Blei, D. (2009). Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems (NIPS).
• Brody, S. & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. In HLT-NAACL, (pp. 804–812). The Association for Computational Linguistics.
• Carl, M. (2012). Translog-II: a program for recording user activity data for empirical reading and writing research. In LREC, (pp. 4108–4112).
• Dragsted, B. (2010). Coordination of reading and writing processes in translation. Translation and Cognition, American Translators Association Scholarly Monograph Series. Amsterdam/Philadelphia: Benjamins, 41–62.
• Duh, K., Fujino, A., & Nagata, M. (2011). Is machine translation ripe for cross-lingual sentiment classification? In ACL (Short Papers), (pp. 429–433).
• Fellbaum, C. (2010). WordNet: An electronic lexical database. 1998. WordNet is available from http://www.cogsci.princeton.edu/wn.
• Jo, Y. & Oh, A. (2011). Aspect and sentiment unification model for online review analysis. In Proceedings of the fourth ACM international conference on Web search and data mining, (pp. 815–824). ACM.
References (2/2)
• Joshi, S., Kanojia, D., & Bhattacharyya, P. (2013). More than meets the eye: Study of human cognition in sense annotation. In Proceedings of NAACL-HLT, (pp. 733–738).
• Kulis, B. (2012). Conjugate priors.
• Lin, C. & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In Cheung, D. W.-L., Song, I.-Y., Chu, W. W., Hu, X., & Lin, J. J. (Eds.), CIKM, (pp. 375–384). ACM.
• Lu, B., Tan, C., Cardie, C., & Tsou, B. K. Joint bilingual sentiment classification with unlabeled parallel corpora.
• McAuley, J. J. & Leskovec, J. (2013). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web, (pp. 897–908). International World Wide Web Conferences Steering Committee.
• McCallum, A. (2002). MALLET: A machine learning for language toolkit.
• Meng, X., Wei, F., Liu, X., Zhou, M., Xu, G., & Wang, H. (2012). Cross-lingual mixture model for sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, (pp. 572–581). Association for Computational Linguistics.
• Mukherjee, A. & Liu, B. (2012a). Aspect extraction through semi-supervised modeling. In ACL (1), (pp. 339–348). The Association for Computer Linguistics.
• Mukherjee, A. & Liu, B. (2012b). Modeling review comments. In ACL (1), (pp. 320–329). The Association for Computer Linguistics.
• Mukherjee, A. & Liu, B. (2013). Discovering user interactions in ideological discussions. In ACL (1), (pp. 671–681). The Association for Computer Linguistics.
• Mukherjee, S. & Bhattacharyya, P. (2012). WikiSent: Weakly supervised sentiment analysis through extractive summarization with Wikipedia. In Machine Learning and Knowledge Discovery in Databases (pp. 774–793). Springer.
• Nallapati, R., Ahmed, A., Xing, E. P., & Cohen, W. W. (2008). Joint latent topic models for text and citations. In Li, Y., Liu, B., & Sarawagi, S. (Eds.), KDD, (pp. 542–550). ACM.
• Pang, B. & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, (p. 271). Association for Computational Linguistics.
• Pang, B. & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.
• Prettenhofer, P. & Stein, B. (2010). Cross-language text classification using structural correspondence learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (pp. 1118–1127). Association for Computational Linguistics.
• Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. In 20th Conference on Uncertainty in Artificial Intelligence, volume 21, Banff Park Lodge, Banff, Canada.
• Scott, G. G., O'Donnell, P. J., & Sereno, S. C. (2012). Emotion words affect eye fixations during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(3), 783.
• Searle, J. R. (1992). The rediscovery of the mind. The MIT Press.
• Titov, I. & McDonald, R. T. (2008a). A joint model of text and aspect ratings for sentiment summarization. In McKeown, K., Moore, J. D., Teufel, S., Allan, J., & Furui, S. (Eds.), ACL, (pp. 308–316). The Association for Computer Linguistics.
• Titov, I. & McDonald, R. T. (2008b). Modeling online reviews with multi-grain topic models. CoRR, abs/0801.1063.
• Wallach, H. M., Mimno, D. M., & McCallum, A. (2009). Rethinking LDA: Why priors matter. In NIPS, volume 22, (pp. 1973–1981).
• Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning, (pp. 1105–1112). ACM.
• Wang, X., McCallum, A., & Wei, X. (2007). Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), Nebraska, USA.
• Yin, Y., Zhou, C., & Zhu, J. (2010). A pipe route design methodology by imitating human imaginal thinking. CIRP Annals-Manufacturing Technology, 59(1), 167–170.