Analysis of Social Media MLD 10-802, LTI 11-772
DESCRIPTION
Analysis of Social Media MLD 10-802, LTI 11-772. William Cohen, 10-09-010. Stochastic blockmodel graphs. Last week: spectral clustering; theory suggests it will work for graphs produced by a particular generative model.
TRANSCRIPT
Page 1
Analysis of Social Media
MLD 10-802, LTI 11-772
William Cohen, 10-09-010
Page 2
Stochastic blockmodel graphs
• Last week: spectral clustering
• Theory suggests it will work for graphs produced by a particular generative model
• Question: can you directly maximize Pr(structure, parameters | data) for that model?
Page 3
Outline
• Stochastic block models & inference question
• Review of text models
– Mixture of multinomials & EM
– LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
– Latent-space models, exchangeable graphs, p1, ERGM
Page 4
Review – supervised Naïve Bayes
• Naïve Bayes Model: compact representation
[Plate diagrams: the expanded model C → W1, W2, W3, …, WN repeated for M documents, and the equivalent compact model C → W with plates N and M and word-distribution parameter β]
Page 5
Review – supervised Naïve Bayes
• Multinomial Naïve Bayes
[Plate diagram: C → W1, W2, W3, …, WN, repeated for M documents, with parameter β]
• For each document d = 1, …, M
• Generate Cd ~ Mult(· | π)
• For each position n = 1, …, Nd
• Generate wn ~ Mult(· | β, Cd)
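The generative story above can be run directly. A minimal sketch; the vocabulary and the values of π (class prior) and β (per-class word distributions) are toy numbers, not from the slides:

```python
import random

def generate_corpus(pi, beta, vocab, M, N, seed=0):
    """Sample M documents of N words each from the multinomial naive Bayes model.

    pi[k]      : prior probability of class k
    beta[k][w] : probability of word index w given class k
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(M):
        # Generate C_d ~ Mult(. | pi)
        c = rng.choices(range(len(pi)), weights=pi)[0]
        # Generate each word w_n ~ Mult(. | beta, C_d)
        words = [vocab[rng.choices(range(len(vocab)), weights=beta[c])[0]]
                 for _ in range(N)]
        docs.append((c, words))
    return docs

# Toy parameters (illustrative only): class 0 favors "ball", class 1 favors "vote".
vocab = ["ball", "goal", "vote", "law"]
pi = [0.5, 0.5]
beta = [[0.6, 0.3, 0.05, 0.05],
        [0.05, 0.05, 0.6, 0.3]]
corpus = generate_corpus(pi, beta, vocab, M=10, N=20)
```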
Page 6
Review – supervised Naïve Bayes
• Multinomial naïve Bayes: Learning
– Maximize the log-likelihood of the observed variables w.r.t. the parameters:
• Convex function: global optimum
• Solution:
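The closed-form solution referenced on the slide did not survive extraction; for reference, the standard maximum-likelihood estimates for this model are:

```latex
\hat{\pi}_k = \frac{\sum_{d=1}^{M} \mathbf{1}[C_d = k]}{M},
\qquad
\hat{\beta}_{k,w} = \frac{\sum_{d : C_d = k} \sum_{n=1}^{N_d} \mathbf{1}[w_{dn} = w]}{\sum_{d : C_d = k} N_d}
```

i.e., class priors are class frequencies, and each class's word distribution is its relative word counts.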
Page 7
Review – unsupervised Naïve Bayes
• Mixture model: unsupervised naïve Bayes model
[Plate diagram: latent class Z → W, plates N and M, parameter β]
• Joint probability of words and classes:
• But classes are not visible: Z is latent
Page 8
Review – unsupervised Naïve Bayes
• Mixture model: learning
– Not a convex function
• No global optimum solution
– Solution: Expectation Maximization
• Iterative algorithm
• Finds a local optimum
• Guaranteed to maximize a lower bound on the log-likelihood of the observed data
Page 9
Review – unsupervised Naïve Bayes
• Mixture model: EM solution
E-step:
M-step:
Key capability: estimate the distribution of the latent variables given the observed variables.
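The E-step and M-step formulas on the slide are image-only; the updates can be sketched as follows. A minimal implementation for the mixture of multinomials; the add-one smoothing in the M-step and the toy corpus are my own additions:

```python
import math

def em_step(docs, pi, beta):
    """One EM iteration for a mixture of multinomials.

    docs[d][w] : count of word w in document d
    Returns updated (pi, beta).
    """
    K, V = len(pi), len(beta[0])
    # E-step: responsibilities gamma[d][k] = Pr(class k | doc d), via log-sum-exp
    gamma = []
    for doc in docs:
        logp = [math.log(pi[k]) + sum(c * math.log(beta[k][w])
                                      for w, c in enumerate(doc))
                for k in range(K)]
        m = max(logp)
        p = [math.exp(l - m) for l in logp]
        s = sum(p)
        gamma.append([x / s for x in p])
    # M-step: re-estimate parameters from expected counts (add-one smoothing)
    pi = [sum(g[k] for g in gamma) / len(docs) for k in range(K)]
    beta = [[(1 + sum(gamma[d][k] * docs[d][w] for d in range(len(docs)))) /
             (V + sum(gamma[d][k] * sum(docs[d]) for d in range(len(docs))))
             for w in range(V)] for k in range(K)]
    return pi, beta

# Two tiny "documents" over a 2-word vocabulary, two classes (toy numbers).
docs = [[5, 0], [0, 5]]
pi, beta = [0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]]
for _ in range(10):
    pi, beta = em_step(docs, pi, beta)
```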
Page 10
Review - LDA
Page 11
Review - LDA
• Motivation
[Plate diagram: w inside plates N and M]
Assumptions: 1) documents are i.i.d., 2) within a document, words are i.i.d. (bag of words)
• For each document d = 1, …, M
• Generate θd ~ D1(…)
• For each word n = 1, …, Nd
• generate wn ~ D2(· | θd)
Now pick your favorite distributions for D1, D2
Page 12
• Latent Dirichlet Allocation
[Plate diagram: α → θ → z → w, with topic-word parameter β; plates N and M]
• For each document d = 1, …, M
• Generate θd ~ Dir(· | α)
• For each position n = 1, …, Nd
• generate zn ~ Mult(· | θd)
• generate wn ~ Mult(· | βzn)
“Mixed membership”
Pr(zn = j | z1, z2, …, zn−1, α1, …, αK) ∝ nj + αj
(the Dirichlet prior on θ integrates out; nj is the number of earlier z’s assigned to topic j)
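The LDA generative process can be written as a sampler. A sketch; the α and β values below are illustrative toy numbers:

```python
import random

def generate_lda_docs(alpha, beta, M, N, seed=0):
    """Sample M documents of N tokens from LDA.

    alpha   : Dirichlet parameter vector, length K
    beta[k] : word distribution (length V) for topic k
    Returns a list of (theta_d, [(z_n, w_n), ...]) pairs.
    """
    rng = random.Random(seed)
    K, V = len(alpha), len(beta[0])
    docs = []
    for _ in range(M):
        # theta_d ~ Dir(alpha), via normalized Gamma draws
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        theta = [x / sum(g) for x in g]
        toks = []
        for _ in range(N):
            z = rng.choices(range(K), weights=theta)[0]    # z_n ~ Mult(theta_d)
            w = rng.choices(range(V), weights=beta[z])[0]  # w_n ~ Mult(beta_{z_n})
            toks.append((z, w))
        docs.append((theta, toks))
    return docs

# Toy run: 2 topics over 3 word types (illustrative parameters).
docs = generate_lda_docs(alpha=[0.5, 0.5],
                         beta=[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
                         M=5, N=30)
```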
Page 13
• LDA’s view of a document
Page 14
• LDA topics
Page 15
Review - LDA
• Latent Dirichlet Allocation
– Parameter learning:
• Variational EM
– Numerical approximation using lower bounds
– Results in biased solutions
– Convergence has numerical guarantees
• Gibbs sampling
– Stochastic simulation
– Unbiased solutions
– Stochastic convergence
Page 16
Review - LDA
• Gibbs sampling
– Applicable when the joint distribution is hard to evaluate but the conditional distributions are known
– The sequence of samples comprises a Markov chain
– The stationary distribution of the chain is the joint distribution
Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.
Page 17
Why does Gibbs sampling work?
• What’s the fixed point?
– The stationary distribution of the chain is the joint distribution
• When will it converge (in the limit)?
– When the graph defined by the chain is connected
• How long will it take to converge?
– Depends on the second eigenvalue of that graph’s transition matrix
Page 18
Page 19
This is called “collapsed Gibbs sampling”, since you’ve marginalized away some variables.
From: Parameter Estimation for Text Analysis, Gregor Heinrich
Page 20
Review - LDA
• Latent Dirichlet Allocation
[Plate diagram: α → θ → z → w, with topic-word parameter β; plates N and M]
• Randomly initialize each zm,n
• Repeat for t = 1, …
• For each doc m, word n
• Find Pr(zmn = k | other z’s)
• Sample zmn according to that distribution
“Mixed membership”
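The loop above is the collapsed Gibbs sampler for LDA. A compact sketch using the standard collapsed conditional with symmetric hyperparameters α (document-topic) and η (topic-word); the toy corpus and hyperparameter values are assumptions:

```python
import random

def lda_gibbs(docs, K, V, alpha, eta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (theta and beta marginalized out).

    docs : list of documents, each a list of word ids in 0..V-1
    Returns the final topic assignment z[m][n] for every token.
    """
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random init
    for m, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[m][n]
            ndk[m][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[m][n]  # remove this token from the counts
                ndk[m][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Pr(z_mn = k | other z's) ∝ (ndk + alpha) * (nkw + eta) / (nk + V*eta)
                p = [(ndk[m][j] + alpha) * (nkw[j][w] + eta) / (nk[j] + V * eta)
                     for j in range(K)]
                k = rng.choices(range(K), weights=p)[0]
                z[m][n] = k  # add the token back under the sampled topic
                ndk[m][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

# Toy corpus: two word "themes" (ids 0-1 vs 2-3); alpha and eta are toy values.
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 2], [0, 0, 1, 1, 0], [3, 2, 3, 2, 3]]
z = lda_gibbs(docs, K=2, V=4, alpha=0.1, eta=0.1)
```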
Page 21
Outline
• Stochastic block models & inference question
• Review of text models
– Mixture of multinomials & EM
– LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
– Latent-space models, exchangeable graphs, p1, ERGM
Page 22
Statistical Models of Networks
• Want a generative probabilistic model that’s amenable to analysis…
• … but more expressive than Erdos-Renyi
• One approach: exchangeable graph model
– Exchangeable: X1, X2 are exchangeable if Pr(X1,X2,W) = Pr(X2,X1,W).
– A generalization of i.i.d.-ness
– It’s a Bayesian thing
Page 23
Review - LDA
• Motivation
[Plate diagram: w inside plates N and M]
Assumptions: 1) documents are i.i.d., 2) within a document, words are i.i.d. (bag of words)
• For each document d = 1, …, M
• Generate θd ~ D1(…)
• For each word n = 1, …, Nd
• generate wn ~ D2(· | θd)
Docs and words are exchangeable.
Page 24
Stochastic Block models: assume 1) nodes w/in a block z are exchangeable, and 2) edges between blocks zp, zq are exchangeable
[Plate diagram: block prior π → zp for each of N nodes; block-pair edge parameters β → apq for each of N² pairs; Dirichlet prior α on π]
Page 25
Stochastic Block models: assume 1) nodes w/in a block z are exchangeable, and 2) edges between blocks zp, zq are exchangeable
[Plate diagram: block prior π → zp for each of N nodes; block-pair edge parameters β → apq for each of N² pairs; Dirichlet prior α on π]
Gibbs sampling:
• Randomly initialize zp for each node p.
• For t = 1, …
• For each node p
• Compute Pr(zp | other z’s)
• Sample zp
See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure
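For concreteness, here is the generative side of the (single-membership) stochastic blockmodel that the Gibbs sampler above inverts; the values of π, B, and N are toy numbers:

```python
import random

def sample_sbm(pi, B, N, seed=0):
    """Sample an undirected stochastic blockmodel graph.

    pi      : block prior (length K)
    B[a][b] : edge probability between blocks a and b
    Returns (z, adj): block labels z and a symmetric adjacency matrix.
    """
    rng = random.Random(seed)
    K = len(pi)
    # Each node draws a block label z_p ~ Mult(pi)
    z = [rng.choices(range(K), weights=pi)[0] for _ in range(N)]
    adj = [[0] * N for _ in range(N)]
    for p in range(N):
        for q in range(p + 1, N):
            # Edge (p, q) ~ Bernoulli(B[z_p][z_q]); edges are exchangeable
            # given the block labels
            if rng.random() < B[z[p]][z[q]]:
                adj[p][q] = adj[q][p] = 1
    return z, adj

# Toy assortative parameters (illustrative): dense within blocks, sparse across.
z, adj = sample_sbm(pi=[0.5, 0.5], B=[[0.9, 0.05], [0.05, 0.9]], N=30)
```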
Page 26
Mixed Membership Stochastic Block models
[Plate diagram: Dirichlet prior α → per-node membership vector θp (plate over N nodes); role indicators zp→q ~ θp and zp←q ~ θq; edge apq drawn from the block matrix B (plate over N² pairs)]
Airoldi et al, JMLR 2008
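A sketch of the MMSB generative process: each node draws its own membership vector θp, and every directed pair draws fresh sender and receiver roles before flipping an edge coin from the block matrix B. The parameter values below are illustrative only:

```python
import random

def sample_mmsb(alpha, B, N, seed=0):
    """Sample a directed graph from the mixed-membership stochastic blockmodel.

    alpha   : Dirichlet parameter vector over K roles
    B[a][b] : probability of an edge when sender role is a, receiver role is b
    """
    rng = random.Random(seed)
    K = len(alpha)
    thetas = []
    for _ in range(N):
        g = [rng.gammavariate(a, 1.0) for a in alpha]  # theta_p ~ Dir(alpha)
        thetas.append([x / sum(g) for x in g])
    adj = [[0] * N for _ in range(N)]
    for p in range(N):
        for q in range(N):
            if p == q:
                continue
            zs = rng.choices(range(K), weights=thetas[p])[0]  # sender role z_{p->q}
            zr = rng.choices(range(K), weights=thetas[q])[0]  # receiver role z_{p<-q}
            if rng.random() < B[zs][zr]:
                adj[p][q] = 1
    return thetas, adj

# Toy parameters: 2 roles, assortative block matrix.
thetas, adj = sample_mmsb(alpha=[0.3, 0.3], B=[[0.8, 0.1], [0.1, 0.8]], N=20)
```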
Page 27
Mixed Membership Stochastic Block models
Page 28
Mixed Membership Stochastic Block models
Page 29
Parkkinen et al paper
Page 30
Another mixed membership block model
Page 31
Another mixed membership block model
z=(zi,zj) is a pair of block ids
nz = #pairs z
qz1,i = #links to i from block z1
qz1,. = #outlinks in block z1
δ = indicator for diagonal
M = #nodes
Page 32
Another mixed membership block model
Page 33
Another mixed membership block model
Page 34
Outline
• Stochastic block models & inference question
• Review of text models
– Mixture of multinomials & EM
– LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
– Latent-space models, exchangeable graphs, p1, ERGM
Page 35
Exchangeable Graph Model
• Defined by a 2^k × 2^k table q(b1, b2)
• Draw a length-k bit string b(n), like 01101, for each node n from a uniform distribution.
• For each pair of nodes n, m
– Flip a coin with bias q(b(n), b(m))
– If it’s heads, connect n, m
A more complicated variant:
• Pick a k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the ui’s are correlated.
• Pass each ui thru a sigmoid so it’s in [0,1]; call that pi
• Pick bi using pi
Page 36
Exchangeable Graph Model
• Pick a k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the ui’s are correlated.
• Pass each ui thru a sigmoid so it’s in [0,1]; call that pi
• Pick bi using pi
If α is big then ux, uy are really big (or small), so px, py will end up in a corner of the unit square.
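The basic bit-string model from the previous slide can be sampled directly; the q table below is a toy k = 1 instance, not from the slides:

```python
import random

def sample_exchangeable_graph(q, k, N, seed=0):
    """Sample a graph from the bit-string exchangeable graph model:
    each node gets a uniform k-bit string b(n); nodes n and m are connected
    with probability q[b(n)][b(m)], where q is a 2^k x 2^k table.
    """
    rng = random.Random(seed)
    # Represent each bit string as an integer in [0, 2^k), drawn uniformly
    b = [rng.randrange(2 ** k) for _ in range(N)]
    adj = [[0] * N for _ in range(N)]
    for n in range(N):
        for m in range(n + 1, N):
            # Flip a coin with bias q(b(n), b(m)); heads means connect n, m
            if rng.random() < q[b[n]][b[m]]:
                adj[n][m] = adj[m][n] = 1
    return b, adj

# k=1 gives a 2x2 table; these bias values are illustrative only.
b, adj = sample_exchangeable_graph(q=[[0.9, 0.1], [0.1, 0.9]], k=1, N=25)
```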
Page 37
The p1 model for a directed graph
• Parameters, per node i:
– θ: background edge probability
– αi: “expansiveness”: how extroverted is i?
– βi: “popularity”: how much do others want to be with i?
– ρi: “reciprocation”: how likely is i to respond to an incoming link with an outgoing one?
log Pr(no edge between i and j) = λij
log Pr(i→j only) = λij + θ + αi + βj
log Pr(j→i only) = λij + θ + αj + βi
log Pr(i→j and j→i) = λij + 2θ + αi + βj + αj + βi + ρij
(λij is the per-dyad normalizing term)
A logistic-regression-like procedure can be used to fit this model to data from a graph.
Page 38
Exponential Random Graph Model
• Basic idea:
– Define some features of the graph (e.g., number of edges, number of triangles, …)
– Build a MaxEnt-style model based on these features
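In symbols (the standard ERGM form, which the slide leaves implicit): with graph features f_k and weights θ_k,

```latex
\Pr(G) \;=\; \frac{1}{Z(\theta)} \exp\!\Big(\sum_k \theta_k f_k(G)\Big),
\qquad
Z(\theta) \;=\; \sum_{G'} \exp\!\Big(\sum_k \theta_k f_k(G')\Big)
```

The normalizer Z sums over all graphs on the same node set, which is what makes ERGM inference hard.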
Page 39
Latent Space Model
• Each node i has a latent position z(i) in Euclidean space
• The z(i)’s are drawn from a mixture of Gaussians
• The probability of an interaction between i and j depends on the distance between z(i) and z(j)
• Inference is a little more complicated…
[Handcock & Raftery, 2007]
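One common parameterization of the edge probability (standard in the latent-space literature, shown here for concreteness rather than taken from the slide):

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid z) \;=\; \beta_0 \;-\; \lVert z(i) - z(j) \rVert
```

so nearby nodes in the latent space are more likely to interact.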
Page 40
Page 41
Page 42
Outline
• Stochastic block models & inference question
• Review of text models
– Mixture of multinomials & EM
– LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
– Latent-space models, exchangeable graphs, p1, ERGM