5cm probabilistic models in political...

52
Probabilistic Models in Political Science Pablo Barber´ a Center for Data Science New York University www.pablobarbera.com

Upload: votram

Post on 29-Aug-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Probabilistic Models in Political Science

Pablo Barber

´

a

Center for Data ScienceNew York University

www.pablobarbera.com

Page 2: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in
Page 3: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in
Page 4: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

4 / 54

Page 5: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

5 / 54

Page 6: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Two approaches to the study of social media and politics:

1. How social media platforms transform politicalcommunication

. Are social media creating ideological “echo

chambers”?

2. Social media as digital traces of political behavior. Can we infer latent individual traits (e.g. political

ideology) from online ties (follows, likes...)?

6 / 54

Page 7: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Inferring political ideology using Twitter data

I Two common patterns about social behavior:1. Homophily: clustering in social networks along common

traits (“birds of a feather tweet together”)2. Selective exposure: preference for information that

reinforces current views and for avoiding opinionchallenges.

I Social media networks replicate offline networks.I

Key assumption: individuals prefer to follow politicalaccounts they perceive to be ideologically close.

I These decisions contain information about allocation ofscarce resource (attention).

I Use this information to estimate ideological locations ofpoliticians and individuals on the latent same scale.

7 / 54

Page 8: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

● ●●●

●●

●●

●●

● ●

Political Accounts

NYTimeskrugman

senrobportman

maddow

FiveThirtyEight

HRC

WhiteHouseBarackObama

Bar

ackO

bam

a

Whi

teH

ouse

GO

P

mad

dow

FoxN

ews

HR

C

...

pol.

acco

unt

m

ryanpetrik 1 1 0 1 0 1 . . .user 2 0 0 1 0 1 0 . . .user 3 0 0 1 0 1 0 . . .user 4 1 1 0 0 0 1 . . .user 5 0 1 0 0 0 1 . . .

. . .user n 0 1 1 0 0 0 . . .

8 / 54

Page 9: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Spatial following model

I Users’ and politicians’ ideology (✓i

and �j

) are defined aslatent variables to be estimated.

I Data: “following” decisions, a matrix of binary choices (Yij

).I Spatial following model: for n users, indexed by i , and m

political accounts, indexed by j :

P(yij

= 1|↵j

, �i

, �, ✓i

, �j

) = logit�1⇣↵

j

+ �i

� �(✓i

� �j

)2⌘

where:↵

j

measures popularity of politician j

�i

measures political interest of user i

� is a normalizing constant

More

9 / 54

Page 10: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Intuition of the model

Probability that Twitter user i follows politician j , as a function ofthe user’s ideology:

φj1 = −1.51 αj1 = 3.51φj2 = 1.09 αj2 = 2.59

−2 0 2θi, Ideology of Twitter user i

Pr(y

ij=

1)

10 / 54

Page 11: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Estimation

I Goal of learning:I ✓

i

: ideological positions of users i = 1, . . . , n

I �j

: ideological positions of political accounts j = 1, . . . , m

I Likelihood function:

p(y|✓,�, ↵,�, �) =nY

i=1

mY

j=1

logit�1(⇡ij

)y

ij (1 � logit�1(⇡ij

))1�y

ij

where ⇡ij

= ↵j

+ �i

� �(✓i

� �j

)2

I Exact inference is intractable ! MCMC (approx. inference)I Estimation:

I First stage: HMC in Stan with random sample of Y to computeposterior distribution of j-indexed parameters.

I Second stage: parallelized MH in R for rest of i-indexedparameters (assuming independence), on NYU’s HPC.

11 / 54

Page 12: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Data

Im = list of 620 popular political accounts in the U.S.! Legislators, president, candidates, other political figures,

media outlets, journalists, interest groups. . .

In = followers of at least one of these accounts! 30.8M users (⇠75% of U.S. users)! 100K of these were matched with voter files

I States: AK, CA, FL, OH, PA.I Unique, perfect matches on first and last name, and county.

I Code:I Method: github.com/pablobarbera/twitter ideologyI Applications: github.com/SMAPPNYU/echo chambersI Data collection: streamR, Rfacebook packages for R

(available on CRAN)I Data analysis: github.com/pablobarbera/pytwools (python)

12 / 54

Page 13: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Results

@sensanders@HillaryClinton

@nancypelosi@VP

@BarackObamaMedian House DMedian Senate D

@senjohnmccainMedian Senate R

Median House R@sentedcruz

@motherjones@maddow

@NPR@msnbc

@nytimes@cnnbrk

@washingtonpost@FoxNews

@DRUDGE_REPORT@glennbeck

@limbaugh

@HRC@glaad

@OccupyWallSt@dailykos

@aclu@hrw

@BrookingsInst@RANDCorporation

@CatoInstitute@AEI

@Heritage@nra

@redstatePolitical Actors Media Interest Groups

−1.5 0.0 1.5 −1.5 0.0 1.5 −1.5 0.0 1.5Position on latent ideological scale

13 / 54

Page 14: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Validation

This method is able to correctly classify and scale Twitter userson the left-right dimension:

1. Political accountsI Correlation with measures based on roll-call votes.

2. Ordinary citizensI Individual and aggregate-level survey responsesI Voting registration files

It is also able to predict change over time.

14 / 54

Page 15: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Political elites

Ideal Points of Members of the 113th U.S. Congress

House Senate

●●

●●

●●

●● ●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

ρD = 0.63

ρR = 0.46

●●

●●

●●

●●

●●

●●

●●

●●

●●

ρD = 0.66

ρR = 0.63

−2

−1

0

1

2

−2 −1 0 1 2 −2 −1 0 1 2Estimated Twitter Ideal Points

Ideo

logy

Est

imat

es B

ased

on

Roll−

Cal

l Vot

es(S

imon

Jack

man

's id

eal p

oint

est

imat

es)

15 / 54

Page 16: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Ordinary Users

Comparison with ideology estimates from aggregated surveys(Lax and Phillips, 2012; Tausanovitch and Warshaw, 2013)

AL

AZ

AR

CA

CO

CTDE

FL

GA

ID

IL

IN

IA

KS

KY

LA

ME

MD

MA

MI

MN

MS

MO

MT

NE

NV

NH

NJNM

NY

NC

ND

OH

OK

OR

PA

RI

SCSD

TN

TX

UT

VT

VA

WA

WV

WI

WY

ρ = −0.916

40%

45%

50%

55%

−0.4 −0.2 0.0 0.2 0.4 0.6Ideology of Median Twitter User in Each State

Mea

n Li

bera

l Opi

nion

(Lax

and

Phi

llips

, 201

2) ρ = 0.791

●●

● ●

● ●

● ●

●●

●●

−1.0

−0.5

0.0

0.5

−0.6 −0.4 −0.2 0.0 0.2 0.4Ideology of Median Twitter User in Each City

Publ

ic P

refe

renc

e Es

timat

e(T

ausa

novi

tch

and

War

shaw

, 201

3)

16 / 54

Page 17: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Ordinary Users

Republicans are more conservative than Democrats

●●

●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Arkansas California Florida Ohio Pennsylvania

−2

−1

0

1

2

Dem Rep Dem Rep Dem Rep Dem Rep Dem RepParty Registration

θ i, T

witt

er−B

ased

Ideo

logy

Est

imat

es

Predictive accuracy for party affiliation is 83%

17 / 54

Page 18: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Ideology of Presidential Candidates

@washingtonpost@nytimes

@msnbc@JimWebbUSA

@HillaryClinton@POTUS

@LincolnChafee@MotherJones

@GovernorOMalley@SenSanders

@SenWarren

@tedcruz@RealBenCarson

@ScottWalker@RandPaul@rushlimbaugh

@BobbyJindal@GovernorPerry

@GovernorPataki@JohnKasich@marcorubio@DRUDGE_REPORT@RepPaulRyan

@GrahamBlog@GovMikeHuckabee

@JebBush@RickSantorum

@FoxNews@GovChristie@CarlyFiorina

@realDonaldTrump@WSJ

Aver

age

Rep

ublic

anin

114

th C

ongr

ess

Aver

age

Twitt

er U

ser

Aver

age

Dem

ocra

tin

114

th C

ongr

ess

−2 −1 0 1 2Position on latent ideological scale

Twitter ideology scores of potential Democratic and Republican presidential primary candidates

Barbera “Who is the most conservative Republican candidate forpresident?” The Washington Post, June 16 2015

19 / 54

Page 19: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Twitter as an Ideological Echo Chamber?Tweets mentioning Obama Tweets mentioning Romney

0

50K

100K

−2 0 2 −2 0 2Estimated Ideology

Co

un

t o

f S

en

t Tw

ee

ts

Obama Romney

−2

−1

0

1

2

−2 −1 0 1 2 −2 −1 0 1 2Estimated Ideology of Author

Est

ima

ted

Id

eo

log

y o

f R

etw

ee

ter

0.00%

0.25%

0.50%

0.75%

1.00%

1.25%

% of Tweets

Barbera (2015) “Birds of the Same Feather Tweet Together: BayesianIdeal Point Estimation Using Twitter Data.” Political Analysis

20 / 54

Page 20: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Twitter as an Ideological Echo Chamber?

Barbera, Jost, Nagler, Tucker, & Bonneau (2015) “Tweeting From Leftto Right: Is Online Political Communication More Than an EchoChamber?” Psychological Science

21 / 54

Page 21: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Other applications

Ideology of media outletsIdeological AsymmetriesMultidimensional Policy Spaces

22 / 54

Page 22: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Two approaches to the study of social media and politics:

1. How social media platforms transform politicalcommunication

. As voters are able to directly interact with politicians,

does the quality of political representation improve?

Social media as digital traces of political behavior. Are legislators’ and citizens’ social media messages a

valid proxy for the attention they give to different political

issues?

23 / 54

Page 23: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Political Representation

Public Opinion Policy

24 / 54

Page 24: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Political Representation

Issues Voters Discuss Issues Legislators Discuss

Do Legislators Accurately Represent Voters’ Interests?Who Leads? Who Follows?

Barbera, Nagler, Egan, Bonneau, Jost, & Tucker (2014) “Leaders orFollowers? Measuring Political Responsiveness in the U.S. CongressUsing Social Media Data.” APSA Conference Paper.

25 / 54

Page 25: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Outline

1. Analyze tweets sent by Members of U.S. Congress and theirfollowers using topic modeling techniques.

2. Estimate the importance (frequency of discussion) of 100different issues in the revealed expressed political agenda forlegislators and constituents

3. Political Congruence: are Members of Congressdiscussing the same set of issues as their constituents?

4. Political Responsiveness: do topics discussed byMembers of Congress temporally precede or follow topicsdiscussed by the voters?

26 / 54

Page 26: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Data

651,116 tweets by Members of U.S. Congress, from Jan. 1, 2013 toDec. 31, 2014 (113th Congress), collected by the Social Media andPolitical Participation Lab (SMaPP) using Twitter’s Streaming API.

●●

● ●●

●●

●●

● ●●

● ●

● ●

● ●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●● ●

● ●●

● ●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ● ● ●

●●

● ●

●●

●●

● ●

●●

● ●

● ●

●● ● ●

●●

●● ●

●●

●●

●●

● ●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

● ●

●●

● ●

●●

● ●●

0

10

20

Feb−13 May−13 Aug−13 Nov−13 Feb−14 May−14 Aug−14 Nov−14 Feb−15

Twee

ts p

er W

eek

Membersof Congress

House DemocratsHouse Republicans

Senate DemocratsSenate Republicans

27 / 54

Page 27: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Citizens’ Tweets

Collected all tweets for 3 samples of citizens:1. Informed public:

I Followers of 5 major media outlets (CNN, FoxNews, MSNBC,NYT, WSJ) located in U.S. (filtered by time zone)

I Random sample of 10,000 (out of ⇠30M)2. Republican Party Supporters:

I Follow 3+ Rep MCs and no Dem MCsI Random sample of 10,000 (out of 203,140)

3. Democratic Party Supporters:I Follow 3+ Dem MCs and no Rep MCsI Random sample of 10,000 (out of 67,843)

28 / 54

Page 28: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Table : Number of tweets in dataset

Group N Avg. Min Max TweetsHouse Republicans 238 1,215 70 8,857 267,311House Democrats 207 1,177 113 5,993 222,491Senate Republicans 46 1,532 73 6,627 67,412Senate Democrats 56 1,616 150 10,736 87,307Informed Public 10K 948 2 5,861 9,487,382Rep. Supporters 10K 1,091 2 8,804 10,911,813Dem. Supporters 10K 1,306 2 5,122 13,058,947Period of analysis: January 1, 2013 to December 31, 2014.

29 / 54

Page 29: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Political Representation

Public Opinion Policy

Media

Media data:I 273,007 tweets from 36 largest media outlets in U.S. (print,

broadcast, online) over same period.

30 / 54

Page 30: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

From Tweets to Topics

4 steps in our analysis1. Tweets from Members of Congress are preprocessed and

split by day, party and chamber (N=2,920 documents)2. Latent Dirichlet Allocation (Blei, 2003):

I Each document is a mixture over K = 100 latent topics.I Topics are distributions over V = 75, 000 n-grams (up to

trigrams, selected by frequency; keeping hashtags)I Estimated parameters:

� Distribution of n-grams over topics (K ⇥ V )✓ Distribution of topics over documents (K ⇥ N)

3. Similar text processing for tweets from citizens and NYTtweets (split by day and group)

4. Using simulation, compute posterior distribution of ✓F

forobserved n-grams for citizens and media

31 / 54

Page 31: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Latent Dirichlet allocation (LDA)I

Topic models are powerful tools for exploring large data setsand for making inferences about the content of documents

!"#$%&'() *"+,#)

+"/,9#)1+.&),3&'(1"65%51

:5)2,'0("'1.&/,0,"'1-

.&/,0,"'12,'3$14$3,5)%1&(2,#)1

6$332,)%1

)+".()165)&65//1)"##&.1

65)7&(65//18""(65//1

- -

I Many applications in information retrieval, documentsummarization, and classification

Complexity+of+Inference+in+Latent+Dirichlet+Alloca6on+David+Sontag,+Daniel+Roy+(NYU,+Cambridge)+

W66+Topic+models+are+powerful+tools+for+exploring+large+data+sets+and+for+making+inferences+about+the+content+of+documents+

Documents+ Topics+poli6cs+.0100+

president+.0095+obama+.0090+

washington+.0085+religion+.0060+

Almost+all+uses+of+topic+models+(e.g.,+for+unsupervised+learning,+informa6on+retrieval,+classifica6on)+require+probabilis)c+inference:+

New+document+ What+is+this+document+about?+

Words+w1,+…,+wN+ ✓Distribu6on+of+topics+

�t =�

p(w | z = t)�

…+

religion+.0500+hindu+.0092+

judiasm+.0080+ethics+.0075+

buddhism+.0016+

sports+.0105+baseball+.0100+soccer+.0055+

basketball+.0050+football+.0045+

…+ …+

weather+ .50+finance+ .49+sports+ .01+

I LDA is one of the simplest and most widely used topicmodels

32 / 54

Page 32: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Latent Dirichlet Allocation

I Document = random mixture over latent topicsI Topic = distribution over n-grams

Probabilistic model with 3 steps:1. Choose ✓

i

⇠ Dirichlet(↵)

2. Choose �k

⇠ Dirichlet(�)

3. For each word in document i :I Choose a topic z

m

⇠ Multinomial(✓i

)I Choose a word w

im

⇠ Multinomial(�i,k=z

m

)

where:↵=parameter of Dirichlet prior on distribution of topics over docs.✓

i

=topic distribution for document i

�=parameter of Dirichlet prior on distribution of words over topics�

k

=word distribution for topic k

33 / 54

Page 33: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

EstimationI Applications that aggregate by author or day outperform

tweet-level analyses (Hong and Davidson, 2010)I K is fixed at 100 based on cross-validated model fit.

●●

● ● ●

●●

● ●● ● ● ● ●

Perplexity

logLikelihood

0.80

0.85

0.90

0.95

1.00

10 20 30 40 50 60 70 80 90 100 110 120Number of topics

Rat

io w

rt wo

rst v

alue

I Text is parsed with scikit-learn in pythonI Estimation: Collapsed Gibbs Sampler in C++ (Griffits and

Steyvers, 2004), ported to R by Grun and Hornik (2011)34 / 54

Page 34: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Validation

j.mp/lda-congress-demo

35 / 54

Page 35: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Congruence

Are Members of Congress discussing the same set of issues astheir constituents?

Table : Contemporaneous Pearson Correlations in Topic Distribution

Dem RepGroup Mcs MCsDemocratic Members of Congress 1.00 0.22Republican Members of Congress 0.22 1.00Informed Public 0.33 0.39Republican Party Supporters 0.17 0.62Democratic Party Supporters 0.58 0.33Media 0.39 0.61

36 / 54

Page 36: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Responsiveness

Do legislators influence the public? Does the public influencelegislators?

To explore causal relationships between topic distributions, weuse a Granger-causality framework (Granger, 1969):I Regress proportion of tweets on topic k at time t by each

group on lagged proportions for all groups, using five lags.I Do legislators’ tweets predict tweets by the public, controlling

for the media, and vice versa?! Changes in tweets as proxies for changes in salience of

issues

37 / 54

Page 37: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Results: Democratic legislators

Public Legislators0.09

<0.00

Media

0.01

0.25

0.09

0.04

38 / 54

Page 38: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Results: Republican legislators

Public Legislators0.03

<0.00

Media

0.01

0.25

0.11

0.07

39 / 54

Page 39: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Conclusions

1. Social media as variable2. Social media as data

Future work / open questions:I More complex generative models for tweets that exploit

platform features (Author-Topic; Dynamic; Hierarchical)I Text- vs network-based estimates of political ideologyI Predicting latent probability to turn out to vote based on

tweet text, using voting registration recordsI Multilingual topic modelingI Detecting irony and sarcasm (Trump!)I Identifying bots and spam with user and text features only

40 / 54

Page 40: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Thanks!

website: pablobarbera.com

twitter: @p barbera

github: pablobarbera

41 / 54

Page 41: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Backup slides (index)

Model with covariatesModel identificationUnequal representationComparative responsiveness

42 / 54

Page 42: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Model with Covariates

Baseline model:

P(yij

= 1) = logit�1⇣↵

j

+ �i

� �(✓i

� �j

)2⌘

Model with geographic covariate:

P(yij

= 1) = logit�1⇣↵

j

+ �i

� �(✓i

� �j

)2 + �s

ij

where s

ij

= 1 if user i and political actor j are located in thesame state, and s

ij

= 0 otherwise.

� ⇡ 1.20 and � ⇡ 0.90

43 / 54

Page 43: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Model with CovariatesComparing Parameter Estimates Across Different Model

Specifications

φj, Elites' Ideology Estimates θi, Users' Ideology Estimates

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−2

−1

0

1

2

−2 −1 0 1 2 −2 −1 0 1 2Parameter Estimates (Model with Geographic Covariate)

Para

mete

r E

stim

ate

s (B

ase

line M

odel)

Index

44 / 54

Page 44: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Identification

P(yijt

= 1) = logit�1⇣↵

j

+ �i

� �(✓it

� �j

)2⌘

Additive aliasing:

= logit�1⇣(↵

j

+ k) + (�i

� k) � �(✓it

� �j

)2⌘

= logit�1⇣↵

j

+ �i

� �((✓it

+ k) � (�j

� k))2⌘

Multiplicative aliasing:

= logit�1⇣↵

j

+ �i

� �

k

2 ((✓it

� �j

) ⇥ k)2⌘

Index

45 / 54

Page 45: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Identifying restrictions

Indeterminacy Approach 1 Approach 2Additive aliasing (1) Fix ↵0

j

= 0 or �0i

= 0 Fix µ↵ = 0 or µ� = 0Additive aliasing (2) Fix �0

j

= +1 or ✓0i

= +1 Fix µ� = 0 or µ✓ = 0Multiplicative aliasing Fix �00

j

= �1 or ✓00i

= �1 Fix �� = 1 or �✓ = 1

phi theta

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

−2.5

0.0

2.5

−2.5

0.0

2.5

Ap

pro

ach

1A

pp

roa

ch 2

−2.5 0.0 2.5 −2.5 0.0 2.5Estimated value of parameter

True v

alu

e o

f para

mete

r

Index46 / 54

Page 46: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Ideology of Media Outlets and Journalists●

Mother JonesNew YorkerMSNBCThink ProgressDaily Kos

SlateAl JazeeraNPR NewsThe AtlanticTIME

New York TimesFiveThirtyEightLos Angeles TimesSalon

Huffington PostNewsweek

ABC NewsCBS News

U.S. NewsCNNChristian Science MonitorCNBCWashington PostUSA TodayFinancial Times

The EconomistReuters

Wall Street JournalForbesPolitico

The HillWashington Times

Fox NewsDrudge ReportReal Clear Politics

Red State

−2 −1 0 1Estimated Ideological Ideal Point

(Accounts Weighted by Number of Followers)

Barbera & Sood (2014) “Follow Your Ideology: A Measure ofIdeological Location of Media Sources”, MPSA Conference

Index

47 / 54

Page 47: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Ideology of Media Outlets and Journalists

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●● ●

●●

●● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

●●● ●●

●●

●●

● ●

●●

●●

● ●

●●

●●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●●● ●

● ●●

●● ●

●● ●

●●

●●

●●

●●

●●

●● ●

●●

●●●● ●●

●●

●●

● ●●

●● ●

●●

●●●

●●

●●

●● ●

●● ●

●● ●

●●

●●

●● ●●

●● ●

●●

●●●●

●● ●●

●●

● ●●

●●

●●

● ●

●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●● ●

●●

● ● ●

●●

●●

●●

● ●●●

●●

● ●

Fox NewsUSA Today

Wall Street JournalCBS NewsU.S. NewsABC News

CNNABC

Washington PostNBC News

Los Angeles TimesTIME

Time MagazineNew York Times

NPR NewsMSNBC

−1 0 1 2φj, Estimated Ideological Ideal Points

Barbera & Sood (2014) “Follow Your Ideology: A Measure ofIdeological Location of Media Sources”, MPSA Conference

Index

47 / 54

Page 48: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Ideological Asymmetries in Pol. Comm.

Super Bowl

Newtown Shooting

Oscars 2014

Syria

Winter Olympics

Boston Marathon

Minimum Wage

Marriage Equality

Budget

Gov. Shutdown

State of the Union

2012 Election

0.00 0.25 0.50 0.75 1.00Estimated Rate of Cross−Ideological Retweeting

(Exponentiated Coefficient from Poisson Regression)

● LiberalsConservatives

Barbera, Jost, Nagler, Tucker, & Bonneau (2015) “Tweeting From Leftto Right: Is Online Political Communication More Than an EchoChamber?” Psychological Science

Index

48 / 54

Page 49: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Application: Multidimensional Policy Spaces in Europe

P(yij

= 1) = logit

�1

i

+ �j

�dX

k=1

�d

(✓ik

� �jk

)2

!

Estimated ideological positions for 120 parties in 28 European countriesLeft-Right Dimension Pro/Anti-European Union Dimension

Grunen

AN0 2011

FPO

OVPSPO

FDL

ERC

C

CD&V

CDA

CDU−CSU

KD

CU

Cs

ODS

DL

PO

CpE

Syriza

Cons

DPP

DIMAR

SLD PD

D66Ecol.

Reform

Verts

Fine Gael

Centre

M5SLNNK

FDP

FI

GD

ZZA

GL

Green

Green U

ANEL

Labour

PL

PvdALab

PiS

SEL

FG

V

Left

Lib

LibDem

FPVenstre

Mod.Coalition

FN

PN

NEOS

ND

N−VANSiLN

VLD

Sinn Fein

PASOK

PVV

PC

VVD

Piraten

Podemos PSL

PP

IRL

MR

Reform

SNP

SDS

CDS

SAPSocDem

SPD

SD

PS

SP

PS

sp.a

SF

Fianna Fail

PSOE

SD

Linke

Finns

UPM

UCD

UDIUPyD

UKIP

IU

SP

Unity

r=0.81

0.0

2.5

5.0

7.5

10.0

0.0 2.5 5.0 7.5 10.0Ideology estimates (Surveys)

Ideo

logy

esti

mat

es (T

witt

er)

Grunen

AN0 2011

FPO

OVP

SPO

FDL

ERC

C

CD&VCDA

KDCU

Cs

ODS

DL

POSyriza

Cons

DPP

DIMAR

SLD

PD D66

Ecol.

Reform

VertsFine Gael

Centre

M5S

LNNK

FDP

FI

GD

ZZAGLGreen

Green UANEL

Labour

PL PvdA

Lab

PiSSEL

FGV Left

LibLibDem

FP

Venstre

Mod.

Coalition

FN

PNNEOSND

N−VA

LN

VLD

Sinn Fein

PASOK

PVV

PC

VVD

Piraten

PodemosPSL

PPIRL

MRReformSNP

SDS CDS

SAPSocDem

SPD

SDSPPS

sp.a

SFFianna Fail

PSOE

SD

Linke

Finns

UPMUCDUPyD

UKIP

IU

SP

Unity

r=0.77

0.0

2.5

5.0

7.5

10.0

0.0 2.5 5.0 7.5 10.0Ideology estimates (Surveys)

Ideo

logy

esti

mat

es (T

witt

er)

Barbera, Popa, & Schmitt (2015) “Analyzing the CommonMultidimensional Political Space for Voters, Parties, and Legislators inEurope”, MPSA Conference

Index

49 / 54

Page 50: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Unequal representation

We also analyze whether correspondence between citizensand legislators is higher for:I Co-partisans (party supporters)I Issues owned by each party (e.g. economy for Republicans;

social issues for Democrats)I Constituents (vs general public)I Informed public vs random sample of U.S. Twitter usersI Individuals with income above median

50 / 54

Page 51: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Electoral Institutions and Political Representation

What institutional configurations foster better representation?

Theoretical expectationsCountry Government Instit. Congr. Responsiv.Germany Coalition Prop. High LowSpain Single-party Prop. Medium MediumUK Coalition Maj. Medium MediumFrance Single-party Maj. Low High

Barbera & Bølstad (2015) “A Comparative Study of the Qualityof Political Representation Using Social Media Data”, EPSAConference Paper. Index

51 / 54

Page 52: 5cm Probabilistic Models in Political Sciencepeople.csail.mit.edu/.../slides/lecture2_barbera-case-study.pdf · Probabilistic Models in Political Science ... I First stage: HMC in

Electoral Institutions and Political Representation

j.mp/EPSA-lda-demoIndex

52 / 54