Social media analysis with NLP

Michael Miller Yoder

21 November 2019

Overview

1. Motivation: language in social context

2. Examples of NLP approaches to modeling identity

Experiment 1: Effects of self-presentation on interaction in social media

Experiment 2: Portrayal of characters and relationships in narrative (fanfiction)

language embedded in social context

What types of social contexts is language used in?

For NLP, what is language?

[Timeline figure, 1990 to 2010: statistical machine learning NLP, followed by neural NLP. Landmarks shown: Penn Treebank (news, 1987-1989) and BERT.]

SOCIAL

[Diagram labels: language; speakers; audience; situations; purposes]

Penn Treebank

1987-1989

credit: Amir Zeldes, [Zeldes & Simonson 2016]

Typical rates in the secondary market : 8.65 % one month ; 8.65 % three months ; 8.55 % six months. BANKERS ACCEPTANCES : 8.52 % 30 days ; 8.37 % 60 days ; 8.15 % 90 days ; 7.98 % 120 days ; 7.92 % 150 days ; 7.80 % 180 days.

language is always embedded in social context

“Language is by and about people”

—Noah Smith, ACL 2017

https://homes.cs.washington.edu/~nasmith/slides/acl-8-1-17.pdf

NLP + social science: applications

● hate speech detection

● community norms

● fairness and bias

● media framing

● dialectal NLP tools

Sources shown on these slides: Garg et al. 2017; https://criticalmediareview.wordpress.com/2015/10/19/what-is-media-framing/; www.tes.com

Models of identity

Critical identity approaches

● Sociolinguistics: “identity is the product rather than the source of linguistic and other semiotic practices … is social and cultural rather than primarily internal” [Bucholtz and Hall 2005]
○ Diagram labels: identity; society, culture

● Gender studies: “As a shifting and contextual phenomenon, gender does not denote a substantive being” [Butler 1990]
○ Diagram labels: (changing) identity; society, culture

● Critical race theory: “race and sex become grounded in experiences that actually represent only a subset of a much more complex phenomenon.” [Crenshaw 1989]
○ Diagram label: (intersectional) identity

● Discourse analysis: “people have multiple identities connected not to their ‘internal states’ but to their performances in society” [Gee 2000]
○ Diagram label: identities

Computational identity approaches

● Computer science: “classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language” [Rao et al. 2010]

● Computational linguistics: “Inferring latent attributes of online users has many applications in public health, politics, and marketing” [Ardehaly and Culotta 2015]

● Computer vision: “a [deep neural network] can be used to identify sexual orientation from facial images” [Kosinski and Wang 2018]

○ Diagram label on each slide: identity

Can we investigate the production of identity in language with computational models?

Avoid naturalizing structures of identity and further marginalizing those who don’t fit them (Butler 1990)

Discover how notions of identity are being reinforced/challenged/reinvented

[Diagram labels: language + social data; machine learning, y = f(x); ?]

1. Self-presentation effects on social media

Qinlan Shen, CMU Language Technologies Institute

Alex Coda, CMU Language Technologies Institute

Carolyn P. Rosé, CMU Language Technologies Institute

Yunseok Jang, U Michigan Computer Science & Eng

Yale Song, Microsoft Research

Kapil Thadani, Yahoo Research

Explicit identity positioning

● Working identity definition: “social positioning of self and other” [Bucholtz & Hall 2010]

● How does the social positioning of self affect interaction on social media?

● Tumblr as a site with particular identity implications, as well as social interaction

Lyca / 25

Self-presentation on Tumblr

● Explicit social positioning: blog descriptions!

● Well these are messy

● "List descriptions"
○ max | 18yo | she/they | girl with dreams | twerfs don't follow
○ andre | 22 | he/him | mexican ✨trans | too many fandoms
○ hey! annie, she/hers, love me, infj

What effects of similarities and differences in self-positioning do we see on content propagation in Tumblr?

[Slide annotations: blog descriptions; reblogging]

Reblog prediction

● Reblog "opportunity"

● Learning to rank pairwise formulation

[Diagram: a follower and two followees, each followee with a post made at a similar time; one of the posts is marked "reblog". Example blog descriptions shown in place of the user labels: she/her; 25 | nyc; reylo fan]
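The slides do not give the ranking model itself. As a rough sketch of what a pairwise learning-to-rank formulation can look like, the code below fits a linear scorer so that the reblogged post in each opportunity scores higher than the post that was not reblogged. The logistic loss on feature differences, the function names, and the three toy features are all assumptions made for illustration, not details from the talk.

```python
import math
import random


def train_pairwise_ranker(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit weights w so that w . (x_reblogged - x_skipped) > 0.

    pairs: list of (x_reblogged, x_skipped), each a list of floats.
    Uses a logistic loss on feature differences (an illustrative choice).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of log(1 + exp(-margin)) with respect to the margin.
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * grad_scale * di for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear score of one candidate post."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical features: [label_match, label_mismatch, log_note_count]
toy_pairs = [
    ([1.0, 0.0, 2.0], [0.0, 1.0, 2.1]),
    ([1.0, 0.0, 1.0], [0.0, 0.0, 1.2]),
    ([0.0, 0.0, 3.0], [0.0, 1.0, 2.9]),
]
weights = train_pairwise_ranker(toy_pairs, n_features=3)
```

With a linear scorer, the learned weights can be read in the same way as the coefficients discussed later in the talk: a positive weight means the feature raises the chance that a post is the one reblogged.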

Levels of identity abstraction

● Identity categories: dimensions of personal characteristics
○ age, gender, personality type

● Identity labels:
○ 17, trans man, infj

Identity category extraction

● Manually grouped the most common n-grams into 11 categories

● Refined list with manual annotation of 1000 blog descriptions

● Regular expressions to extract features such as "girl", "ravenclaw", "25" to represent users

Identity category

age

ethnicity/nationality

fandoms

gender

interests

location

personality type

pronouns

relationship status

sexual orientation

zodiac sign
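A minimal sketch of regular-expression label extraction, assuming one pattern per category. The patterns below are hypothetical stand-ins covering only five of the 11 categories; the term lists actually used came from the manual grouping and annotation described above and are not reproduced in the slides.

```python
import re

# Hypothetical patterns; each captures the label in group 1.
CATEGORY_PATTERNS = {
    "age": r"\b(1[3-9]|[2-9][0-9])(?:\s?yo|\s?y/o)?\b",
    "pronouns": r"\b(she/her|she/hers|she/they|he/him|they/them)\b",
    "gender": r"\b(girl|boy|woman|man|trans|nonbinary)\b",
    "personality type": r"\b([ie][ns][ft][jp])\b",
    "fandoms": r"\b(ravenclaw|gryffindor|hufflepuff|slytherin)\b",
}
COMPILED = {
    category: re.compile(pattern, re.IGNORECASE)
    for category, pattern in CATEGORY_PATTERNS.items()
}


def extract_identity_labels(description):
    """Return {category: sorted list of labels} found in a blog description."""
    found = {}
    for category, pattern in COMPILED.items():
        labels = sorted({m.group(1).lower() for m in pattern.finditer(description)})
        if labels:
            found[category] = labels
    return found


example = extract_identity_labels("max | 18yo | she/they | girl with dreams")
```

Here `example` maps "age" to ["18"], "pronouns" to ["she/they"], and "gender" to ["girl"]. Categories with no match are simply absent, which is what the category mismatch features rely on.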

Data

● Sampled 1000 users who have blog descriptions and minimum 10 reblogs

● Pair each reblog with up to 5 posts not reblogged, posted within 30 minutes of the paired reblog

Number of sampled users: 1000

Total reblog opportunities: 712,670

Timeframe: June - Nov 2018
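The pairing step can be sketched as follows, assuming each post is a record with a hypothetical `post_id`, the `follower` whose feed it appeared in, and a `posted_at` timestamp. How the up-to-5 non-reblogged posts were chosen is not stated in the slides; taking the ones closest in time is an assumption.

```python
from datetime import datetime, timedelta


def build_reblog_opportunities(reblogs, not_reblogged, max_negatives=5,
                               window=timedelta(minutes=30)):
    """Pair each reblogged post with up to max_negatives posts that the same
    follower did not reblog and that were posted within `window` of it."""
    opportunities = []
    for reblog in reblogs:
        candidates = [
            post for post in not_reblogged
            if post["follower"] == reblog["follower"]
            and abs(post["posted_at"] - reblog["posted_at"]) <= window
        ]
        # Assumption: keep the candidates closest in time to the reblog.
        candidates.sort(key=lambda post: abs(post["posted_at"] - reblog["posted_at"]))
        for negative in candidates[:max_negatives]:
            opportunities.append((reblog["post_id"], negative["post_id"]))
    return opportunities


t0 = datetime(2018, 6, 1, 12, 0)
sample_reblogs = [{"post_id": "r1", "follower": "u1", "posted_at": t0}]
sample_not_reblogged = [
    {"post_id": "a", "follower": "u1", "posted_at": t0 + timedelta(minutes=10)},
    {"post_id": "b", "follower": "u1", "posted_at": t0 + timedelta(minutes=29)},
    {"post_id": "c", "follower": "u1", "posted_at": t0 + timedelta(minutes=31)},
    {"post_id": "d", "follower": "u1", "posted_at": t0 - timedelta(minutes=15)},
    {"post_id": "e", "follower": "u2", "posted_at": t0},
]
pairs = build_reblog_opportunities(sample_reblogs, sample_not_reblogged)
```

Post "c" falls outside the 30-minute window and post "e" belongs to another follower, so neither is paired with "r1".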

Features

● Baseline features:
○ Post hashtags
○ Number of likes, reblogs, comments
○ Post type (text, photo, quote, video, audio, chat, link, answer)

● Category alignment features:
○ Category match
○ Category mismatch: one user provides the category, the other does not

● Label alignment features:
○ Label match
○ Label mismatch
○ Specific label interaction count
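One possible reading of the alignment features, sketched for a single follower/followee pair. Each user is assumed to be a dict from category to the set of labels extracted from their blog description. Treating "category match" as both users presenting the category, and "label mismatch" as both presenting it with no label in common, are assumptions; the feature names are hypothetical.

```python
def alignment_features(follower, followee, categories):
    """Build category- and label-alignment features for one user pair.

    follower, followee: dict mapping category -> set of presented labels.
    """
    features = {}
    for cat in categories:
        a = follower.get(cat, set())
        b = followee.get(cat, set())
        # Category level: is the category presented by both, or by only one?
        features[f"cat_match:{cat}"] = int(bool(a) and bool(b))
        features[f"cat_mismatch:{cat}"] = int(bool(a) != bool(b))
        # Label level: do the presented labels overlap?
        features[f"label_match:{cat}"] = int(bool(a & b))
        features[f"label_mismatch:{cat}"] = int(bool(a) and bool(b) and not (a & b))
        # Specific label interactions, e.g. follower "anime" with followee "design".
        for label_a in sorted(a):
            for label_b in sorted(b):
                key = f"pair:{cat}:{label_a}|{label_b}"
                features[key] = features.get(key, 0) + 1
    return features


toy_follower = {"age": {"25"}, "interests": {"anime"}}
toy_followee = {"age": {"25"}, "pronouns": {"she/her"}, "interests": {"design"}}
toy_features = alignment_features(
    toy_follower, toy_followee, ["age", "pronouns", "interests"]
)
```

In this toy pair, age is a label match, pronouns is a category mismatch (only the followee presents them), and interests is a label mismatch with one specific interaction, anime with design.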

Is there an effect?

What is the nature of this effect?

● Generally positive coefficients were learned for category and label match features, negative for mismatches

● Specific interaction features between labels sometimes most informative

Features | Likelihood of reblogging

Follower: presents pronouns; Followee: does not | (value missing in transcript)

Race/ethnicity label alignment | ↑

Nationality label alignment | none

Follower: cisgender; Followee: cisgender | (value missing in transcript)

Similar ages (20 and 21, e.g.) | ↑

Follower: anime; Followee: design | (value missing in transcript)

Follower: gaming; Followee: manga | (value missing in transcript)

Follower: memes; Followee: history | (value missing in transcript)

Conclusion

● Evidence for an association between explicit, self-presented identity information and content propagation

○ Most studies use only content and network features to predict content propagation [Naveed et al. 2011, Zhang et al. 2016, Vosoughi et al. 2018]

● Users who presented labels that indicated shared interests or shared values were more likely to share each other’s content

2. Changes in portrayal of characters in narrative

Qinlan Shen

Luke Breitfeller

Carolyn P. Rosé

James Fiacco

Shefali Garg

Ethan Xuanyue Yang

Huiming Jin

Hariharan Muralidharan

Motivation

● Examine how others’ identity is positioned in narrative

● Can computational models capture basic changes in narrative portrayal of characters’ identity?

● Fanfiction: fiction created by fans of TV shows, movies, books, comics, etc.

[Discourse Processes, in submission]

Can we capture changes in character and relationship framing in fanfiction with word embedding-based methods?

Methods

● Word embeddings [Mikolov et al. 2013a] for social questions
○ Stereotypes and bias in corpora [Garg et al. 2018]
○ Framing by different social groups [An et al. 2018]

● Can word embeddings capture social framing of relationships in fanfiction?

Hypotheses

1. Focusing on text that is relevant to characterization provides a stronger signal for learning shifts in relationship portrayal

2. Differences between canon and fanfiction vector representations in embedding space can represent changes in relationship portrayal

Data

Harry Potter stories, Archive of Our Own

>179k stories (as of 2018)

Characters

● Harry Potter
● Hermione Granger
● Draco Malfoy
● Ron Weasley
● Ginny Weasley

Pairings by popularity

● Draco/Harry
● Hermione/Ron
● Draco/Hermione
● Ginny/Harry
● Harry/Hermione
● Harry/Ron

Prediction task

● Does the relationship match canon in being romantic/not romantic?

● True if

○ romantic in canon and romantic in fanfiction or

○ not romantic in canon and not romantic in fanfiction
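The label definition above reduces to an equality check between two booleans. The sketch below makes that explicit; the `CANON_ROMANTIC` set (Hermione/Ron and Ginny/Harry as the romantic pairings in canon) and the function name are assumptions added for illustration, since the slides do not list which pairings count as romantic in canon.

```python
# Assumption: of the six pairings studied, these two are romantic in canon.
CANON_ROMANTIC = {
    frozenset({"Hermione", "Ron"}),
    frozenset({"Ginny", "Harry"}),
}


def matches_canon(pairing, romantic_in_fic):
    """True if the pairing is romantic in both canon and the story,
    or romantic in neither."""
    romantic_in_canon = frozenset(pairing) in CANON_ROMANTIC
    return romantic_in_canon == romantic_in_fic
```

For example, a story that writes Draco/Harry as romantic gets the label False, while a story that keeps Harry/Ron non-romantic gets True.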

Text extraction

github.com/michaelmilleryoder/fanfiction-nlp

Based on BookNLP [Bamman et al. 2014]

Relationship representations

Harry wept at the sight of Hermione in the garden. [pairing: Harry, Hermione]

Ron looked down at his shoe. Troll bogeys. He would have to tell Harry about this. [pairing: Harry, Ron]

● Weighted average of word embeddings in a 10-word window around character name mentions
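A small sketch of a window-based representation, under stated assumptions: the window is taken as 10 tokens on each side of a mention, weights fall off as 1/distance, and `embeddings` is a plain dict from word to vector. The slides specify neither the weighting scheme nor the exact window convention, and the two-dimensional toy vectors are made up.

```python
def pair_representation(tokens, names, embeddings, window=10):
    """Weighted average of embeddings of words near mentions of `names`.

    Returns None when no context word has an embedding.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for i, token in enumerate(tokens):
        if token not in names:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            word = tokens[j]
            # Skip the mention itself, other character names, unknown words.
            if j == i or word in names or word not in embeddings:
                continue
            weight = 1.0 / abs(j - i)  # assumption: closer words count more
            total = [t + weight * e for t, e in zip(total, embeddings[word])]
            weight_sum += weight
    if weight_sum == 0.0:
        return None
    return [t / weight_sum for t in total]


toy_embeddings = {"wept": [1.0, 0.0], "sight": [0.0, 1.0]}
toy_tokens = "harry wept at the sight of hermione in the garden".split()
vector = pair_representation(toy_tokens, {"harry", "hermione"}, toy_embeddings)
```

In the toy sentence, "wept" sits next to "harry" and so dominates the resulting vector, while "sight" contributes less because it is further from both names.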

Visualization

● Track changes in contextualized embeddings for character names across fics

○ Train RNN-based language model and take final hidden state as contextualized word representation [Peters et al. 2018]

Hermione sat in the front of the classroom. She...

Fleur whistled softly. "Hermione! Come here...

[ 0.34 0.72 0.21 … ]

[ 0.89 0.06 0.53 … ]

Canon vector is close to the center of the fanfiction vectors: harry

Canon vector is on the edge of fanfiction vectors: draco, remus, sirius

Conclusion

● Word embedding approaches can capture types of character framing

○ See evidence of differences in characterization, relationships

● Differences often match known fanfiction trends

Conclusion

Computational models of identity in language

● Assumption: identity is not only reflected, but also constructed, in language

● Computational techniques to analyze and model the presentation of identity in discourse

● Shift focus from predicting latent user attributes from language to exploring how people are positioning themselves and others in language

● Enables exploring the effects of the choice of self-presentation (Experiment 1)

● Acknowledges that identities can be framed and represented in varied, changing ways in narrative (Experiment 2)

language embedded in social context

Thank you!

draco canon vector is on the edge of fanfiction vectors

Representation for Ron

Differences when cast in a canon relationship vs. when excluded

Data

● For each character pairing, sampled stories with at least 5 paragraphs with both characters mentioned

● Balanced dataset across 6 pairings

● Each instance is a particular pairing in a story
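The sampling rules above can be sketched as a filter followed by downsampling. The story records, the naive substring test for character mentions, and balancing by downsampling every pairing to the size of the rarest one are all assumptions; the slides do not say how mentions were detected or how the balance was achieved.

```python
import random
from collections import defaultdict


def pairing_instances(stories, pairings, min_paragraphs=5):
    """One instance per (story, pairing) with enough shared paragraphs."""
    instances = []
    for story in stories:
        for a, b in pairings:
            shared = sum(1 for para in story["paragraphs"] if a in para and b in para)
            if shared >= min_paragraphs:
                instances.append((story["id"], (a, b)))
    return instances


def balance_by_pairing(instances, seed=0):
    """Downsample so that every pairing has the same number of instances."""
    by_pairing = defaultdict(list)
    for instance in instances:
        by_pairing[instance[1]].append(instance)
    if not by_pairing:
        return []
    n = min(len(group) for group in by_pairing.values())
    rng = random.Random(seed)
    return [inst for group in by_pairing.values() for inst in rng.sample(group, n)]


paragraph = "Harry and Ron walked to class."
toy_stories = [
    {"id": "s1", "paragraphs": [paragraph] * 5},
    {"id": "s2", "paragraphs": [paragraph] * 4},
    {"id": "s3", "paragraphs": ["Draco glared at Harry."] * 6},
    {"id": "s4", "paragraphs": [paragraph] * 7},
]
toy_pairings = [("Harry", "Ron"), ("Draco", "Harry")]
instances = pairing_instances(toy_stories, toy_pairings)
balanced = balance_by_pairing(instances)
```

Story "s2" is dropped because only 4 of its paragraphs mention both characters, and balancing then keeps one instance per pairing.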

Interaction on Tumblr

● How does the social positioning of self affect interaction on social media?

● Primary form of interaction on Tumblr: "reblogging" [Xu et al. 2014]

● Reblogging as content propagation; most studies use only content and network features to predict content propagation [Naveed et al. 2011, Zhang et al. 2016, Vosoughi et al. 2018]

Identity category annotation

Prediction tasks

● Canon: does the relationship match canon in being romantic/not romantic?

● Auxiliary tasks to test if simply capturing something else

○ Romantic: is the relationship romantic?

○ M/M: is the relationship between 2 males? (Regardless of whether it's romantic.)
