I Want to Know More About… Computerized Text Analysis
Computerized Text Analysis: The Practical and Ethical Use of Social Media Data for Social Science Research
Lucas Czarnecki, University of Calgary, Dept. of Political Science
My Research
Psychological differences between conservatives and liberals
How innate traits like personality, temperament, and moral impulses shape ideology
“Do differences in ideology manifest in language use?”
photo by: Luci Gutiérrez
Before I even get started…
Go to www.eventbrite.ca
Using the “Find Your Next Experience” search box, search for:
“I Want To Know More About…Computerized Text Analysis”
The link will take you to:
Please Download…
You will need:
Optional, but highly recommended:
My Two Goals for this Workshop
First Goal
To provide practical information (regarding data collection, preprocessing, and analysis)
Second Goal
To have a wider discussion on text and social media data (its history and the current state of affairs)
Common Theme Throughout: The Practical and Ethical Use of Social Media Data
What Is Computerized Text Analysis?
Not one method or approach, but many
A Swiss Army Knife for studying language use and text data
Any automated process for categorizing text, or for uncovering latent meaning in word use, within or across computer-readable files
History of Text Analysis
1901: The idea has roots in the earliest days of psychology (e.g. Freud & psycholinguistics)
1921: Rorschach projective tests linked word use with psychological drives
1950s: Early work in content analysis (still using human coders and judges)
1966-1978: Philip Stone et al. work on the first computerized text analysis program (the General Inquirer)
1990s: Programs like the General Inquirer become available on small personal computers
1992-1994: Earliest work on Linguistic Inquiry and Word Count (LIWC)
Computerized Text Analysis: Why Now?
“We are in the midst of a technological revolution whereby, for the first time, researchers can link daily word use to a broad array of real-world behaviors... to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles, and individual differences.”
Tausczik & Pennebaker (2010)
ENIAC (1943) – The World’s First* Computer
Occupied ~1,800 square feet and weighed about 30 tons
…and no pictures of cats
ENIAC was constructed at the University of Pennsylvania: construction began in 1943 and was completed in 1946.
IBM’s 1956 5MB Hard-Drive
50 24-inch discs stacked together; took up 16 sq. ft.
Cost IBM ~$35,000 annually
Stored an impressive 5MB of data.
weighed just less than 1 ton…now that’s progress!
A Technological Revolution
Computers are becoming increasingly powerful, while simultaneously less expensive
Graph by: Max Roser – Our World in Data
An Information Revolution
Source: BBVA API-Market
An Information Revolution
Vids/Pics aside… a lot (if not most) of this is text data
Source: BBVA API-Market
The untapped world of unstructured data
The vast majority of data available online right now is unstructured: although it is hard to estimate, roughly 80 to 90% of all data online is unstructured, and much of this is text. Roughly 2.5 billion GB of data are created per day.
It is a very exciting time for computerized text analysis!
Before we start data mining…we need to talk about Ethics
Our technology has outpaced our philosophy
In Our Communities
In Law and Government
…and in Academia
In Law & Government – e.g. Surveillance Programs
Constitutions/Conventions Evolve Slowly…
…Technology does not
Ethics aside…the NSA shows us how much data is really out there
East Germany’s Stasi vs. the NSA
https://apps.opendatacity.de/stasi-vs-nsa/english.html
Academia – e.g. Facebook Emotion Manipulation Experiment (2014)
Universities Must Update Their Ethics in the Face of New Methods
My Ethics Application
What my research involves…
Scraped: posts from party leaders (N = 1,712): Harper = 525 | Trudeau = 531 | Mulcair = 656
Scraped: data on Facebook users: Comments = 297,830 | Likes = 3,296,035 | Shares = 855,381
…none of these people know they are in my study
The process of applying for ethics approval…
The Rigorous Ethics Approval Process
Made my life easier
…but I should not be granted this luxury
Getting Started: A Very, very, very brief Intro to R
Let’s start with scalars
Enter: X <- 3
Print (display) your data: X
Or assign with the = operator: X = 4
Use R as a calculator, enter: X * 5
Save calculations as distinct new objects: Y <- X * 5
Now for vectors:
Enter: A <- c(2,4,6,8)
R loves functions (e.g. c() to concatenate, mean(), etc.)
Functions make life easier
Without functions, calculations would need to be done manually, e.g.: (2+4+6+8)/4
Or: mean(A)
Vector of character strings!
This will give you an error, because R looks for objects named bob, jane, etc.: list_of_names <- c(bob,jane,david,mark,john)
Instead, quote each name: list_of_names <- c("bob","jane","david","mark","john")
list_of_candidates <- c("Justin Trudeau", "Stephen Harper", "Tom Mulcair", "Gilles Duceppe", "Elizabeth May")
Humble beginnings
Let’s start with a vector of numbers, for example: v <- c(39.5,31.9,19.7,4.7,3.4)
Assign names to elements, enter: names(v) <- c("Justin Trudeau", "Stephen Harper", "Tom Mulcair", "Gilles Duceppe", "Elizabeth May")
Print v to view data: v
Look up specific data: v["Stephen Harper"]
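A few more ways to work with the named vector above, sketched in base R. The values match the parties' 2015 popular-vote shares in percent; the object names above_ten, ranked, and leader are illustrative only.

```r
# Named vector from the slide
v <- c(39.5, 31.9, 19.7, 4.7, 3.4)
names(v) <- c("Justin Trudeau", "Stephen Harper", "Tom Mulcair",
              "Gilles Duceppe", "Elizabeth May")

# Look up several elements at once by name
v[c("Justin Trudeau", "Tom Mulcair")]

# Logical subsetting: keep only the elements above 10
above_ten <- v[v > 10]

# Sort from largest to smallest; the names travel with their values
ranked <- sort(v, decreasing = TRUE)

# Name of the largest element
leader <- names(which.max(v))
```

Because names stay attached through subsetting and sorting, results remain readable without a separate lookup table.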
Getting Familiar with Text Data
Sentences from Alice In Wonderland
Alice <- c("But I don't want to go among mad people, Alice remarked", "Oh you can't help that, said the Cat, we're all mad here. I'm mad. You're mad", "How do you know I'm mad, said Alice", "You must be, said the Cat, or you wouldn't have come here")
Print: Alice
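A first look at the Alice sentences as text data, using only base R. Each element of the vector is treated as one "document"; the split pattern is a crude tokenizer chosen for illustration, not a standard one.

```r
Alice <- c("But I don't want to go among mad people, Alice remarked",
           "Oh you can't help that, said the Cat, we're all mad here. I'm mad. You're mad",
           "How do you know I'm mad, said Alice",
           "You must be, said the Cat, or you wouldn't have come here")

# Number of documents (sentences)
n_docs <- length(Alice)

# Characters per sentence
n_chars <- nchar(Alice)

# Crude tokenizer: lowercase, then split on anything that is not a letter or apostrophe
words <- strsplit(tolower(Alice), "[^a-z']+")
words_per_sentence <- lengths(words)

# Which sentences mention the Cat? (case-insensitive pattern match)
mentions_cat <- grepl("cat", Alice, ignore.case = TRUE)
```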
FINALLY, Matrices:
Your first matrix: myMatrix <- matrix(data=c(1,2,3,4,5,6), ncol=3)
Print it: myMatrix
Also works with text (very common): myMatrix <- matrix(data=c("you", "can", "also", "include", "words", "here"), ncol=3)
Print it: myMatrix
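One detail worth knowing before building term-document matrices: matrix() fills column by column unless byrow = TRUE. The row and column labels below are hypothetical, chosen to hint at a documents-by-terms layout.

```r
# Default: matrix() fills column by column -> rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(data = c(1, 2, 3, 4, 5, 6), ncol = 3)
# byrow = TRUE fills row by row instead -> rows are (1, 2, 3) and (4, 5, 6)
m_row <- matrix(data = c(1, 2, 3, 4, 5, 6), ncol = 3, byrow = TRUE)

dim(m)    # number of rows, number of columns
m[2, 3]   # element in row 2, column 3

# Hypothetical labels: rows as documents, columns as terms
rownames(m) <- c("doc1", "doc2")
colnames(m) <- c("mad", "cat", "alice")
m["doc2", "cat"]  # look up a cell by name
```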
Functions and Packages
Packages are here to help!!!
install.packages("tm") then require(tm) OR library(tm)
To save time, install several packages once: Needed <- c("tm", "lsa", "Rfacebook", "twitteR", "ggplot2", "devtools", "wordcloud", "biclust", "cluster", "igraph", "fpc")
install.packages(Needed)
Then load them all: lapply(Needed, require, character.only = TRUE)
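A minimal sketch of a helper that installs only the packages not already present and then loads everything. ensure_packages() is a hypothetical name, not part of any package; it relies only on base R's installed.packages(), install.packages(), and require().

```r
# Hypothetical helper (not from any package): install what is missing, then load all
ensure_packages <- function(pkgs, install = TRUE) {
  # Packages not yet installed on this machine
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (install && length(to_install) > 0) {
    install.packages(to_install)
  }
  # require() returns TRUE/FALSE instead of stopping with an error like library()
  loaded <- vapply(pkgs, require, logical(1), character.only = TRUE)
  invisible(loaded)
}

Needed <- c("tm", "lsa", "Rfacebook", "twitteR", "ggplot2", "devtools",
            "wordcloud", "biclust", "cluster", "igraph", "fpc")
# ensure_packages(Needed)  # run once with an internet connection
```

The returned logical vector is named by package, so a quick `which(!ensure_packages(Needed))` shows anything that failed to load.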
In a nutshell…why use R?
1) It is free and open source
2) It is incredibly flexible
3) The R community (R-bloggers, Stack Exchange, etc.)
4) Increasingly popular in academia
5) Multi-disciplinary
6) Constantly evolving features
An Overview of Text As Data Methods
From: Justin Grimmer & Brandon Stewart’s (2013) “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts”
Political Analysis, 21: 267-297
Computerized Text Analysis: A How To Guide
Step One: Data Collection
Data Are Everywhere
1. Existing corpora (e.g. government collections / university archives), e.g.:
I. POLTEXT – Université Laval
II. Project Gutenberg
III. Presidential Speech Archive – University of Virginia
IV. Natural speech corpora are also available
2. Electronic sources:
I. Web scraping
II. Social media
III. Blogs
3. Undigitized text (e.g. old manuscripts, treaties, debate proceedings, election records)
Tip: Many R Packages (e.g. tm, quanteda, lsa, etc.) include corpora you can practice on
Getting Started: Become a Facebook Developer
Google: “Facebook For Developers”
Site: https://developers.facebook.com
Note: You will need a FB Account to sign in.
Your page After Registration
To Begin:
1) Click on “My Apps” (top right)
Getting Started: Become a Facebook Developer
1. GOOGLE: “Facebook Developers”
2. Sign in to Facebook – you will need an account
3. CLICK on “My Apps”
4. Click “Create New App”
Getting Started: Become a Facebook Developer
2) Create App ID
(there will be some pop-up windows)
3) Enter security question
4) Choose Platform
5) WWW
6) Skip Quick Start
Getting Started: Become a Facebook Developer
6) Click -> Skip Quick Start
7) Go to Dashboard (automatic)
FROM DASHBOARD
Never share your App Secret!!!!!
8) Copy your App ID AND your App Secret (the latter will require you to enter your password)
9) Return to R / RStudio
DASHBOARD / Settings / Basic
Never share your App Secret!!!!!
9) Click Add Platform
10) Select “Website”
11) Paste: http://localhost:1410/ Into Site URL
12) Save Changes
13) Go back to R – Hit Enter
That’s it. All ready to go!
Click Continue.
If successful, the web browser should display:
Authentication complete. Please close this page and return to R.
Authentication complete. Authentication successful.
Getting Started: Become a Twitter Developer
Your page After Registration
To Begin:
Go to: https://dev.twitter.com/
Getting Started: Become a Twitter Developer
1. Click “My Apps”
2. Then “Create New App”
You will come to this page:
1. Give your app a name
2. A description
3. For website, type http://test.de/
4. Leave Callback URL blank
5. Agree to conditions
6. Create your Twitter app
Getting Started: Become a Twitter Developer
In R/RStudio
library(twitteR)
consumer_key <- "Your_Consumer_Key"
consumer_secret <- "Your_Consumer_Secret"
access_token <- "Your_Token_Here"
access_secret <- "Your_Secret_Here"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
Now that you are a developer…What can you do?
A Short Summary:
Collect data by location – e.g. using Twitter’s “geotags”
Collect data by topic – using keywords or hashtags
Collect data on groups – using Facebook Group IDs
Select data according to parameters such as time – e.g. posts made during the Canadian election campaign (4/Aug/2015 until 19/Oct/2015)
Scrape timelines – e.g. President Obama’s @POTUS account
Remove noise – preprocessing
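Once posts are in R, selecting by time and by topic is ordinary data-frame filtering. A sketch on hypothetical toy posts (the post texts and the hashtag are made up for illustration; real data would come from the Facebook or Twitter API), using the campaign window above:

```r
# Hypothetical toy posts standing in for data returned by an API
posts <- data.frame(
  created = as.Date(c("2015-07-30", "2015-08-04", "2015-09-15",
                      "2015-10-19", "2015-10-25")),
  text = c("Summer BBQ photos", "The campaign starts today #elxn42",
           "Watch the debate tonight", "Go vote! #elxn42",
           "Thank you, volunteers"),
  stringsAsFactors = FALSE
)

# Time filter: the campaign window, both ends inclusive
campaign_start <- as.Date("2015-08-04")
campaign_end   <- as.Date("2015-10-19")
in_window <- posts$created >= campaign_start & posts$created <= campaign_end

# Topic filter: a keyword or hashtag, matched case-insensitively
keyword <- "#elxn42"  # hypothetical hashtag for illustration
on_topic <- grepl(keyword, posts$text, ignore.case = TRUE)

campaign_posts <- posts[in_window, ]
topic_posts <- posts[in_window & on_topic, ]
```

Keeping the two conditions as separate logical vectors makes it easy to report how many posts each filter removed.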
A Couple Quick Examples
The Signal Through the Noise
Step Two: Preprocessing (Cleaning Up Our Data)
Preprocessing
“a series of operations on data by a computer in order to retrieve or transform or classify information” (Oxford Dictionary)
Operations that transform words into numbers for the purposes of future analysis
A few common procedures
tm_map(x, FUN,…) – from the “tm” package
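library(tm)
docs <- VCorpus(VectorSource(Alice))   # e.g. a corpus built from the Alice sentences
myStopwords <- stopwords("english")    # e.g. tm's built-in English stopword list
docs <- tm_map(docs, content_transformer(tolower))   # base functions must be wrapped in content_transformer()
docs <- tm_map(docs, removeWords, myStopwords)       # before removePunctuation, so "don't" etc. still match
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, stemDocument)                   # requires the SnowballC package
docs <- tm_map(docs, removeWords, myStopwords)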
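A rough base-R analogue of the tm_map steps above, useful for seeing what each step does to a string without installing tm. It is a sketch, not tm's implementation: the stopword list is a tiny hypothetical one, apostrophes are stripped so contractions stay one token, and there is no stemming step.

```r
# Hypothetical, very short stopword list; tm's stopwords("english") is much longer
my_stopwords <- c("i", "you", "the", "to", "we", "were", "are", "all", "here", "said")

preprocess <- function(texts, stopwords) {
  x <- tolower(texts)                              # like content_transformer(tolower)
  x <- gsub("'", "", x, fixed = TRUE)              # "we're" -> "were", kept as one token
  x <- gsub("[[:punct:]]", " ", x)                 # like removePunctuation
  x <- gsub("[[:digit:]]+", " ", x)                # like removeNumbers
  tokens <- strsplit(trimws(x), "[[:space:]]+")    # like stripWhitespace, then split into words
  # like removeWords: drop stopwords and any empty strings
  lapply(tokens, function(tok) tok[nzchar(tok) & !(tok %in% stopwords)])
}

preprocess("We're all mad here, said the Cat 42 times!", my_stopwords)
```

The result is a list with one character vector of surviving tokens per input document, here `"mad" "cat" "times"`.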
Stemming / Lemmatization
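Stemming: the process of reducing a word to its most basic form by stripping suffixes (Porter, 1980); lemmatization instead maps a word to its dictionary form
Party, Parties, Partying → Parti*
Common problem: “University students partying rather than studying political parties” – stemming conflates partying with parties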
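The conflation problem can be reproduced with a deliberately tiny suffix stripper. This is a hypothetical toy with three hand-picked rules, NOT the Porter algorithm, but it lands on the same stems for these words.

```r
# Toy suffix-stripping stemmer: three hand-picked rules, NOT the Porter algorithm
toy_stem <- function(words) {
  w <- tolower(words)
  w <- sub("ies$", "i", w)   # parties  -> parti
  w <- sub("ing$", "", w)    # partying -> party
  w <- sub("y$", "i", w)     # party    -> parti
  w
}

toy_stem(c("Party", "Parties", "Partying", "Studying"))
# "parti" "parti" "parti" "studi"
```

Rules this blunt break quickly (e.g. "king" becomes "k"), which is why real work uses a proper stemmer such as the Porter stemmer available through `SnowballC::wordStem(words, language = "porter")`.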
A Quick Example
Finding the Signal Through the Noise
Step Three: Analysis
Two Different Approaches
Word Count Strategy
Top down: a bag-of-words approach
Relative frequency of words, usually a percentage (target words / total words)
Word Pattern Analysis
Bottom up: word association and word covariation
e.g. a term-document matrix
See: Pennebaker et al. (2003). “Psychological Aspects of Natural Language Use: Our Words, Our Selves.” Annual Review of Psychology 54: 547-577.
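Both approaches start from the same bag of words. A base-R sketch on the Alice sentences: a relative-frequency score for a target list (top down), and a raw-count term-document matrix (bottom up). The one-word "category" is hypothetical; real word-count tools use large validated dictionaries.

```r
docs <- c("But I don't want to go among mad people, Alice remarked",
          "Oh you can't help that, said the Cat, we're all mad here. I'm mad. You're mad",
          "How do you know I'm mad, said Alice",
          "You must be, said the Cat, or you wouldn't have come here")

# Bag of words: replace non-letters (except apostrophes) with spaces, lowercase, split
tokenize <- function(text) {
  cleaned <- tolower(gsub("[^A-Za-z' ]", " ", text))
  lapply(strsplit(cleaned, "[[:space:]]+"), function(w) w[nzchar(w)])
}
tokens <- tokenize(docs)

# 1) Word count strategy: % of each document's words that are in the target list
target <- c("mad")  # hypothetical one-word category
pct_target <- vapply(tokens, function(w) 100 * sum(w %in% target) / length(w),
                     numeric(1))

# 2) Word pattern analysis: term-document matrix (rows = terms, columns = documents)
vocab <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens, function(w) as.vector(table(factor(w, levels = vocab))),
              integer(length(vocab)))
dimnames(tdm) <- list(vocab, paste0("doc", seq_along(docs)))

round(pct_target, 1)
tdm[c("mad", "cat", "you"), ]
```

Note that the percentage divides by each document's own length, so a short sentence with one "mad" can outscore a longer one with the same count.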
Latent Semantic Analysis
Knowledge Acquisition and a Theory of Meaning
Chomsky – language acquisition & a universal grammar
Vs.
Locke – the blank slate: “no innate rules for processing data”
A Solution to Plato’s Problem
A must-read: Landauer, T.K. & Dumais, S.T. (1997). “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.” Psychological Review, 104(2), 211-240.
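At its core, LSA takes a term-document matrix, keeps only the largest few dimensions of its singular value decomposition (SVD), and compares documents (or words) in that reduced space. A minimal sketch with base R's svd() on a hypothetical toy matrix; real applications use large corpora, term weighting, and far more dimensions.

```r
# Hypothetical toy term-document matrix: rows = terms, columns = documents
tdm <- matrix(c(2, 1, 0, 0,
                1, 2, 0, 0,
                0, 0, 1, 2,
                0, 1, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("vote", "party", "cat", "mad"),
                              c("d1", "d2", "d3", "d4")))

# Truncated SVD: keep the k largest singular values.
# Document coordinates in the latent space are V_k %*% D_k.
lsa_docs <- function(m, k) {
  s <- svd(m)  # m = s$u %*% diag(s$d) %*% t(s$v)
  docs <- s$v[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)
  rownames(docs) <- colnames(m)
  docs
}

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

doc_vecs <- lsa_docs(tdm, k = 2)
cosine(doc_vecs["d1", ], doc_vecs["d2", ])  # high: d1 and d2 share "vote"/"party"
cosine(doc_vecs["d1", ], doc_vecs["d3", ])  # near zero: little shared structure
cosine(tdm[, "d1"], tdm[, "d2"])            # raw-count cosine, for comparison
```

In this toy example d1 and d2 end up more similar in the reduced space than in the raw counts: discarding the smallest dimensions smooths over incidental differences in word use, which is the intuition behind LSA's account of meaning.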
Word Count Strategy – Two Parts (e.g. Linguistic Inquiry and Word Count)
1) The processing component
2) The dictionary
Package “quanteda” – allows you to import LIWC dictionaries into R
The Heart of Word Count Software – The Dictionary

| Word Category | Examples | Psychological Correlate(s) |
|---|---|---|
| First-person singular | I, me, mine | Honest, depressed, low status, personal, emotional, informal |
| First-person plural | we, us, our | Detached, high status, socially connected to group (sometimes) |
| Third-person singular | she, him, her, he | Social interests, social support |
| Articles | a, an, the | Interest in objects and things, deference to authority |
| Negative emotion (e.g. anxiety & anger) | hate, angry, mad, worried, concerned | Emotional state |
| Exclusivity | but, without, exclude | Cognitive complexity, honesty |
| Future/past/present tense | will, gonna, am, doing, went, ran, had | e.g. goal orientations (forward- vs. past-focused) |
| Social processes | mate, talk, they, child | Social concerns, social support |

Caution: these are correlations
For Fun
Quickly scrape your friends’ and colleagues’ Twitter accounts
(makes use of LIWC’s dictionary)
This is… Anxiety (avoidance-based): overestimates risk; status-quo oriented (risk-avoidance choices); risk reduction (concerned with uncertainty)
This is… Anger (approach-based): underestimates risk; change oriented (risk-seeking choices); moral anger (addresses injustices: “no dessert!”)
Liberals
Personality (Big 5): Openness to New Experiences
Conservatives
Personality (Big 5): Conscientiousness, Neuroticism
Predisposition != Determinism
Tracking Sentiment Over Time
Frequency of Anger-related words from Facebook Commentators during the 2015 Canadian General Election Campaign
Tracking Sentiment Over Time
Frequency of Positive Emotion words from Facebook Commentators during the 2015 Canadian General Election Campaign
Creating Your Own Dictionaries
Word Category
%
01 HarmVirtue
02 HarmVice
03 FairnessVirtue
04 FairnessVice
05 IngroupVirtue
06 InGroupVice
07 AuthorityVirtue
08 AuthorityVice
09 PurityVirtue
10 PurityVice
11 MoralityGeneral
%
Target Words
compassion* 01
empath* 01
sympath* 01
…
class 07
Bourgeoisie 07
…
austerity 09
integrity 09 11
Data are organized hierarchically
* indicates a word stem
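To see how word-count software uses a dictionary in this format, here is a base-R sketch that parses it and scores a list of tokens. The dictionary is a hypothetical excerpt built only from entries shown above; parse_dic(), matches_entry(), and score_tokens() are illustrative names, and the scoring rule (each token counts at most once per category, reported as a percentage of all tokens) is an assumption modeled on the word-count strategy described earlier, not LIWC's own code.

```r
# Hypothetical excerpt of a dictionary in the LIWC-style format shown above
dic_lines <- c("%",
               "01 HarmVirtue", "07 AuthorityVirtue",
               "09 PurityVirtue", "11 MoralityGeneral",
               "%",
               "compassion* 01", "empath* 01", "sympath* 01",
               "class 07", "Bourgeoisie 07",
               "austerity 09", "integrity 09 11")

# Split at the two "%" lines: category header first, word entries after
parse_dic <- function(lines) {
  delims <- which(trimws(lines) == "%")
  header <- strsplit(trimws(lines[(delims[1] + 1):(delims[2] - 1)]), "[[:space:]]+")
  categories <- setNames(vapply(header, `[`, "", 2), vapply(header, `[`, "", 1))
  body <- lines[(delims[2] + 1):length(lines)]
  entries <- strsplit(trimws(body[nzchar(trimws(body))]), "[[:space:]]+")
  list(categories = categories,                     # code -> category name
       words = tolower(vapply(entries, `[`, "", 1)), # target word or stem
       codes = lapply(entries, `[`, -1))             # category code(s) per entry
}

# An entry ending in "*" is a stem (prefix match); otherwise it must match exactly
matches_entry <- function(token, entry) {
  if (endsWith(entry, "*")) {
    startsWith(token, sub("*", "", entry, fixed = TRUE))
  } else {
    token == entry
  }
}

# Percentage of tokens falling into each category (a token counts once per category)
score_tokens <- function(tokens, dic) {
  tokens <- tolower(tokens)
  counts <- setNames(numeric(length(dic$categories)), dic$categories)
  for (tok in tokens) {
    hits <- which(vapply(dic$words, function(e) matches_entry(tok, e), logical(1)))
    cats <- unique(dic$categories[unlist(dic$codes[hits])])
    counts[cats] <- counts[cats] + 1
  }
  100 * counts / length(tokens)
}

dic <- parse_dic(dic_lines)
scores <- score_tokens(c("compassion", "and", "empathy", "show", "integrity"), dic)
scores
```

Here "compassion" and "empathy" both hit HarmVirtue through stems, while "integrity" counts toward two categories at once, which is how one word can belong to several levels of the hierarchy.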
Grimmer & Stewart (2013)
There is no single best method for computerized text analysis
Four Principles of Automated Text Analysis
1) All quantitative models of language are wrong – but some are useful
Language models are inherently reductionist
2) Quantitative methods for text amplify resources & augment humans
Computers cannot replace humans (…yet)
3) There is no globally best method for automated text analysis
3.1) Your method will depend on: i) the hypothesis you are testing, and ii) your source(s) of data
4) Validate, validate, validate
We need to work together…across disciplines
Adapted from: Grimmer & Stewart (2013). “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis, 21: 267-297.
Regardless of your Method…A key concern is always validation
| Psychological Processes | Examples of Dictionary Words | Words in Category | Internal Consistency (Uncorrected α) | Internal Consistency (Corrected α) |
|---|---|---|---|---|
| Affect | happy, cried | 1393 | .18 | .57 |
| Positive emotions | love, nice, sweet | 620 | .23 | .64 |
| Negative emotions | hurt, ugly, nasty | 744 | .17 | .55 |
| Anxiety | worried, fearful | 116 | .31 | .73 |
| Anger | hate, kill, annoyed | 230 | .16 | .53 |
| Sadness | crying, grief, sad | 136 | .28 | .70 |
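The table reports internal consistency as Cronbach's α. Below is a minimal base-R sketch of the standard α formula, plus the Spearman-Brown prophecy formula, which is one common way to adjust a reliability estimate for test length. Whether and exactly how that adjustment produces the "corrected" column is an assumption here; the slides do not show the procedure, so treat these as the generic formulas only.

```r
# Cronbach's alpha for an n-by-k matrix (rows = observations, columns = items):
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Spearman-Brown prophecy formula: predicted reliability if the test
# were n times as long, given current reliability rho
spearman_brown <- function(rho, n) {
  (n * rho) / (1 + (n - 1) * rho)
}

cronbach_alpha(cbind(c(1, 2, 3, 4), c(2, 1, 4, 3)))  # 0.75
spearman_brown(0.5, 2)                               # 2/3
```

Low uncorrected values are expected for word categories: any single word is rare in a given text, so items (words) correlate weakly even when the category is coherent.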
Thank you
Questions???