Mining auditory hallucinations from unsolicited Twitter posts
M. Belousov, M. Dinev, R. M. Morris, N. Berry, S. Bucci, G. Nenadic
University of Manchester Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research
Portorož, May 2016
Auditory hallucinations: schizophrenia, hearing voices, mental health, psychosis, symptom, sound
Unsolicited: spontaneous, unforced, unasked-for
Twitter posts: social network, brief messages of fewer than 140 characters, 310M active users share opinions
Mining: knowledge discovery, exploratory data analysis, unseen patterns
Research aim
Q: Is it feasible to generate useful datasets from unsolicited Twitter posts regarding auditory hallucinatory experiences to support psychological investigations?
A: A classification model that can predict whether a given post is related to hallucinatory experiences.
Potentially related posts
I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy ✅
If hallucinating is thought of as hearing voices that are not actually real, then these painkillers are causing me to hallucinate like mad ✅
All Twitter posts were paraphrased to preserve anonymity
Unrelated posts
My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me ❌
So I was convinced I was hearing stuff. It was so funny because the noise was coming from the kitchen but I thought I was hallucinating ❌
Iterative workflow
Define search queries
Collect unique posts from Twitter
Annotate posts & Explore data
Predict relatedness of posts to hallucinatory experiences
Analyse data
Redefine search queries
Data collection
Search query
hallucinating hearing
(“hear things” OR “hearing things”) “in my head”
hearing scary things “in my head”
(hear OR hearing) (“other people” OR “other ppls” OR “other ppl”) thoughts
(voice OR voices) (commenting OR criticising) (scary OR frightening OR “everything I do”)
(hear OR hearing) (voice OR voices) (god OR angel OR allah OR spirit OR soul OR “holy spirit” OR djinn OR jinn)
(hear OR hearing) (voice OR voices) (scary OR devil OR demon OR daemon OR evil OR “evil spirit”)
List of defined search queries for Twitter Search API
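The queries above combine alternatives with OR, quote multi-word phrases, and rely on an implicit AND between space-separated groups. A minimal sketch of how such queries could be assembled, and how results from several overlapping queries could be merged into a set of unique posts, follows. The helper names (`any_of`, `build_query`, `collect_unique`) and the `(post_id, text)` record shape are hypothetical; the sketch does not call the Twitter Search API, whose real response format differs.

```python
from collections.abc import Iterable


def any_of(terms: list[str]) -> str:
    """Join alternative terms into an OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if len(quoted) == 1:
        return quoted[0]  # a single term needs no parentheses
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups: list[str]) -> str:
    """Space-separate the groups; the search engine treats spaces as AND."""
    return " ".join(any_of(group) for group in groups)


def collect_unique(result_lists: Iterable[list[tuple[str, str]]]) -> dict[str, str]:
    """Merge per-query results (hypothetical (post_id, text) pairs),
    keeping the first copy of each post ID."""
    unique: dict[str, str] = {}
    for results in result_lists:
        for post_id, text in results:
            unique.setdefault(post_id, text)
    return unique
```

Keying on the post ID means a post returned by several overlapping queries is stored only once.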
Data annotation
• Two research psychologists manually annotated the posts:
• Assign a class: related or unrelated to hallucinations
• Highlight specific phrases that explain their decisions
• The highlighted words and phrases were later used to identify characteristics of each class
Data annotation process
RESULT: 401 annotated examples, 94 of which are related to hallucinatory experiences
• The observed inter-annotator agreement (IAA) was 0.85 on 41 examples (10% of the final annotated set)
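The slide reports observed agreement without naming the measure. Assuming "observed IAA" means raw percentage agreement, the sketch below computes it for two annotators, alongside Cohen's kappa, which corrects that figure for the agreement expected by chance. The function names and label lists are hypothetical examples, not the study's annotations.

```python
from collections import Counter


def observed_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which both annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = observed_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if each annotator labelled at random with their own label frequencies.
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
```

On imbalanced data such as this set (94 of 401 related), chance agreement on the majority class is high, so kappa is usually noticeably lower than raw agreement.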
Data exploration: semantic classes
• Relative (father, friend)
• Communication Tool (phone)
• Audio Device (headphones, TV)
• Drug (cannabis, painkillers)
• Audio Recording (voicemail)
• Possible Hallucination (seeing things, in my head)
• Audio & Visual Media, Apps (song, YouTube, Siri)
• Religious Term (prayer)
• Emotional Support (helpline)
• Own Voice Indicator (my voice, our own voice)
• Fear Expression (scared, creepy)
• Abusive Language (sh*t, hell)
• Stigmatising Language (crazy, insane)
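One simple way to turn these classes into features is dictionary lookup. The sketch below assumes each class is backed by a term list; it uses only the example terms shown on the slide, for a subset of the classes, so `SEMANTIC_LEXICON` is illustrative rather than the lexicon used in the study. Matching is case-insensitive and respects word boundaries.

```python
import re

# Illustrative lexicon: example terms copied from the slide only (hypothetical, incomplete).
SEMANTIC_LEXICON = {
    "Relative": ["father", "friend"],
    "Audio Device": ["headphones", "tv"],
    "Drug": ["cannabis", "painkillers"],
    "Possible Hallucination": ["seeing things", "in my head"],
    "Own Voice Indicator": ["my voice", "our own voice"],
    "Fear Expression": ["scared", "creepy"],
    "Stigmatising Language": ["crazy", "insane"],
}


def count_semantic_classes(text: str) -> dict[str, int]:
    """Count whole-word/phrase lexicon matches per class; omit classes with no match."""
    lowered = text.lower()
    counts: dict[str, int] = {}
    for label, terms in SEMANTIC_LEXICON.items():
        n = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        if n:
            counts[label] = n
    return counts
```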
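Text classification pipeline
raw (unstructured) text → Text Preprocessing → corrected text → Information Extraction → structured text → Classification → label
Raw text: Im hearing a scary voice rn,idk if it’s in my head or in TV..craazy
Corrected text: I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy
Semantic classes: scary (fear expr.), in my head (possible hallucination), television (audio device), Crazy (stigmatising lang.)
POS tags: I/O am/V hearing/V a/D scary/A voice/N right/R now/R I/O don’t/V know/V if/P it’s/L in/P my/D head/N or/& in/P television/N Crazy/A
Label: related to hallucinatory experience
POS tagset from Gimpel et al. (2011): O - personal pronoun, V - verb, D - determiner, A - adjective, N - common noun, R - adverb, P - preposition or subordinating conjunction, L - nominal + verbal (e.g. it’s), & - coordinating conjunction, etc.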
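The slides do not say how the preprocessing step corrects text. As an assumption, the sketch below normalises the abbreviations from the example tweet with a lookup table; `NORMALISATION` is hypothetical and tiny, and a real system would also need spelling correction (for example, turning "craazy" into "crazy").

```python
import re

# Hypothetical lookup table built from the example tweet only.
NORMALISATION = {
    "im": "I am",
    "rn": "right now",
    "idk": "I don't know",
    "tv": "television",
}


def normalise(text: str) -> str:
    """Replace known abbreviations word by word, leaving all other text untouched."""
    def replace(match):
        word = match.group(0)
        return NORMALISATION.get(word.lower(), word)

    # A "word" here is a run of letters and apostrophes, so punctuation is preserved.
    return re.sub(r"[A-Za-z']+", replace, text)
```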
Information extraction
Example: My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
• Relative [1]: grandmom
• NE (misc) [1]*: Deliver Us From Evil
• NE (person) [1]*: Ralph Sarchie
• Sentiment: negative
• POS tags
• Key phrase extraction: hear this weird high-pitched voice (POS tags: V D A A N; negative sentiment; Weird / strange [1])
* Stanford NER using the 4-class model trained on the CoNLL 2003 data
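The extracted key phrase "hear this weird high-pitched voice" is tagged V D A A N. Assuming key phrases are selected by a POS pattern (the slides show the output, not the rule), a sketch that finds a verb, an optional determiner, any adjectives and then a noun looks like this. It relies on every tag in the Gimpel et al. tagset being a single character, so the tag sequence can be searched as a string. The function name is hypothetical, and the tagged input is written by hand; in practice it would come from a Twitter POS tagger.

```python
import re


def extract_key_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Return token spans whose tags match the assumed pattern V D? A* N."""
    tags = "".join(tag for _, tag in tagged)  # one character per token
    phrases = []
    for match in re.finditer(r"VD?A*N", tags):
        span = tagged[match.start():match.end()]
        phrases.append(" ".join(token for token, _ in span))
    return phrases
```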
Groups of features
Feature group: features
Mentions of semantic classes: mentions of each semantic class
Key phrases: sentiment polarity, semantic classes, POS tags
Part-of-speech tags: nouns, verbs, adjectives, etc.
Sentiment polarity: positive, negative or neutral
Popularity of the post: likes, retweets
Use of nonstandard language: spelling mistakes, abbreviations
Number of Twitter entities: URLs, #hashtags, @mentions
Named entities: persons, locations, organisations
Lexical distribution: sentences, words, characters
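Several of these groups are simple counts. The sketch below computes approximate "Number of Twitter entities" and "Lexical distribution" features with regular expressions. The patterns and the `count_features` name are assumptions chosen for illustration, since the slides do not give exact definitions; URLs are stripped before counting sentences so that dots inside links do not split them.

```python
import re

URL = re.compile(r"https?://\S+")


def count_features(text: str) -> dict[str, int]:
    """Rough Twitter-entity and lexical-distribution counts for one post."""
    without_urls = URL.sub(" ", text)
    sentences = [s for s in re.split(r"[.!?]+", without_urls) if s.strip()]
    return {
        "urls": len(URL.findall(text)),
        "hashtags": len(re.findall(r"#\w+", without_urls)),
        "mentions": len(re.findall(r"@\w+", without_urls)),
        "sentences": len(sentences),
        "words": len(text.split()),  # whitespace-separated tokens
        "characters": len(text),
    }
```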
Classification scenario
• 401 labelled examples: 94 related; 307 unrelated
• Three different types of classification methods:
• Naive Bayes (probabilistic model)
• Support Vector Machine (geometric model)
• AdaBoost (boosting of tree models)
• Performance compared with a simple baseline: tf-idf features
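The baseline represents each post by tf-idf weights. The slides do not specify the tokenisation or weighting variant, so the sketch below assumes whitespace tokenisation and the textbook weighting tf × log(N / df); library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2 normalisation by default. The resulting sparse vectors would then be passed to the NB, SVM and AdaBoost classifiers.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors
```

A term that occurs in every post gets weight 0, so very common words contribute nothing to the baseline.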
Evaluation
Results are based on ten runs of stratified 10-fold cross-validation. The baseline features outperform the proposed features only with SVM, and that difference is not statistically significant (p = 0.375).
Classification performance (F2-score) of the three classification methods on two different sets of features:
Proposed features: NB 0.772; SVM 0.743; AdaBoost 0.831 (best)
Baseline features: NB 0.711; SVM 0.751; AdaBoost 0.486
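The F2-score is the F-beta measure with β = 2, F_β = (1 + β²)·P·R / (β²·P + R), which weights recall more heavily than precision: a model that misses related posts is penalised more than one that raises false alarms. The sketch below computes it from confusion counts and, as an assumption about how the folds could be built, deals examples into stratified folds so that each of the 10 folds keeps roughly the 94:307 class ratio; repeating with ten different seeds would correspond to the ten runs. Function names are hypothetical.

```python
import random
from collections import defaultdict


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts; beta=2 gives the F2-score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def stratified_folds(labels: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split example indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: list[list[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)  # a different seed gives a different run
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds
```

Each fold then serves once as the test set while the other nine are used for training.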
Contribution of features
Features F2-score
Mentions of semantic classes *0.769▼
Key phrases *0.788▼
Part-of-speech tags 0.817▼
Sentiment polarity *0.818▼
Popularity of the post 0.828▼
Use of nonstandard language 0.831▬
Number of Twitter entities 0.832▲
Named entities 0.832▲
Lexical distribution 0.833▲
All features 0.831▲
* Statistically significant differences are marked with an asterisk
Error analysis (highlights)
"I do not hear voices, I am not paranoid": predicted ✅ Related; actual ❌ Unrelated
"I’m hallucinating I’m hearing hawks! Oh hang on, it is just the television": predicted ✅ Related; actual ❌ Unrelated
"The voices which I hear every night tell me to do it": predicted ❌ Unrelated; actual ✅ Related
Generating dataset for analysis
1. Take the best-performing classification model
2. Predict relatedness for the unlabelled examples
3. Combine the predictions with the 401 labelled (annotated) examples
RESULT: 4957 examples, 546 of which are potentially related to hallucinatory experiences *
* For comparison, the national survey by Wiles et al. (2006) identified only 62 cases
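The three steps amount to labelling the unlabelled pool with the best model while keeping the manual annotations where they exist. A minimal sketch, with hypothetical names and a toy keyword rule standing in for the trained classifier, is:

```python
from collections.abc import Callable


def build_analysis_set(
    labelled: dict[str, bool],
    unlabelled: dict[str, str],
    predict: Callable[[str], bool],
) -> dict[str, bool]:
    """Predict labels for unlabelled posts, then overlay the manual labels.

    labelled maps post ID -> gold label; unlabelled maps post ID -> text.
    Manual annotations take precedence if a post appears in both.
    """
    combined = {
        post_id: predict(text)
        for post_id, text in unlabelled.items()
        if post_id not in labelled
    }
    combined.update(labelled)
    return combined
```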
Preliminary data analysis
Chart (sentiment of related vs unrelated posts, %): Related 72 / 28; Unrelated 19 / 81
• Negative sentiment was significantly associated with posts that indicated the occurrence of auditory hallucinations
• Posts linked to auditory hallucinations were proportionately more frequent between 11pm and 5am
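Both findings compare how often a property (negative sentiment, or posting between 11pm and 5am) occurs in related versus unrelated posts. The slides do not name the significance test; the sketch below assumes a Pearson chi-square test on a 2×2 table, with the p-value for one degree of freedom obtained from the identity p = erfc(√(χ²/2)). It also treats "between 11pm and 5am" as the half-open window 23:00 to 04:59 and assumes timestamps are already in the poster's local time, which raw Twitter timestamps (UTC) are not. Function names and the counts in the usage are made up.

```python
import math
from datetime import datetime


def in_night_window(ts: datetime, start_hour: int = 23, end_hour: int = 5) -> bool:
    """True if ts falls in [start_hour:00, end_hour:00), wrapping past midnight."""
    return ts.hour >= start_hour or ts.hour < end_hour


def night_share(timestamps: list[datetime]) -> float:
    """Fraction of posts made inside the night window."""
    if not timestamps:
        return 0.0
    return sum(in_night_window(ts) for ts in timestamps) / len(timestamps)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows could be related/unrelated posts and columns night/other hours
    (or negative/other sentiment). Returns (statistic, p-value with 1 df).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    return statistic, math.erfc(math.sqrt(statistic / 2))
```

For example, a made-up table of 20 night and 5 daytime related posts against 5 night and 20 daytime unrelated posts gives χ² = 18.0 and p well below 0.001.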
Summary
• Experimental methodology to harvest and mine datasets from unsolicited Twitter posts to identify potential psychotic(-like) experiences
• Classification model that can predict the relatedness of posts to auditory hallucinations with relatively high accuracy
• Preliminary data analysis that identified interesting patterns in sentiment polarity and posting time
• Future research: investigate expressions of sleep in Twitter users who report a diagnosis of a psychosis-related disorder