Real-World Behavior Analysis through a Social Media Lens
DESCRIPTION
In this paper, using a large amount of data collected from Twitter, the blogosphere, social networks, and news sources, we perform preliminary research to investigate whether human behavior in the real world can be understood by analyzing social media data. The goal of this research is twofold: (1) determining the relative effectiveness of a social media lens in analyzing and predicting real-world collective behavior, and (2) exploring the domains and situations under which social media can be a predictor of real-world behavior. We develop a four-step model: community selection, data collection, online behavior analysis, and behavior prediction. The results of this study show that in most cases social media is a good tool for estimating attitudes, and that further research is needed for predicting social behavior.
TRANSCRIPT
Data Mining and Machine Learning Lab
Real-World Behavior Analysis through a Social Media Lens
Mohammad-Ali Abbasi, Huan Liu
Computer Science and Engineering, Arizona State University
Sun-Ki Chai, Kiran Sagoo
Department of Sociology, University of Hawai`i
Real world Events/Behavior
Any correlation between social media numbers and election results?
1,520,000
370,000
295,000
1,447,000
173,000
160,000
900,000
260,000
Ron Paul, Newt Gingrich, Mitt Romney, Rick Santorum
25,500,000
12,920,000
Barack Obama
Number of States carried?
http://en.wikipedia.org/wiki/Republican_Party_presidential_primaries,_2012
Do we observe the same difference in the votes?
Objectives of the research
• Studying the correlation between real-world collective
behavior and social media data
• Determining the relative effectiveness of a social media
lens in analyzing and predicting real-world collective
behavior
• Exploring the domains and situations under which
social media can be a predictor of real-world behavior
Data collection
Active methods
• Experiments
• Surveys
• Field Study
Passive methods
(By observing and analyzing)
• Behavior
• Belongings
• Documents, …
• Expensive
• Time consuming
• May be dangerous
Social Media
• People leave many clues about themselves
• Their interactions reveal much about people
• We can passively observe people’s activities
Snooping
Experimental psychology suggests that a person
may be understood by what happens around him
• Does what's on your desk reveal what's on
your mind?
• Do those pictures on your walls tell true tales
about your character?
Using online data for opinion polling
• "From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series"
• O'Connor et al. analyzed the sentiment polarity of tweets and found correlations as high as 80% with results from public opinion polls
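The idea of comparing tweet sentiment with poll results can be sketched roughly as follows. This is a minimal illustration, not the method of O'Connor et al.: it assumes daily counts of positive and negative tweets are already available, smooths their ratio with a trailing moving average, and computes a Pearson correlation against a poll series. All numbers and names (`pos`, `neg`, `polls`, `window`) are hypothetical.

```python
from math import sqrt

def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative tweet counts (assumes no zero negative counts)."""
    return [p / n for p, n in zip(pos_counts, neg_counts)]

def moving_average(series, window):
    """Trailing moving average; the first window - 1 days have no value and are dropped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily counts and poll values, for illustration only.
pos = [120, 130, 90, 160, 170, 150, 200]
neg = [100, 110, 105, 100, 95, 100, 90]
polls = [45.0, 46.0, 44.5, 47.0, 48.0, 47.5, 49.0]

window = 3
smoothed = moving_average(sentiment_ratio(pos, neg), window)
aligned_polls = polls[window - 1:]  # drop the days that have no smoothed value
r = pearson(smoothed, aligned_polls)
print(f"correlation between smoothed sentiment and polls: {r:.2f}")
```

The smoothing window is a free parameter here; a longer window reduces day-to-day noise but also shortens the series that can be compared with the polls.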
Some Existing Work
• Stock market prediction using data collected from Twitter
• Box-office revenue prediction for movies
• Analyzing the Arab Spring using social media
Most of the work in the field can be classified into two categories:
• Behavior Analysis and finding a correlation
• Behavior prediction
Our approach: A four-step model
Find equivalent groups in Real-World & Social Media
Collect Related Online Data from Social Media
Analyze Online Data (Behavior)
Analyze the Real-World Behavior & find correlation
Experimental settings
Find a group in the real world and social media
• Select based on more stable characteristics: race, religion, primary language, and country/region of origin
• Arab Spring movement
Collect related online data from social media
• Twitter: 35 million tweets related to the Arab Spring
• Blogs: more than 1 million blog posts
• Facebook: 135,000 popular pages, with data on posts, comments, and like behavior
• Real-world events: data collected from Reuters.com
Analyze online data (behavior)
• Information retrieval techniques
• Sentiment polarity analysis
• Statistical methods
Analyze the real-world behavior
• Correlational analysis
• Multivariate regression analysis
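The multivariate regression analysis listed above could look roughly like the sketch below. The slides do not specify the model, so this is only an assumed setup: an ordinary least-squares fit of a daily real-world event count on two online signals (tweet and blog post volumes), solved through the normal equations. The data and the names `tweets_k`, `blogs_k`, and `events` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes the matrix is non-singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(features, target):
    """Least-squares fit of target ~ b0 + b1*x1 + b2*x2 + ...
    Returns [b0, b1, b2, ...]. Assumes the predictors are not collinear."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, feature_row):
    """Fitted value for one observation."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], feature_row))

# Hypothetical daily data: tweet and blog volumes (thousands), reported events.
tweets_k = [10, 14, 9, 20, 25, 18]
blogs_k = [1.0, 1.5, 0.8, 2.5, 2.0, 1.9]
events = [2, 3, 2, 5, 5, 4]

coefs = fit_ols(list(zip(tweets_k, blogs_k)), events)
print("intercept and slopes:", [round(c, 3) for c in coefs])
```

A fit of this kind only describes association within the observed period; it says nothing by itself about whether the online signals precede the events.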
Correlation between online and real events
[Figure; annotation: time at which the real-world event happened]
Observations
[Figure; annotation: time at which the real-world event happened]
Observations
• There could be correlations between real-world events and online discussions. However,
– Correlation does not amount to prediction
– Poor results for small events
• Many real-world events are left uncovered
– Influence and cascade effects cause too much non-relevant discussion in social media
• What we have experimented with
– Finding influential people
– Analyzing mood over the network
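One way to illustrate why correlation does not amount to prediction is a lagged correlation: if online activity correlates best with events that happened earlier, the discussion is reacting to the events rather than anticipating them. The sketch below is an assumption-laden toy, not the analysis from the study; the series `events` and `online` are hypothetical and constructed so that online activity follows each event by one day.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(online, events, lag):
    """Correlate online[t] with events[t + lag].
    lag > 0: online activity precedes the events.
    lag < 0: online activity follows the events."""
    if lag >= 0:
        x, y = online[:len(online) - lag], events[lag:]
    else:
        x, y = online[-lag:], events[:len(events) + lag]
    return pearson(x, y)

# Hypothetical daily series: online activity spikes one day AFTER each event.
events = [0, 0, 5, 0, 0, 0, 4, 0, 0, 0]
online = [1, 1, 1, 11, 1, 1, 1, 9, 1, 1]

lags = range(-2, 3)
best_lag = max(lags, key=lambda k: lagged_correlation(online, events, k))
for k in lags:
    print(f"lag {k:+d}: r = {lagged_correlation(online, events, k):.2f}")
print("best lag:", best_lag)
```

In this toy data the strongest correlation sits at a negative lag, which is the pattern expected when social media echoes events. Even a strong correlation at a positive lag would only suggest, not establish, predictive value.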
What are people concerned about
Challenges
• Finding relevant communities
– An analysis of Arab Spring tweets shows that 75 percent of the 1 million clicks on Libya-related tweets and 89 percent of the 3 million clicks on Egypt-related tweets came from outside the Arab world [1]
– The fallacy of millions of followers
[1] http://www.stripes.com/blogs/stripes-central/stripes-central-1.8040/researchers-skeptical-dod-can-use-social-media-to-predict-future-conflict-1.15529
Challenges
• Data collection
– Sufficient coverage of the data
– Source of data is unknown
– Spam
– Paid social media content
• Online behavior analysis
– Unstructured, noisy text data
– Language ambiguity
Observations
• Real-world behavior prediction
– Stark difference between clicking online and taking real risks in the street
Conclusions
• Social media is helping us to understand real-world
events, but it is not the sole source
• More research and development is needed to make social
media a reliable source for behavior analysis
• Social event prediction using social media remains
an open problem. More interdisciplinary research
should be promoted.
Thanks!
Mohammad-Ali Abbasi
Acknowledgments: This work is, in part, sponsored by ONR and AFOSR grants.
We are grateful for the comments from anonymous reviewers and members of the DMML lab at ASU.