Post on 19-Jun-2015
Insights Into Socio Politics Using Data Analytics
A presentation by
About Politweet
• Researching the socio-economic and political interests of Malaysians
• Developing analytical tools for Twitter research
• Creating interactive, data-driven sites about socio-economic and political topics
#bdw2013 #bigdataMY
Today’s Talk
• Overview of our data pipeline
• Building timelines of historical events
• Measuring user opinion
• Measuring political partisanship
• Visualising voter migration
Technical Details
• Built on PostgreSQL, MySQL and PHP, running on Fedora Linux
• Events – 6.3 million tweets from 1.6 million users
• Politicians’ mentions – 5.5 million tweets from 385 thousand users
• Tweets related to American elections – 12 million tweets from 2 million users
BUILDING TIMELINES
Building Timelines
• Tweets as historical record
• Bersih2 rally for electoral reforms – July 9th 2011
– Goal: to reach Stadium Merdeka
– 85,372 tweets from 19,190 users
– 17,452 mentions of locations collected for investigative purposes
Methodology
1. Identify most re-tweeted tweet for each hour
2. Identify peak time periods for event
3. Identify peak time periods for locations
4. View tweeted images for each hour
5. Watch videos that are supported by tweet evidence
6. Combine all this information to establish a timeline, cross-reference by reading tweets in sequence to help separate rumour from fact
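Steps 1 and 2 of this methodology lend themselves to a short script. The following is a minimal Python sketch, not Politweet's actual pipeline (which the slides describe as PostgreSQL, MySQL and PHP). The record fields `hour`, `text` and `retweets`, the sample tweets and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical tweet records; the field names are assumptions, not Politweet's schema.
tweets = [
    {"hour": 8, "text": "Roadblocks reported on the way into the city", "retweets": 40},
    {"hour": 8, "text": "Heading to the rally", "retweets": 5},
    {"hour": 13, "text": "Tear gas near Central Market", "retweets": 120},
    {"hour": 13, "text": "LRT stations closed", "retweets": 75},
    {"hour": 13, "text": "Huge crowd at Puduraya", "retweets": 90},
    {"hour": 16, "text": "Crowd dispersing", "retweets": 30},
]


def most_retweeted_per_hour(records):
    """Step 1: map each hour to the record with the highest retweet count."""
    best = {}
    for rec in records:
        current = best.get(rec["hour"])
        if current is None or rec["retweets"] > current["retweets"]:
            best[rec["hour"]] = rec
    return best


def peak_hours(records, top_n=1):
    """Step 2: return the top_n hours ranked by number of tweets."""
    volume = Counter(rec["hour"] for rec in records)
    return [hour for hour, _ in volume.most_common(top_n)]


top = most_retweeted_per_hour(tweets)
peaks = peak_hours(tweets)
```

The same counting could be applied per location for step 3, provided each record also carried a location field. Steps 4 to 6 remain manual review.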
#bersih2 Twitter Activity
[Chart: tweets and users per hour on July 9; x-axis: Hour (1–24); y-axis: Tweets / Users (0–14,000)]
#bersih2 Area Activity
#bersih2 Timeline
• 8 AM – People making their way to the city; reports of roadblocks
• 9 AM – Arrests being made; police checking ICs (identity cards) at KTM and LRT stations
• 10 AM – More arrests at KL Sentral, Masjid Jamek and Sogo; large crowd reported at Masjid Negara; false report of tear gas being fired at KLCC
• 11 AM – 236 people arrested so far; police targeting people in Bersih T-shirts
• 12 PM – More arrests; crowds gathering or moving at the old railway station, Central Market and Petaling Street
• 1 PM – Tear gas being fired near Central Market; water cannon being used; massive crowd gathered at Jalan Sultan, Puduraya; LRT stations closed
• 2 PM – Police action continues. The crowd at Puduraya has broken up: one section proceeds to Tung Shin hospital while the remainder heads to Stadium Merdeka and KLCC.
• The earlier crowd that remained at Jalan Sultan and Jalan Petaling was spared similar police action.
• Bersih and Pakatan leaders were tear-gassed at KL Sentral following an attempt to break through the police blockade.
• 2:30 PM – Police action continues. Tear gas is fired into Tung Shin hospital grounds. Crowd at Stadium Merdeka remains calm.
• 3 PM – More arrests of crowd members at Tung Shin hospital. Crowd is scattered.
• 4 PM – Crowd begins to disperse in some areas. Large crowd reported at KLCC.
#bersih2 Area Activity (revisited)
Crowd Estimation
• Timeline establishes peak period
• Photos determine extents
• Google Maps used to measure area
• Crowd density estimated as average persons per sq. ft.
• Final estimate: 45–50 thousand people attended the rally
Crowd Estimation Sample: Puduraya
Area covered: 127,536 sq. ft.
Estimated crowd: 31,884 people
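The arithmetic behind this sample is covered area multiplied by average density. The slide does not state the density used; dividing its two figures gives 0.25 persons per sq. ft. (one person per 4 sq. ft.), which the Python sketch below treats as an inferred assumption. The function and constant names are illustrative only.

```python
def estimate_crowd(area_sq_ft, persons_per_sq_ft):
    """Estimate crowd size as covered area times average crowd density."""
    return round(area_sq_ft * persons_per_sq_ft)


# Area for the Puduraya sample, taken from the slide.
PUDURAYA_AREA_SQ_FT = 127_536

# Assumption: density inferred from the slide's figures (31,884 / 127,536),
# not stated by the presenters.
INFERRED_DENSITY = 0.25

puduraya_estimate = estimate_crowd(PUDURAYA_AREA_SQ_FT, INFERRED_DENSITY)
```

Summing such estimates over every area occupied during the peak period would give a rally-wide figure, assuming the areas do not overlap.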
Himpunan Kebangkitan Rakyat
• People’s Uprising Rally
• January 12th 2013
• Applied the same techniques to build a timeline
Crowd estimation
MEASURING USER OPINION
Measuring User Opinion
• Sentiment analysis on tweets
• Standard approaches
– Classify sentiment based on words or phrases
– Use Support Vector Machine (SVM) technique to build topic-specific classifiers
• Demonstration: Tweets on #MansuhPTPTN (Abolish PTPTN)
Word-based Classifier
[Slide image: sample #MansuhPTPTN tweets, each labelled positive, neutral or negative by the word-based classifier]
Identify keywords to determine sentiment
Result: 2 positive, 11 neutral, 1 negative
Word-based Classifier
[Slide image: the same sample tweets, relabelled after extending the word list]
Let's add ‘ditahan’ and ‘blacklist’ to the list of negative words
Result: 2 positive, 5 neutral, 6 negative
Word-based Classifier
• Word and phrase-based classifiers are good at measuring the ‘mood’ of a tweet
• They often produce a large percentage of neutral results
• Now we try Support Vector Machine (SVM)
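A word-based classifier of the kind shown on the previous slides can be sketched in a few lines of Python. Only ‘ditahan’ and ‘blacklist’ come from the slides. The other keywords, the sample tweets and the tie-breaking rule (equal counts give neutral) are assumptions for illustration, not the presenters' word list.

```python
import re

# 'ditahan' and 'blacklist' are taken from the slides; every other keyword
# here is a hypothetical example, not the presenters' actual list.
POSITIVE_WORDS = {"sokong", "setuju"}
NEGATIVE_WORDS = {"bantah", "ditahan", "blacklist"}


def classify(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Label a tweet by counting positive and negative keywords."""
    words = re.findall(r"\w+", text.lower())
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no keywords found, or a tie


# Hypothetical tweets for illustration.
label_support = classify("Saya sokong #MansuhPTPTN")
label_detained = classify("Pelajar ditahan polis")
label_no_keywords = classify("PTPTN statement arrived today")

# With a shorter negative list, the second tweet falls back to neutral,
# which mirrors the effect of adding words on the previous slide.
label_before_adding = classify("Pelajar ditahan polis", negative={"bantah"})
```

Any tweet without a listed keyword falls through to neutral, which is consistent with the large share of neutral results noted above.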
SVM Approach
[Slide image: the sample tweets, each labelled positive, neutral or negative by the SVM classifier]
Certain phrases are used by supporters of the proposal
Keywords influence results
Result: 4 positive, 9 neutral, 1 negative
SVM Approach
• SVM improves results but requires training sets of data
• Not practical for infrequent topics, such as the PTPTN issue
• For regular issues, constant training required to keep up to date
• Does not reliably tell us the final opinion of the user
Deducing Final Opinion
[Slide image: the sample tweets with their positive, neutral and negative labels]
If the last tweet was positive, does that imply positive opinion?
Our Methodology
1. Collect all tweets from users on a given topic for a fixed length of time
2. A human examines tweets in sequence, on a per-user basis
3. Based on the examination, determine the final opinion of the user
4. Common reasons for supporting or opposing an issue are noted
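The bookkeeping around this methodology can be sketched in Python. The judgement in steps 2 and 3 stays with a human researcher; the code only groups tweets per user in time order for review, then counts each user once. The field names, the sample records and the `final_opinions` mapping are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records collected for one topic over a fixed period (step 1).
tweets = [
    {"user": "user_a", "time": 2, "text": "second tweet from user_a"},
    {"user": "user_b", "time": 1, "text": "only tweet from user_b"},
    {"user": "user_a", "time": 1, "text": "first tweet from user_a"},
    {"user": "user_a", "time": 3, "text": "third tweet from user_a"},
]


def tweets_in_sequence(records):
    """Group tweet texts per user, ordered by time, for human review (step 2)."""
    by_user = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        by_user[rec["user"]].append(rec["text"])
    return dict(by_user)


def tally(opinions):
    """Count final opinions, one vote per user regardless of tweet volume."""
    return Counter(opinions.values())


review_queue = tweets_in_sequence(tweets)

# Assumption: filled in by a researcher after reading each user's tweets (step 3).
final_opinions = {"user_a": "support", "user_b": "oppose"}

summary = tally(final_opinions)
```

Because the tally is per user, a user who tweets many times carries the same weight as one who tweets once.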
Testing Our Method
[Slide image: one user's tweets in sequence, each labelled positive, neutral or negative]
Researcher determines this user supports the proposal to abolish PTPTN
The opposition to the methods of student activists is noted.
This user is not opposed to reducing the interest rate instead of abolishing PTPTN outright
Opinion-based Sentiment Analysis
• Pro
– More accurate measurement of sentiment than standard approaches
– Offers details on why users oppose or support an issue
– Not influenced by large volume of tweets
• Con
– Time-consuming to prepare
– Requires researchers familiar with the language and the issue
Geo-located Sentiment Analysis
• Same methodology, but only on geo-located tweets
• Results in sentiment based on location, and how many in the area tweeted about the topic
• Demonstration: Himpunan Kebangkitan Rakyat (People’s Uprising Rally) on January 12th
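A minimal Python sketch of the per-location aggregation described here, assuming each user already has a researcher-assigned final opinion and an area derived from their geo-located tweets. The records, field names and function names are invented for illustration; the slides do not show how the results were computed.

```python
from collections import Counter, defaultdict

# Hypothetical per-user records; opinions are assigned by a human researcher.
user_records = [
    {"user": "user_a", "area": "Kuala Lumpur", "opinion": "support"},
    {"user": "user_b", "area": "Kuala Lumpur", "opinion": "oppose"},
    {"user": "user_c", "area": "Kuala Lumpur", "opinion": "support"},
    {"user": "user_d", "area": "Penang", "opinion": "oppose"},
]


def opinion_by_area(records):
    """Count final opinions within each area."""
    result = defaultdict(Counter)
    for rec in records:
        result[rec["area"]][rec["opinion"]] += 1
    return {area: dict(counts) for area, counts in result.items()}


def users_by_area(records):
    """Count how many users in each area tweeted about the topic."""
    return dict(Counter(rec["area"] for rec in records))


sentiment = opinion_by_area(user_records)
participation = users_by_area(user_records)
```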
Plans for the Future
• Build a Malay-language SVM to determine sentiment on tweets
• Use sampling to estimate the opinion of the Twitter user population
POLITICAL PARTISANSHIP
Measuring Political Partisanship
• Who we follow
• Who we mention
Who we follow
Who we mention
Voter migration
Contact details
• Facebook: Fb.com/politweet
• Twitter: @politweetorg
• Email: admin@politweet.org