Privacy and Security in Online Social Media
Workshop on Data Analytics & Its Security Issues
Jaypee Institute of Information Technology, Sector 62
Dec 4, 2015
Ponnurangam Kumaraguru (“PK”)
Associate Professor, /ponnurangam.kumaraguru, @ponguru
Who am I?
• Associate Professor, IIIT-Delhi
• Ph.D. from School of Computer Science, Carnegie Mellon University (CMU)
• Research interests: privacy, e-crime, online social media, and usable security
• Founding Head, CERC@IIITD, cerc.iiitd.ac.in
• Co-ordinate and manage Precog, precog.iiitd.edu.in
• ACM India Eminent Speaker
Training Data
• 500 Tweets per event
• Used CrowdFlower
Event                                   Tweets       Users
Boston Marathon Blasts (2013)           7,888,374    3,677,531
Typhoon Haiyan / Yolanda (2013)         671,918      368,269
Cyclone Phailin (2013)                  76,136       34,776
Washington Navy Yard shootings (2013)   484,609      257,682
Polar vortex cold wave (2014)           143,959      116,141
Oklahoma Tornadoes (2013)               809,154      542,049
Total                                   10,074,150   4,996,448
Credibility Modeling
Feature sets and features (45):
• Tweet meta-data: Number of seconds since the tweet; Source of tweet (mobile / web / etc.); Tweet contains geo-coordinates
• Tweet content (simple): Number of characters; Number of words; Number of URLs; Number of hashtags; Number of unique characters; Presence of stock symbol; Presence of happy smiley; Presence of sad smiley; Tweet contains `via'; Presence of colon symbol
• Tweet content (linguistic): Presence of swear words; Presence of negative emotion words; Presence of positive emotion words; Presence of pronouns; Mention of self words in tweet (I; my; mine)
• Tweet author: Number of followers; number of friends; time since the user has been on Twitter; etc.
• Tweet network: Number of retweets; Number of mentions; Tweet is a reply; Tweet is a retweet
• Tweet links: WOT score for the URL; Ratio of likes / dislikes for a YouTube video
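The simple tweet-content features listed above can be computed directly from the tweet text. The following is a minimal sketch, not the implementation used in the study: the function name `simple_content_features`, the smiley lists, and the regular expressions (including treating a stock symbol as `$` followed by one to five letters) are all assumptions made for illustration.

```python
import re

# Illustrative patterns; the exact definitions used in the study are not given in the slides.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
STOCK_RE = re.compile(r"\$[A-Za-z]{1,5}\b")   # assumed "cashtag" format, e.g. $AAPL
VIA_RE = re.compile(r"\bvia\b", re.IGNORECASE)
HAPPY = (":)", ":-)", ":D")                    # assumed smiley lists
SAD = (":(", ":-(")


def simple_content_features(text: str) -> dict:
    """Return the ten 'Tweet content (simple)' features for one tweet."""
    return {
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_urls": len(URL_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_unique_chars": len(set(text)),
        "has_stock_symbol": bool(STOCK_RE.search(text)),
        "has_happy_smiley": any(s in text for s in HAPPY),
        "has_sad_smiley": any(s in text for s in SAD),
        "has_via": bool(VIA_RE.search(text)),
        "has_colon": ":" in text,
    }


features = simple_content_features("Stay safe #Boston :) via @user http://t.co/abc")
```

The resulting dictionary can be turned into one row of a feature matrix; the remaining feature sets (author, network, links) need API metadata rather than the text alone.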
How many of you have posted mobile numbers on Online Social Networks?
How many of you have seen mobile numbers being posted on Online Social Networks?
Data statistics
• Twitter: 12th October 2012 to 20th October 2013
• Facebook: 16th November 2012 to 20th April 2013
                 Category +91         Category 0           Category void        Total
Numbers          Twitter   Facebook   Twitter   Facebook   Twitter   Facebook   Twitter   Facebook
Mobile Numbers   885       2,191      14,909    8,873      25,566    25,294     41,360    36,358
User profiles    1,074     2,663      17,913    9,028      31,149    25,406     49,817    36,588
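The three categories in the table can be assigned from the leading characters of a posted number. This is a minimal sketch under a stated assumption: the slides do not define "Category void", so it is taken here to mean a number posted with neither a `+91` nor a `0` prefix. The function name `number_category` is hypothetical.

```python
import re
from collections import Counter


def number_category(raw: str) -> str:
    """Classify a posted mobile number by its prefix: '+91', '0', or 'void'."""
    # Drop spaces, hyphens, and parentheses that users commonly insert.
    cleaned = re.sub(r"[\s\-()]", "", raw)
    if cleaned.startswith("+91"):
        return "+91"
    if cleaned.startswith("0"):
        return "0"
    # Assumption: 'void' covers numbers written without any prefix.
    return "void"


# Placeholder numbers, not real data.
samples = ["+91 99000 00000", "099000 00000", "9900000000"]
counts = Counter(number_category(s) for s in samples)
```

Tallying with `Counter` per network would give category counts in the shape of the table above.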
Data Extraction
• Data was collected from various open government data sources using PHP scripts and stored in MySQL databases.
[Diagram: OPEN GOVT. WEBSITES]
Query labels: Alphabets a-z for name, across 70 constituencies; Name and DOB from DL; Random 5 seeds, ‘Incremental attack’
Record counts: PAN [53,419]; DRIVING LICENCE [2,24,982]; VOTER [81,95,053]
Data Extraction
• Public data from various online social networking sites was collected using public API calls.
• OAuth tokens were used for authentication and authorization.
[Diagram: UNIQUE NAME, API CALLS]
Record counts: GOOGLEPLUS [28,900]; LINKEDIN [1,86,798]; FOURSQUARE [29,393]; TWITTER [15,57,715]; FACEBOOK [33,77,102]
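A public API call authorized with an OAuth token, as described in the bullets above, can be sketched as follows. This assumes an OAuth 2.0 style bearer token sent in the `Authorization` header; the slides do not say which OAuth version each network required, and APIs that need signed OAuth 1.0a requests are not covered. The endpoint `https://api.example.com/v1/search`, the parameter `q`, and the helper `build_api_request` are hypothetical placeholders, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(base_url: str, params: dict, token: str) -> Request:
    """Build a GET request carrying an OAuth bearer token."""
    url = f"{base_url}?{urlencode(params)}"
    # The token authenticates the caller and scopes what the API will return.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint and placeholder token, for illustration only.
req = build_api_request(
    "https://api.example.com/v1/search",
    {"q": "name"},
    "PLACEHOLDER_TOKEN",
)
```

Passing `req` to `urllib.request.urlopen` would perform the call; real collection code would also need to respect each network's rate limits and terms of use.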
Risk of Collation
Details            User 1                  User 2
Mobile Number      +9199xxxx2708           +9198xxxx5485
Full Name          x Gambhir               xxxxxx Jeswani
Age                23                      53
Gender             Male                    Male
Father’s Name      xx Gambhir              x x Jeswani
Address            ***, xxxx Bagh, Delhi   ***, Mig Flats, *-block, xxxxx Vihar Phase-I
ID                 Voter ID: NLNxxx5696    Driving License: DL/04/xxx/222668
Shared by Owner?   No                      Yes
8 Delhi Users Identified Uniquely
OCEAN: Open Government Data Repository
Takeaways
• Online social media is a different beast in terms of privacy, identity, and credibility; research and technologies should be developed to address it
• Much interesting research, engineering, and innovation is waiting to be done in India