Credibility, Identity Resolution, Privacy, and Policing on Online Social Media

IIT Guwahati, Sept 26, 2016
Ponnurangam Kumaraguru (“PK”), Associate Professor, ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru

Upload: precog

Post on 18-Feb-2017



Who am I?

• Associate Professor, IIIT-Delhi
• Ph.D. from School of Computer Science, Carnegie Mellon University (CMU)
• Research interests: Social Computing, Computational Social Science, Complex Networks pertaining to Human Behavior, specifically in the context of Security & Privacy
• Coordinate and manage Precog, precog.iiitd.edu.in
• ACM Distinguished Speaker

https://www.youtube.com/channel/UCHWDvGDh4QjWbV79bM2neSg


What we dabble with!

Non-trustworthy content: FAKE, RUMORS

Methodology


Training Data

• 500 Tweets per event
• Used CrowdFlower

Event                                   Tweets        Users
Boston Marathon Blasts (2013)           7,888,374     3,677,531
Typhoon Haiyan / Yolanda (2013)         671,918       368,269
Cyclone Phailin (2013)                  76,136        34,776
Washington Navy Yard shootings (2013)   484,609       257,682
Polar vortex cold wave (2014)           143,959       116,141
Oklahoma Tornadoes (2013)               809,154       542,049
Total                                   10,074,150    4,996,448

Credibility Modeling

Feature set: Features (45)

Tweet meta-data: Number of seconds since the tweet; Source of tweet (mobile / web / etc.); Tweet contains geo-coordinates

Tweet content (simple): Number of characters; Number of words; Number of URLs; Number of hashtags; Number of unique characters; Presence of stock symbol; Presence of happy smiley; Presence of sad smiley; Tweet contains 'via'; Presence of colon symbol

Tweet content (linguistic): Presence of swear words; Presence of negative emotion words; Presence of positive emotion words; Presence of pronouns; Mention of self words in tweet (I; my; mine)

Tweet author: Number of followers; Number of friends; Time since the user has been on Twitter; etc.

Tweet network: Number of retweets; Number of mentions; Tweet is a reply; Tweet is a retweet

Tweet links: WOT score for the URL; Ratio of likes / dislikes for a YouTube video
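As an illustration of the "Tweet content (simple)" row, the sketch below computes those ten features from raw tweet text. It is a rough sketch under stated assumptions, not the extraction code behind the talk: the function name, the regular expressions, and the smiley lists are all hypothetical choices made here.

```python
import re

# All patterns below are illustrative assumptions, not the talk's definitions.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
STOCK_RE = re.compile(r"\$[A-Za-z]{1,5}\b")   # e.g. $AAPL
VIA_RE = re.compile(r"\bvia\b", re.IGNORECASE)
HAPPY_SMILEYS = (":)", ":-)", ":D")
SAD_SMILEYS = (":(", ":-(")


def simple_content_features(text):
    """Return the 'Tweet content (simple)' features for one tweet text."""
    return {
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_urls": len(URL_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_unique_chars": len(set(text)),
        "has_stock_symbol": bool(STOCK_RE.search(text)),
        "has_happy_smiley": any(s in text for s in HAPPY_SMILEYS),
        "has_sad_smiley": any(s in text for s in SAD_SMILEYS),
        "has_via": bool(VIA_RE.search(text)),
        # Note: the colon inside "http://" also counts here.
        "has_colon": ":" in text,
    }


if __name__ == "__main__":
    sample = "Stay safe #Boston :) via @user http://t.co/abc $AAPL"
    for name, value in simple_content_features(sample).items():
        print(name, value)
```

The resulting dictionary could be turned into one row of a feature matrix, with the remaining feature sets (meta-data, linguistic, author, network, links) added as further columns.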

Implementation

Feedback by Users

Harvard (1839) – Harvard – Harvard – Harvard – MIT – Northwestern – UIUC – WUSL – CMU (2009) – IIITD (2015)

http://twitdigest.iiitd.edu.in/TweetCred/

De-duplicating audience

Social audience = 437,632 + 153,000 + 805,097 or less??
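The "or less??" can be made concrete with a small sketch: if the same person follows an account on several networks, adding the per-network counts overcounts, and the true audience is the size of the set union. The sketch assumes each follower has already been resolved to a common person ID, which is exactly the identity resolution problem; the function names and the toy IDs are hypothetical.

```python
def naive_audience(*platform_audiences):
    """Sum of per-platform follower counts (counts shared people repeatedly)."""
    return sum(len(audience) for audience in platform_audiences)


def deduplicated_audience(*platform_audiences):
    """Number of distinct people across all platforms (size of the set union)."""
    return len(set().union(*platform_audiences))


if __name__ == "__main__":
    # Toy data: person IDs assumed to be already resolved across networks.
    network_a = {"p1", "p2", "p3"}
    network_b = {"p2", "p4"}
    network_c = {"p1", "p4", "p5"}
    print("naive sum:", naive_audience(network_a, network_b, network_c))
    print("de-duplicated:", deduplicated_audience(network_a, network_b, network_c))
```

For the figures on the slide, the naive sum 437,632 + 153,000 + 805,097 = 1,395,729 is only an upper bound; the de-duplicated audience can be no larger than that and no smaller than the largest single count, 805,097.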


Challenges

Heterogeneous OSNs: Professional, Opinion, Dating, Personal

Degree of Details: quality and descriptive personal and professional information vs. little personal information, descriptive opinions

Attribute Evolution (over time): registration with same information on both OSNs {paridhij, New Delhi}; information evolved on one but not on the other {jainpari, Bangalore}

Generic Identity Resolution

IDENTITY SEARCH: Extract available & discriminative features -> Candidate Identities

IDENTITY LINKING: Pairwise Comparisons
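The pairwise comparison step might look roughly like the sketch below, which scores each candidate identity against a known profile by averaging per-field string similarity. This is an assumption-laden illustration: the talk does not say which similarity measure is used, so Python's `difflib.SequenceMatcher` is swapped in here, and the function names, field names, and candidate records are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two attribute strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pairwise_scores(known, candidates, fields=("username", "location")):
    """Score every candidate against the known profile, best match first."""
    scored = []
    for candidate in candidates:
        sims = [field_similarity(known[f], candidate[f]) for f in fields]
        scored.append((candidate["id"], sum(sims) / len(sims)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    known = {"username": "paridhij", "location": "New Delhi"}
    candidates = [
        {"id": "A", "username": "jainpari", "location": "Bangalore"},
        {"id": "B", "username": "paridhij", "location": "New Delhi"},
    ]
    for candidate_id, score in pairwise_scores(known, candidates):
        print(candidate_id, round(score, 2))
```

Candidate A mirrors the attribute evolution challenge: a username and location that changed on one network lower a purely syntactic score even when both accounts belong to the same person.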

Heuristic Identity Search

cerc.iiitd.ac.in

Search: Profile, Content, Self-mention, Network

Linking: Syntactic and Image

Decision: self-identified / returned by more than one search method? (Yes / No)

Candidate Identities

Features: name, location, username, mobile no., post, friends, followers

Paridhi Jain, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. @I seek ‘fb.me’: Identifying Users across Multiple Online Social Networks. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13 Companion. ACM, New York, NY, USA, 1259-1268. DOI=http://dx.doi.org/10.1145/2487788.2488160 [Honorable Mention Award]
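The decision box on this slide ("self-identified / returned by more than one search method") can be sketched as a simple rule over the outputs of the individual search methods. This is a minimal sketch, not the paper's implementation: the function name, the dictionary layout, and the account IDs are hypothetical, and what happens to flagged versus unflagged candidates afterwards is left out.

```python
from collections import defaultdict


def flag_candidates(search_results, self_identified=frozenset()):
    """Map each candidate account to True if it is self-identified or was
    returned by more than one search method, else False.

    search_results: dict of search method name -> set of candidate account IDs.
    self_identified: accounts the user has explicitly pointed to (assumed input).
    """
    methods_per_candidate = defaultdict(set)
    for method, candidates in search_results.items():
        for candidate in candidates:
            methods_per_candidate[candidate].add(method)

    all_candidates = set(methods_per_candidate) | set(self_identified)
    return {
        candidate: candidate in self_identified
        or len(methods_per_candidate[candidate]) > 1
        for candidate in all_candidates
    }


if __name__ == "__main__":
    results = {
        "profile": {"u1", "u2"},
        "content": {"u2", "u3"},
        "network": {"u4"},
    }
    flags = flag_candidates(results, self_identified={"u4"})
    for candidate in sorted(flags):
        print(candidate, flags[candidate])
```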

Harvard (1839) – Harvard – Harvard – Harvard – MIT – Northwestern – UIUC – WUSL – CMU (2009) – IIITD (2016)

How many of you have posted mobile numbers on Online Social Networks?

How many of you have seen mobile numbers being posted on Online Social Networks?

Sample  posts


Data statistics

• Twitter: 12th October 2012 – 20th October 2013
• Facebook: 16th November 2012 – 20th April 2013

Numbers           Category +91          Category 0            Category void         Total
                  Twitter   Facebook    Twitter   Facebook    Twitter   Facebook    Twitter   Facebook
Mobile numbers    885       2,191       14,909    8,873       25,566    25,294      41,360    36,358
User profiles     1,074     2,663       17,913    9,028       31,149    25,406      49,817    36,588
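One plausible reading of the three categories is the prefix with which a number appears in a post: "+91", a leading "0", or no prefix ("void"). The sketch below extracts and categorizes numbers under that reading. It is an illustrative assumption, not the collection pipeline used for the table: the regular expression, the 10-digit pattern starting with 6 to 9, and the function name are hypothetical, and the numbers in the example are made up.

```python
import re

# Assumed format: optional "+91" (optionally followed by a space or hyphen)
# or a single leading "0", then a 10-digit number starting with 6-9.
MOBILE_RE = re.compile(r"(?<!\d)(\+91[\s-]?|0)?([6-9]\d{9})(?!\d)")


def categorize_numbers(text):
    """Return (number, category) pairs found in text, in order of appearance."""
    found = []
    for match in MOBILE_RE.finditer(text):
        prefix = match.group(1) or ""
        if prefix.startswith("+91"):
            category = "+91"
        elif prefix == "0":
            category = "0"
        else:
            category = "void"
        found.append((match.group(2), category))
    return found


if __name__ == "__main__":
    post = "Call +91 9876543210 or 09812345678 or 9123456789"
    for number, category in categorize_numbers(post):
        print(number, category)
```

A pattern this simple misses numbers written with internal spaces or with "91" and no plus sign, so counts produced this way would be a lower bound on what is actually posted.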


SocialCaller App

https://play.google.com/store/apps/details?id=com.ayush.socialcaller&hl=en

http://precog.iiitd.edu.in/research/ocean/

Takeaways

• Online Social Media is a different beast in terms of privacy, identity, and credibility; research and technologies should be developed for it
• Many interesting research, engineering, and innovation problems are waiting to be tackled
• Interested in hosting students: B.Tech., M.Tech., Ph.D.

https://www.facebook.com/PreCog.IIITD/