Alethiometer: a framework for assessing trustworthiness and content validity in social media

Eva Jaho, Efstratios Tzoannos, Aris Papadopoulos, Nikos Sarris

MOTIVATION AND CHALLENGE

5 Vs of Big Data: Volume, Velocity, Variety, Veracity, Value

3 Cs of Veracity: Contributor, Content, Context

ALETHIOMETER FRAMEWORK

Contributor, Content, Context

C1 CONTRIBUTOR

What can we find out about the source of information?

Contributor modalities

• Reputation
- Analyse comments over time to discover sentiments and opinions towards a source.
- Measured by the number of upvotes or likes.

• History
- Information about activity on different social media platforms, combined with validity data.
- Measured by the update frequency of valid posts.

• Popularity
- Information about how the source's activity is followed (readings, recommendations).
- Measured by the number of friends/followers and the number of responses.

• Influence
- Information about activities triggered by this source (re-posts, discussions or comments).
- Measured by the number of retweets/shares or the Klout influence score.

• Presence
- Information about the type of source (individual, organisation, officially verified account, fake identity, etc.) and its presence on multiple social media platforms.
- Measured by the number of accounts on different social media platforms.

C2 CONTENT

Does the posted content look reliable?

Content modalities

• Reputation of linked web content
- Measured in terms of domain reputation, page rank (Google PageRank or Alexa Rank), or properties of the contributors to the content.

• Provenance
- Finding the original occurrence of the content and its whole path across sources, places and time, and measuring the reputation of these sources.

• Popularity
- Information about how many people are following this content.
- Measured by the number of followers and the number of responses.

• Influence
- Analyse whether this content is triggering discussions or other actions in the social sphere.
- Measured by the number of retweets/shares.

• Originality
- Check whether the content or parts thereof have been used in the past (e.g., reused text or images that have appeared in the past).

• Authenticity
- Check whether the content has been changed with respect to its original state (e.g., changed text or attached multimedia content).

• Objectivity and Diversity
- Measured by the variation of opinions found for people, content, or general entities.
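The slides do not say how the Originality check is carried out. As a minimal sketch, assuming reuse is detected by comparing sets of word n-grams ("shingles") with Jaccard similarity, the idea could look as follows. The function names `shingles`, `jaccard` and `reuse_score` are hypothetical, and the shingle size of 3 is an arbitrary choice.

```python
def shingles(text, size=3):
    """Return the set of word n-grams (as tuples) found in text."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_score(new_text, earlier_text, size=3):
    """Score in [0, 1]; higher values suggest more text shared with an earlier post."""
    return jaccard(shingles(new_text, size), shingles(earlier_text, size))
```

A score near 1 would suggest that the new post largely repeats earlier text, while a score near 0 would suggest little overlap. Reused images would need a different technique, which this sketch does not cover.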

C3 CONTEXT

Does the 'what', 'when' and 'where' stick together?

Context modalities

• Cross-checking
- Measured by the number of different reports or mentions about the same thing coming from independent sources.

• Coherence
- Measurement of text coherence (e.g., Coh-Metrix) and coherence between the content and tags, attached web-links, or attached multimedia.

• Proximity
- Measurement of coherence between reference location/time and publication location/time.
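The Proximity modality compares where and when an event is reported to have happened with where and when the post was published. The slides do not give a formula, so the following is only a sketch under stated assumptions: locations are (latitude, longitude) pairs in degrees, spatial proximity is the great-circle (haversine) distance, and temporal proximity is the absolute time gap in hours. The names `haversine_km` and `proximity` are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def proximity(ref_place, ref_time, pub_place, pub_time):
    """Return (distance in km, time gap in hours) between reference and publication."""
    distance_km = haversine_km(*ref_place, *pub_place)
    gap_hours = abs((pub_time - ref_time).total_seconds()) / 3600
    return distance_km, gap_hours


# Hypothetical usage: same place, published three hours after the referenced time.
print(proximity((0.0, 0.0), datetime(2013, 7, 1, 12), (0.0, 0.0), datetime(2013, 7, 1, 15)))
```

Small distances and short time gaps would then count in favour of the post, and the two raw values could be rated on the same discrete scale as the other parameters.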

How to combine all these parameters?

Contributor, Content, Context

Approach for rating of modality parameters

• Rate each parameter on a 5-point discrete scale, from 0 to 4
- [0, a0) → 0, [a0, a1) → 1, [a1, a2) → 2, [a2, a3) → 3, [a3, ∞) → 4.
- a0: 20th percentile, a1: 40th percentile, a2: 60th percentile, a3: 80th percentile (the thresholds are set so that the ratings follow a uniform distribution).

• Weight the ratings of the parameters, either uniformly or based on their significance, to derive a total score
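The rating scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the thresholds a0 to a3 are nearest-rank empirical percentiles of an observed sample, and that the total score is a weighted average of the ratings. The function names and the example weights are hypothetical.

```python
from bisect import bisect_right


def percentile_thresholds(sample, percentiles=(20, 40, 60, 80)):
    """Return thresholds [a0, a1, a2, a3] as nearest-rank percentiles of sample."""
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(p * n / 100), in integer arithmetic.
    return [ordered[(p * n + 99) // 100 - 1] for p in percentiles]


def rate(value, thresholds):
    """Map a raw value to a rating in 0..4: [0, a0) -> 0, ..., [a3, inf) -> 4."""
    return bisect_right(thresholds, value)


def total_score(ratings, weights=None):
    """Weighted average of ratings; uniform weights when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    return sum(weights[name] * ratings[name] for name in ratings) / total_weight


# Hypothetical usage with made-up follower counts 1..100.
thresholds = percentile_thresholds(range(1, 101))        # [20, 40, 60, 80]
ratings = {"followers": rate(85, thresholds), "tweets": 2}
print(total_score(ratings))                               # uniform weights: 3.0
print(total_score(ratings, {"followers": 3, "tweets": 1}))  # 3.5
```

Because the thresholds sit at the 20th, 40th, 60th and 80th percentiles, roughly one fifth of the sample falls into each rating, which is what makes the ratings approximately uniform.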

Are all these parameters necessary?

Preliminary statistical results

Parameters studied

• Number of followers

• Number of tweets

• User account age

Sample: ~10 M tweets, 5 K users

Collection period: July-September 2013

Empirical distributions

• Heavy-tailed distributions

• Multimodal heavy-tailed distributions with three different peaks (6.7 months, 23.3 months, 4.4 years)
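The slides do not describe how the empirical distributions were computed. One common way to inspect a heavy tail is the empirical complementary cumulative distribution function (CCDF), P(X ≥ x), usually plotted on log-log axes. The sketch below assumes that approach; the function name `empirical_ccdf` is hypothetical.

```python
from collections import Counter


def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the distinct values in the sample."""
    values = list(values)
    n = len(values)
    counts = Counter(values)
    remaining = n  # number of observations that are >= the current x
    ccdf = []
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf


# Hypothetical follower counts: most accounts are small, one is very large.
print(empirical_ccdf([1, 1, 2, 10]))  # [(1, 1.0), (2, 0.5), (10, 0.25)]
```

For a heavy-tailed sample the CCDF decays slowly, so a few very large values keep a non-negligible probability mass far out in the tail.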

Correlation coefficients

• Friends - followers: 0.1222
• Friends - tweets: 0.08
• Followers - tweets: 0.0197

Conclusion:
- All parameters are relatively independent of one another.
- They need to be studied independently.
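The slides do not state which correlation coefficient was used. Assuming the Pearson (linear) coefficient, a pairwise value like the ones above can be computed as in this sketch. The function name `pearson` and the sample numbers are hypothetical and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-user counts, for illustration only.
followers = [10, 250, 40, 3000, 75]
tweets = [500, 20, 1200, 300, 90]
print(pearson(followers, tweets))
```

A coefficient close to 0 indicates a weak linear relationship. With heavy-tailed data a few extreme accounts can dominate the Pearson value, so a rank-based coefficient may be worth computing alongside it.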

Summary and future work

• Summary
- Defined Alethiometer: a framework taking into account all aspects: Contributor, Content and Context
- Showed an approach for combining the ratings of all parameters
- Attested the relative independence of parameters and the need to consider a variety of measures (also previously emphasized in the literature)

• Future work
- Investigate statistical properties of other modalities
- Extract the significance of modalities
- Study correlation between content, contributor and context modalities

Find us at http://ilab.atc.gr, follow us @iLabATC

Thank you

Questions & Answers