University of California Santa Barbara
Department of Computer Science
Master Degree in Computer Science
Master Project Report
An Analysis of Credibility in Microblogs
Author
Byungkyu Kang
Perm. 4985065
Committee in charge
Tobias Höllerer, Chair
Matthew Turk, Xifeng Yan and John O’Donovan
June 7th 2012
Abstract
User-generated microblog content is being produced at an ever-increasing rate, making
it difficult to distinguish credible and newsworthy information from a
mass of “background noise”. In contrast to the information flow paradigm of traditional
media, which has a few-to-many relation between information producers and consumers,
microblogs support a many-to-many flow paradigm, which raises the challenge of as-
sessing the credibility of information providers based on a relatively small window of
information.
To address this challenge, we present an assessment of information flow, credibility
and provenance in Twitter, through a set of two different experiments. We first provide
credibility assessment models (social, content and hybrid models) derived from sampled
Twitter datasets and evaluate them in terms of predictive accuracy on a set of crowd-
sourced credibility ratings on crawled tweets. Next, we present an in-depth analysis on
the utility and distribution of predictive features across a diverse set of Twitter contexts.
Results of the first experiment show that our model based on social features (e.g.,
the ratio of friends to followers) outperforms the content-based and hybrid models, with
88% accuracy in predicting tweets assessed as credible, using a J48 decision tree algorithm.
Our second experiment reveals higher feature usage in microblogs commenting on unrest
or emergency situations, and differing feature distributions across various other topics.
Table of Contents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Downside of User-generated Content . . . . . . . . . . . . . . . . . . . . 1
1.3 Project Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Organization of This Report . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Related Work 5
2.1 Credibility of Information Sources . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Credibility and Trust on the Web . . . . . . . . . . . . . . . . . . 6
2.1.2 Information Credibility on Twitter . . . . . . . . . . . . . . . . . 7
3 Twitter 11
3.1 Twitter Jargon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.1 Common Twitter Terms . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Follower and Followee . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Usage of Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4 Noise Data in Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4 Experiment I: Credibility 17
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Credibility Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2.1 Definition of Credibility . . . . . . . . . . . . . . . . . . . . . . . 18
4.2.2 Modeling Credibility . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.1 Data Mining and Preliminary Analysis . . . . . . . . . . . . . . . 23
4.4 User Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.5.1 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.5.2 Predicting Credibility . . . . . . . . . . . . . . . . . . . . . . . . 31
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5 Experiment II: Feature Analytics 37
5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2 Data Gathering and Processing . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.1 Online Evaluation of Credibility . . . . . . . . . . . . . . . . . . 39
5.2.2 Retweet Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2.3 Dyadic Pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Feature Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4 Feature Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4.1 Credibility Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4.2 Analysis across Multiple Corpora . . . . . . . . . . . . . . . . . . 45
5.4.3 Feature Distribution in Retweet Chains . . . . . . . . . . . . . . 48
5.4.4 Feature Distribution in Dyadic Pairs . . . . . . . . . . . . . . . . 49
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Conclusion 51
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Bibliography iii
List of Figures
3.1 Distribution of content in Twitter (excerpt from [1] and mashable.com
infographic, 2010) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Depiction of conceptual graphs of following action and information flow
in Twitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.1 Illustration of the scope of crawled data for each of 7 current topics. . . 23
4.2 Word cloud showing origin of tweets in the Libya data set. . . . . . . . . 25
4.3 Word cloud showing distribution of popular terms in the Libya data set.
(This visualization is generated using Wordle, http://www.wordle.net/) 26
4.4 Three graphs showing (a) gender distribution, (b) familiarity with mi-
croblogging services and (c) age distribution of the participants of the
online user survey in this experiment. (from left to right, respectively) . . 27
4.5 Plot showing perceived credibility in each of four Twitter contexts (Neg-
ative, Positive, True, Null). . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.6 Friend to Follower patterns across four of our topic-specific data sets. All
other sets exhibited a similar distribution. . . . . . . . . . . . . . . . . . 30
4.7 Histogram of average credibility rating from the online survey across num-
ber of followers for the tweeters (binned). . . . . . . . . . . . . . . . . . 31
4.8 Plot showing a sample of feature distributions for the content-based model
on 3000 labeled tweets from the Libya dataset. . . . . . . . . . . . . . . . 32
4.9 Comparisons of each feature used in computing the social credibility
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.10 Comparisons of link and no-link contexts across four different groups in
the online user survey. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.11 Comparison of predictive accuracy over the manually collected credibility
ratings for each model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.12 Statistical results from a J48 tree learner for the best performing credi-
bility prediction strategy (Feature Hybrid) . . . . . . . . . . . . . . . . 35
5.1 Data crawling procedure used for gathering 8 datasets from the Twitter
social graph, consisting of over 20 million users and 200 million tweets in
total. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Analysis of the distribution of selected content-based features based on
two credibility contexts. . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3 Analysis of the distribution of selected content-based features across a
diverse set of Twitter topics. . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Two different charts showing the average occurrence score of (a) each
feature across all dataset and (b) all features per topic. . . . . . . . . . . 47
5.5 Distribution of predictive features in three contexts formed around length
of retweet chains. a) Non-retweeted content b) Short chains (1-3 hops)
c) Long chains (≥4 hops). . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.6 Distribution of predictive features dyadic pairs. Tweets were selected for
this group if they occurred in a pairwise conversation between two users
in which more than two messages were exchanged, as measured by the
mention and retweet metadata in Twitter API. . . . . . . . . . . . . . . 49
List of Tables
3.1 Common technical abbreviations on Twitter . . . . . . . . . . . . . . . . 13
4.1 Three predictive models for identifying credible information from Kang
et al. [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Overview of 7 topic-specific data collections mined from both the Twitter
REST and streaming APIs. . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1 Context classes in our Evaluation. . . . . . . . . . . . . . . . . . . . . . 38
5.2 Overview of 7 topic-specific data collections mined from the Twitter
streaming API. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 Examples of 8 topic-specific tweets from the crawled sets. . . . . . . . . 40
5.4 The set of Twitter features analyzed in our evaluation. Features are
grouped into three classes: a) Social, b) Content-based and c) Behavioral.
Each is a representative subset for the larger feature sets in each class. . 42
Chapter 1
Introduction
1.1 Background
The recent success of social networks such as Twitter1, LinkedIn2 and Facebook3
has accelerated the massive adoption of user-generated content (UGC); consequently,
end users on the web have become a major source of information, providing news, media
production, blogging and forum comments, for example. Information flow has shifted
from a few dedicated professionals to a set of distributed general end users on the
web. As the number of unverified information sources increases, the growing uncertainty
about the information quality (or credibility) of sources means that we need better ways
to filter out non-credible data for consumers.
1.2 Downside of User-generated Content
The sudden increase of user-generated information brings a new challenge for today's
web application designers: how to detect trustworthy and accurate information. For
example, one can easily expect more than half of the results of an arbitrary web search
to contain user-generated content. Pieces of content from Wikipedia, Blogger, Blogspot,
Facebook, Twitter and floods of other services are good examples of this. Therefore,
the untold amount of user-generated content [3]4 yields
1http://www.twitter.com
2http://www.linkedin.com
3http://www.facebook.com
4The volume of activity on Twitter has increased at an extraordinary rate, to an average of 140
million tweets per day as of March 2011, according to the TwitterBlog [3].
the need to efficiently analyze and immediately understand big chunks of social data,
and find trustworthy information sources amongst them. Without a smart strategy for
filtering out unnecessary data scattered on the web, there would be too much noisy data
for end users to manually filter. Moreover, information seekers become more vulnerable
to spam or malicious information generated by anonymous users.
Thus, this project tackles the two problem statements below:
Problem Statement 1 How well can we assess the credibility of standalone content
in microblogs?
Problem Statement 2 Can we leverage the social graph to improve credibility assess-
ment of microblog content?
From a macroscopic perspective, we study the question “What types of features and
structures in microblogs can we utilize in order to evaluate the credibility of information
sources?” This question is applied throughout the studies covered in this project. However,
we also scrutinize two different scientific approaches: (1) social and content-based credibility
modeling in topic-specific settings, and (2) in-depth analysis of feature clusters in microblogs.
1.3 Project Overview
This report presents two different experiments focusing on information credibility.
Each consists of an individual experimental procedure and an evaluation of one of the
approaches discussed above.
1: Modeling Topic-specific Credibility in Twitter In this study, we assumed
that both social structural metadata and raw content can be used as important features
for deriving quantitative credibility scores. Our initial expectation was that the hybrid
model, which combines social and content features, would outperform the two individual
models in predicting credibility scores. Interestingly, however, our experiment with a
real-world Twitter dataset shows that the social model performs best, followed by the
hybrid and content models respectively. By carefully observing candidate features
obtained from the data crawler, 5 social features and 19 content features were chosen
for the exploratory data analysis. As a result, the social model predicted credibility
at the tweet level with 88.17% accuracy, achieved using the J48 (C4.5) decision tree
classification algorithm.
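To make the social model concrete, the sketch below derives a few social features from hypothetical profile metadata and applies a single hand-written, decision-tree-style split. This is only an illustration of the idea: the actual experiment learned a full J48 (C4.5) tree over 5 social features, and the field names and thresholds here are invented.

```python
# Illustrative sketch, NOT the report's trained J48 model: derive simple
# social features from a profile dict and apply one decision-stump split.
# Keys (followers, friends, statuses, account_age_days) are hypothetical
# names, not exact Twitter API fields; thresholds are made up.

def social_features(profile):
    """Derive simple social features from a user-profile dict."""
    followers = profile["followers"]
    friends = profile["friends"]
    return {
        "friend_follower_ratio": friends / max(followers, 1),
        "followers": followers,
        "tweets_per_day": profile["statuses"] / max(profile["account_age_days"], 1),
    }

def predict_credible(feats, ratio_threshold=2.0, min_followers=50):
    """Toy one-level rule: accounts following far more users than follow
    them back, or with very few followers, are flagged as less credible."""
    if feats["friend_follower_ratio"] > ratio_threshold:
        return False
    return feats["followers"] >= min_followers

feats = social_features(
    {"followers": 1200, "friends": 300, "statuses": 5000, "account_age_days": 400}
)
print(predict_credible(feats))  # a well-followed account passes the rule
```

A learned tree simply stacks many such threshold splits, choosing features and cut points by information gain rather than by hand.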
2: Context-based Analysis of Feature Utility As the second phase of our cred-
ibility exploration, we focused on a detailed analysis of feature distribution in mi-
croblogs. We scrutinized more than a hundred features categorized into three different
contexts: credibility, chain length, and dyadic interaction. We also conducted a compar-
ison of feature distributions across 8 different topic-specific tweet datasets. The result
of this analysis shows that the frequency of feature occurrences in general tends to
increase in social emergency situations such as natural disasters (#earthquake) or civil
revolutions (#libya or #egypt). Momentary social events (#superbowl) or national
elections (#romney) also demonstrate increased usage of external information sources
such as URLs or mentions of news archives.
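Counting how often such features occur per topic can be sketched with simple regular expressions. The sample tweets and patterns below are illustrative only; the actual analysis covered over a hundred features across the 8 crawled datasets.

```python
import re
from collections import Counter

# Minimal sketch of feature-occurrence counting: tally URLs, @-mentions
# and hashtags in each topic-specific tweet set. Sample tweets invented.

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")

def feature_counts(tweets):
    counts = Counter()
    for t in tweets:
        counts["url"] += len(URL.findall(t))
        counts["mention"] += len(MENTION.findall(t))
        counts["hashtag"] += len(HASHTAG.findall(t))
    return counts

corpora = {
    "#earthquake": ["Magnitude 6 reported http://example.com #earthquake",
                    "RT @newswire: aftershocks continue #earthquake"],
    "#superbowl": ["What a game! #superbowl"],
}
for topic, tweets in corpora.items():
    print(topic, dict(feature_counts(tweets)))
```

Normalizing these raw tallies by the number of tweets per corpus gives the per-topic occurrence scores compared in the second experiment.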
1.4 Organization of This Report
This report is organized into six parts. In Chapters 1 and 2, we introduce our work and
outline relevant literature, discussing a number of different approaches in the field of
information credibility. Next, in Chapter 3, we explain the general characteristics of
Twitter, one of the leading social network microblogging services. To be specific, we
discuss Twitter from two different perspectives: its directed-graph structure, including
newly introduced features such as lists, and its rapidly growing public popularity, with
some real-world practices via third-party applications. Next, we present two individual
experiments in Chapters 4 and 5, including their experimental setups. Both experiments
rely on the Python-based data crawler we designed. We also describe the refined dataset
for each project. We particularly detail the experimental setup of the first project,
followed by a report on several self-evaluation tests as well as the user survey that we
conducted, since the original dataset from the first project is used in the next project.
Finally, in Chapter 6, we conclude with an overall discussion of our work, its limitations,
and avenues for future work.
Chapter 2
Related Work
Recently, microblogging services have become one of the most rapidly emerging web
technologies, and their popularity is clear from usage statistics along with the abun-
dance of different applications. For example, the vast majority of major news companies,
such as the New York Times and the Guardian, have equipped every page of
their websites with social plugin buttons linked to Facebook, Twitter and Google Plus.
Since microblogging services are built on top of a social network structure, any content
produced by any user can be easily shared and disseminated through socially connected
edges, potentially coming from remote strangers. Furthermore, because of their short
message lengths (e.g. 140 characters for Twitter), they are mobile-device friendly and
have many third-party applications that provide various types of services.
The main area of research related to our work is the assessment of information
credibility in microblogs [2, 4–6]. In this chapter, we give an overview of existing credi-
bility assessment systems. According to the literature, credibility can be measured by
a number of mathematical models based on the content of information or the social
status of users [2, 4].
2.1 Credibility of Information Sources
Credibility of an information source is important when it comes to sharing infor-
mation with other people. The use of microblogging in business, research
and politics continues to increase rapidly, and therefore the importance of accuracy and
trustworthiness of such information cannot be overstated. Since Twitter was pub-
licly released in August 2006 as a free social networking service, a large body of
research has focused on evaluating the credibility of content or its producers. In this
discussion of related research, we focus on a few of the most relevant works. Our analysis
falls into three broad categories: research in the general area of trust and credibility on
the web, research in the micro-blog domain, and finally an overview of relevant works
from the field of recommender systems, particularly those works that have guided the
models and methods presented in this project report.
2.1.1 Credibility and Trust on the Web
A good overview of research on trust on the social web is presented by Artz et al. [7],
which encompasses more than a hundred related articles. They adopt two different cat-
egories from Olmedilla et al. [8] for measuring trust in information: policy-based trust
and reputation-based trust. In addition to these two categories, general trust models
and trust in information resources are discussed. A number of recent studies on in-
formation credibility adopt reputation-based methods in social network and microblog
settings, such as [2, 10, 33, 81]. Here, a policy is expressed as a trust or credibility model
comprised of existing social metadata, such as the number of followers and followees or
the age of an account in Twitter, underlying the social network structure. Like the litera-
ture mentioned above [2, 10, 33, 81], our approach to measuring credibility falls into the
category of trust in information resources.
Another interesting study by Fogg et al. [9], on which elements on the web affect
people’s perception of credibility, is worth mentioning here. They conducted a large-
scale user survey with over 1,400 participants from the U.S. and Europe, evaluating 51
different web site elements. They identified five types of elements which increase people’s
credibility perceptions–‘real-world feel’, ‘ease of use’, ‘expertise’, ‘trustworthiness’, and
‘tailoring’–and two types of elements which damage credibility–‘commercial implica-
tions’ and ‘amateurism’. A few of these elements seem applicable to Twitter-like
microblogging settings, namely ‘expertise’, ‘trustworthiness’, ‘commercial implications’
and ‘amateurism’, which have nothing to do with the layout or design of content. This is
because microblogging content is mostly viewed as a simple list of short messages
without any intricate or detailed design.
Policy-based Trust In a policy-based trust system, a particular set of conditions
is necessary to obtain trust. When these conditions are met, one can have expected
outcomes from the system. In this case, the policies frequently involve the exchange
or verification of credentials, which are pieces of information issued by one entity. Here, a
credential generally contains a description of the qualities, features or credibility of another
entity. For example, the credential of a certification means its holder has been verified
by the issuing institution as having a certain ability or education.
This process connects the holder with the issuer of that credential. Such a policy-based
system can be used to verify the credibility or quality of an individual entity even if another
entity has no interaction with or reference to that entity (the credential holder). Since the
credential issuer (or a third party) is trusted in a network or domain, it can perform
the required verification process as an authority on behalf of the credential holder [7].
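The issuer–holder–verifier relationship above can be sketched in a few lines. This is a deliberately simplified toy: the issuer name, claim format, and shared-key HMAC scheme are illustrative assumptions, whereas real credential systems use public-key certificates.

```python
import hashlib
import hmac

# Toy policy-based trust: a trusted issuer signs a credential describing
# a holder; any verifier that trusts the issuer can check the credential
# without having interacted with the holder. Shared-key HMAC stands in
# for a real public-key signature purely for illustration.

ISSUER_KEY = b"issuer-secret"  # in practice, the issuer's private signing key

def issue_credential(holder, quality):
    claim = f"{holder}:{quality}".encode()
    sig = hmac.new(ISSUER_KEY, claim, hashlib.sha256).hexdigest()
    return claim, sig

def verify_credential(claim, sig):
    expected = hmac.new(ISSUER_KEY, claim, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

claim, sig = issue_credential("alice", "certified-journalist")
print(verify_credential(claim, sig))          # a valid credential verifies
print(verify_credential(b"bob:expert", sig))  # a tampered claim does not
```

The key point mirrors the prose: trust in the issuer, not in the holder, is what makes the verification meaningful.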
Reputation-based Trust Reputation can be assessed in a network or a social com-
munity based on the history of interactions with or observations of an entity, either
directly by the evaluator (personal experience) or as reported by others (recommen-
dations or third-party verification). In the latter case, it is similar to policy-based
trust due to the involvement of a third-party verification process. However, the type of
history and the mode of interaction or observation vary depending on a variety of factors.
In a social network system, for example, other entities can directly evaluate a specific
entity through a quantitative rating system. Moreover, these ratings can be time-variant,
or simply accumulate regardless of the time passed. Recursive problems of trust can
also occur when referencing information from others. Although both credentials and
reputation involve the transfer of trust from one entity to another, each technique has its
distinct problems, which have stimulated the existing research in trust. There is a large
body of literature on reputation-based trust and credibility analysis, such as [10–12] (an
empirical analysis of eBay’s reputation system) and [7] (reputation-based analysis on
the semantic web).
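The time-variant versus accumulative distinction above can be made concrete with a small aggregation sketch. This is not any cited system's formula; the exponential half-life decay and the sample ratings are illustrative assumptions.

```python
import math

# Minimal reputation aggregation sketch: combine ratings of an entity
# into one score. With half_life_days=None the score is a plain average
# (accumulative regardless of time); otherwise each rating's weight
# decays exponentially with its age (the time-variant case).

def reputation(ratings, now, half_life_days=None):
    """ratings: list of (value, timestamp_in_days) pairs."""
    if not ratings:
        return 0.0
    if half_life_days is None:
        return sum(v for v, _ in ratings) / len(ratings)
    weights = [math.exp(-math.log(2) * (now - t) / half_life_days)
               for _, t in ratings]
    return sum(v * w for (v, _), w in zip(ratings, weights)) / sum(weights)

ratings = [(5.0, 0), (1.0, 90)]  # an old 5-star and a recent 1-star rating
print(reputation(ratings, now=100))                    # plain average: 3.0
print(reputation(ratings, now=100, half_life_days=30)) # recent rating dominates
```

Under decay, the recent negative rating pulls the score well below the plain average, capturing why time-variant reputation reacts faster to behavior changes.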
2.1.2 Information Credibility on Twitter
Research on information credibility in Twitter has been popular, since most of the
information in Twitter, such as tweets and user profiles, is public and the usage of
the service is explosive [citation for statistics]. For instance, from Kochen and Poole’s
experiments [13] and Milgram’s famous small-worlds experiment [14], trust has been
shown to play an important role in the social dynamics of a network. With social web
APIs, researchers now have many orders of magnitude more data at their fingertips, and
they can experiment with and evaluate new concepts far more easily. This is evident across
a variety of fields, for example, social web search [10], semantic web [7, 15, 16], online
auctions [11,12,17], personality and behavior prediction [18,19], political predictions [20]
and many others.
Recently, a large number of research efforts [21–23] have focused on Twitter, particularly
on the perception of credibility in microblogs [24], its retweeting mechanism [25, 26], or
distinguishing elite users–celebrities, bloggers and representatives of media out-
lets–from ordinary users by exploiting the recently introduced ‘list’ feature of Twitter [27].
Other research on Twitter at the community level has also been conducted [28–30]. In
addition, Caverlee et al. [31] analyzed credibility-based links to identify web-spam-
ming links in Twitter. Moreover, Canini et al. [4] used a Bayesian approach to
evaluate tweet features, in contrast to Sridar et al. [32], who focus on non-Bayesian
models for learning beliefs derived from social influence weights.
Measurement of credibility from content Several studies have conducted credibility
assessments on various information sources. For example, Wei et al. [33] evaluated
information credibility on an e-learning social network, and Suzuki et al. [34] conducted
an experiment to assess the credibility scores of content on social network services such as
Facebook1 and LinkedIn2, using Wikipedia for messages. Agichtein et al. [35] also pro-
posed a general classification framework for finding high-quality user-generated
content (UGC) in social media, focusing on ‘Yahoo! Answers3’. Juffinger et al. [36] de-
veloped an algorithm to evaluate blog-level (blog-unit) credibility scores based on com-
parison to a set of blogs, analyzing the blog entries and author profile information of each
blog. For assessing information credibility in microblogs, specifically Twitter, [2, 4–6]
proposed credibility models exploiting a wealth of social and content features via the
Twitter API.
1http://www.facebook.com/
2http://www.linkedin.com/
3http://answers.yahoo.com/
Real-time event detection (social sensors) A number of studies have focused on
the role of social media in detecting major social events in real time. For example,
Lotan et al. [37] conducted an empirical study on the networked production and dis-
semination of news on Twitter during snapshots of the 2011 Tunisian and Egyptian revo-
lutions, differentiating between authors’ user types and analyzing patterns of sourcing
and routing information among them. Their analytical results are followed by a discus-
sion of how social network and microblogging services such as Twitter play a key role
in amplifying and spreading timely information across the world. In addition to this
literature, many other papers [38–41] study the role of Twitter as a social sensor, and
a discussion of all these works is beyond the scope of this project report.
Chapter 3
Twitter
In this project, we focus on Twitter for the two different experiments in Chapters 4
and 5 to evaluate our predictive models. Twitter is currently one of the most popular
social network services on the web [21]. Most social network services, such as Facebook,
support only a ‘bi-directional’ relationship between two users: the relationship between
two persons can be established only once both entities accept each other as friends. In
Twitter, however, a single user can complete a relationship, and this action is called
‘following’. Once a user follows another user or account, that particular simplex
(uni-directional) channel becomes activated, and the user receives all the content
published by the account at the opposite end (the one being followed). This mechanism
is identical to the act of subscribing to a conventional online publisher, and is
illustrated in Figure 3.2.
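The uni-directional structure just described can be sketched as a directed graph of follower-to-followee edges, with tweets flowing in the opposite direction. The class and account names below are illustrative, not part of any Twitter API.

```python
# Small sketch of Twitter's uni-directional 'following' structure as a
# directed graph: edges point follower -> followee, while tweets flow
# the other way, from an author to everyone following that author.

class TwitterGraph:
    def __init__(self):
        self.follows = {}  # user -> set of accounts that user follows

    def follow(self, follower, followee):
        # One-sided: no confirmation from the followee is needed.
        self.follows.setdefault(follower, set()).add(followee)

    def followers_of(self, user):
        """Accounts who receive tweets published by `user`."""
        return {u for u, fs in self.follows.items() if user in fs}

    def timeline_sources(self, user):
        """Accounts whose tweets this user receives (their followees)."""
        return self.follows.get(user, set())

g = TwitterGraph()
g.follow("alice", "bbcnews")
g.follow("bob", "bbcnews")
print(g.followers_of("bbcnews"))      # a tweet by bbcnews reaches these users
print(g.timeline_sources("alice"))    # alice subscribes to these accounts
```

Note the asymmetry: bbcnews gains followers without following anyone back, which is exactly the subscription semantics described above.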
Figure 3.1: Distribution of content in Twitter (excerpt from [1] and mashable.com infographic, 2010)
3.1 Twitter Jargon
When Twitter opened its service to the public in 2006, it was a simple microblogging
service which only served tweet postings of 140 characters between users. Popular terms
such as “retweet” or “mention” did not exist from the beginning. The majority of users
only used Twitter as a simple discussion channel or personal journal. People scribbled
about their ordinary lives or jotted down flashes of ideas on Twitter. Those sin-
gle lines of text were named tweets. However, as people discovered the convenience of
uni-directional relationships for sharing information, Twitter users coined the term
‘retweeting’ for an information-sharing mechanism which exploits Twitter’s social
network structure. In this perspective, Twitter can be viewed as a self-evolving
communication medium.
3.1.1 Common Twitter Terms
The definitions below can be found on the official Twitter web page.1
Tweets Any message of 140 characters or fewer posted to Twitter.
Mentions A Tweet containing another user’s Twitter username, preceded by the “@”
symbol, like this: Hello @NeonGolden! What’s up?
Replies A Tweet that begins with another user’s username and is in reply to one of
their Tweets, like this: @NeonGolden I can’t believe you thought that movie was cheesy!
Direct Messages (DM) A personal message sent directly to someone who follows
you or sent directly to you from someone you follow.
Retweet (RT) A retweet is a re-posting of someone else’s Tweet. Twitter’s retweet
feature helps you and others quickly share that Tweet with all of your followers. Some-
times people type “RT” at the beginning of a Tweet to indicate that they are re-posting
someone else’s content. This isn’t an official Twitter command or feature, but signifies
that one is quoting another user’s Tweet.
1Twitter Basics, https://support.twitter.com/groups/31-twitter-basics#topic 109
3.1.2 Abbreviations
We briefly summarize some of the most useful technical abbreviations on Twitter2 in
Table 3.1 below.
Abbreviation Description
MT Modified tweet. This means the tweet you’re looking at is a paraphrase of a tweet originally written by someone else.
RT Retweet. The tweet you’re looking at was forwarded to you by another user.
DM Direct message. A direct message is a message only you and the person who sent it can read. IMPORTANT: To DM someone, all you need to type is D username message.
PRT Partial retweet. The tweet you’re looking at is the truncated version of someone else’s tweet.
HT Hat tip. This is a way of attributing a link to another Twitter user.
CC Carbon copy. Works the same way as in email.
Table 3.1: Common technical abbreviations on Twitter
3.2 Follower and Followee
Figure 3.2: Depiction of conceptual graphs of following action and information flow in Twitter.
2Twitter Abbreviations You Need To Know, http://articles.businessinsider.com/2010-08-02/tech/30060587 1 tweet-abbreviations-twitter-user#ixzz1ffeSQHCM
From the perspective of social network structure, we can divide all entities in Twitter
into two categories: followers and followees. Interestingly, while the term ‘followee’
is used in a number of studies of Twitter, it is not an official term used by Twitter;
currently, ‘following’ is used on the official Twitter website. We
provide brief definitions of the two categories below. The following action and information
flow through connected edges can be seen in Figure 3.2 below.
Follower A user can follow other users on Twitter unless an account has enabled
private settings. This action can be understood as subscribing to another user’s tweets:
a new connection is established once a user follows the other, and counter-following
or confirmation by the followee is not required. This also means that
the following user receives every tweet from the accounts the user follows.
Followee Here, we use ‘followee’ instead of ‘following’, since it makes the structure
of the Twitter network easier to understand in terms of the relationships between users.
Suppose your account follows a handful of users on Twitter. These users become your
followees, and, at the same time, you are one of their followers. You can also have your
own followers, who subscribe to the tweets you publish on Twitter.
3.3 Usage of Twitter
Twitter has been used for a variety of purposes since the service was offered to the
public. Although the Twitter webpage only prompts users about daily personal
events, a number of different usages have been identified in previous literature. Java
et al. [30] and Naaman et al. [22] grouped Twitter usage into several categories such as
daily conversations, information sharing/seeking, self-promotion and so on. Westman
and Freund [42] conducted a genre analysis on tweets using co-occurrence percentages
for genre features and identified values for 5 out of 6 genre facets: who, what, where,
when, why, and how. As the number of accounts with a business purpose increases in
microblogging services, a few studies have started to look at this new research field. For
example, Popescu and Jain [43] embarked on an initial exploration of business Twitter
accounts, trying to identify business accounts as well as deals and events in Twitter.
3.4 Noise Data in Twitter
We discussed the transition in Twitter usage with its increasing popularity in the
previous section. As a result, we can witness ever more frequent spam, rumors and false
information on our Twitter timelines. For instance, during election season, a large num-
ber of rumors or biased opinions about a specific party or candidate arise within the
overall topic of elections [44, 45]. Moreover, critical events such as natural disasters cause
immediate responses online from a massive number of people living around the epicen-
ter of the event, seeking to convey the atmosphere of the real-time situation. In this
process, word-of-mouth news or malicious propaganda affects urgent information flow
alongside factual information sources. A good example of studies dealing with this
problem is Mendoza et al. [39]. They thoroughly investigated the behavior of Twit-
ter users during the 2010 earthquake in Chile. Although this is a post-hoc analysis of
an emergency situation, their time-variant analysis of propagation following information
dissemination shows that false rumors tend to be questioned much more, and their
dissemination speed follows a downward curve in contrast to that of confirmed truths.
Chapter 4
Experiment I: Credibility
Modeling Topic-specific Credibility in Twitter
As mentioned in previous chapters, three individual experiments were conducted to evaluate our predictive credibility assessment model and topic-similarity model, as well as our feature distribution analysis. In this chapter, we detail the motivation, experimental setup, evaluation and discussion of our study of topic-specific credibility in Twitter.
4.1 Motivation
The rationale for why the credibility of an information source matters for user-provided content has been discussed in the previous chapters, along with a number of related works. We now present our first experiment, which evaluates credibility based on three different models (the social, content and hybrid models), and its results. These models are developed from a prior feature analysis of tweet content and user profile information.
4.2 Credibility Models
This experiment focuses on two different sets of features, social features and content features, in order to model credibility in microblogs. In total, three credibility models, including a combination of both feature sets named the ‘hybrid model’, are introduced
in this section. Before providing detailed information of the credibility models, we first
present the definition of credibility that we use in this experiment.
4.2.1 Definition of Credibility
For the purpose of our discussion about information credibility, we first define two types of “credibility” within a specific topic of interest. The two definitions of credibility we discuss here are as follows:
Definition 1 Tweet-Level Credibility: A degree of believability that can be assigned to a tweet about a target topic, i.e. an indication that the tweet contains believable information.
Definition 2 Social Credibility: The expected believability imparted on a user as a result
of their standing in the social network with regard to a specific topic domain, based on
any and all available metadata.
Castillo et al. [5] also used a notion of credibility similar to our “tweet-level credibility”, except for the topic-level constraint that we impose. Tweet-level (content-level) and social-level credibility are complementary, so that an individual tweet bears the social credibility of its author, and vice versa.
4.2.2 Modeling Credibility
In recommender systems, the scope of a single recommendation is traditionally based on a user’s interests. In other words, recommendations produced by the system must be personalized in order to satisfy an individual’s expected interest in something. Traditional recommendation strategies adopted by [46–48] compute a personalized set of recommendations for a target user by harnessing the user’s profile information together with her item preferences. However, since we are not targeting an individual user’s preferences but focusing on a topic-specific setting, a whole group of people who share a common topic of interest can benefit from our algorithm. Since the majority of newsworthy and reusable information shared among users on social networks is topic-specific, and social links in the network are also established based upon topics of interest, we focus on predicting credible information within an individual topic.
Credibility Model   Description
Social Model        A weighted combination of positive credibility indicators
                    from the underlying social network.
Content Model       A probabilistic language-based approach identifying
                    patterns of terms and other tweet properties that tend to
                    lead to positive feedback such as retweeting and/or
                    credible user ratings.
Hybrid Model        A combination of the above, firstly by simple weighting,
                    and secondly through cascading / filtering of output.

Table 4.1: Three predictive models for identifying credible information, from Kang et al. [2]
Now, we define the domain of our experiment, followed by the three different credibility models. To evaluate the utility of our models for predicting information credibility, we trained several classifiers using 5,000 manually rated tweets from a user evaluation. The detailed process and results of the analysis follow in the evaluation section.
Definition 3 The Twitter domain can be represented as a quintuple (U, F_o, F_e, T, X), where F_o and F_e are two U × U matrices representing binary mappings f ∈ {F_o, F_e} : U × U → {0, 1} between users in U (termed the “follower” and “following” groups, respectively), T is the set of tweets, distributed over U, and X is the set of topics in T.
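A minimal sketch of this quintuple as a data structure, assuming sparse representations for the two U × U matrices; the sample users, tweet and topic below are invented for illustration:

```python
from dataclasses import dataclass

# Sketch of the (U, Fo, Fe, T, X) domain of Definition 3.
# Fo[(u, v)] = 1 means v follows u; Fe[(u, v)] = 1 means u follows v.
@dataclass
class TwitterDomain:
    U: set    # users
    Fo: dict  # follower matrix, sparse: {(u, v): 1}
    Fe: dict  # followee matrix, sparse: {(u, v): 1}
    T: dict   # tweets distributed over U: {u: [tweet, ...]}
    X: set    # topics occurring in T

dom = TwitterDomain(
    U={"u1", "u2"},
    Fo={("u1", "u2"): 1},   # u2 follows u1
    Fe={("u2", "u1"): 1},   # equivalently, u1 is in u2's followee set
    T={"u1": ["#libya ceasefire reported"], "u2": []},
    X={"libya"},
)
# Missing entries default to 0, i.e. the binary mapping into {0, 1}.
print(dom.Fo.get(("u1", "u2"), 0))
```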
As can be seen in the above definition, our domain encompasses both content and social metadata. We will show detailed information about the social metadata in the next section. We now propose the following three approaches for identifying credible information sources, along with a brief definition of each model, enumerated in Table 4.1.
Social Model
The web of connected user accounts on Twitter enhances information flow and helps users to easily disseminate their own published information as well as other users’ content via the “retweet” action. However, the short length of postings makes the task of identifying credible information more difficult: publishing activity is frequent, and users tend to post newsworthy information across multiple statuses with shortened URLs pointing to external sources, due to the 140-character constraint. That is, the timeline of each user in Twitter changes dynamically. This
ephemeral nature is also enhanced by the ease of using content metadata such as the hashtag (#topic), mention (@screen name) and retweet (RT @screen name). We believe that our social model can help deal with this characteristic of dynamic content in Twitter by leveraging social features embedded in the network structure itself. For instance, observing the ratio of followers to followees tells us the type of a specific user account. Celebrities or public accounts of news archives are apt to have follower counts up to three orders of magnitude greater than their followee counts. As an illustration of a celebrity in Twitter, consider the Internet pioneer Tim Berners-Lee: as of June 1, 2012, his Twitter account follows 108 users and has 69,663 followers. In this sense, the underlying social features in our social model distinguish celebrities and spammers from general user accounts in Twitter, grouping the texture of information provenance into several categories. Our social model attempts to alleviate these problems by weighting various credibility indicators within a target topic.
Since a majority of the postings in Twitter are “retweets”, we first identify the credibility of information by measuring the retweet ratio of the information source. Equation 4.1 shows a weighted credibility based on the deviation of a user u ∈ U’s retweet rate RT_u from the average retweet rate RT_x in a specific topic x ∈ X. In our experiment, a log–log scale is used to soften the imbalanced distribution of values and enclose large outliers in the data. Notation has been left out for simplicity.
Cred_{RT}(u, x) = \left| RT_u - RT_x \right|    (4.1)
Equation 4.2 measures the utility of a user in disseminating information by combining the number of retweets and the number of followings F_e. Since the number of followings can be considered a potential number of retweets, the deviation between the user space and the topic space becomes a utility metric.
Utility_{RT}(u, x) = \left| \frac{RT_{u,x} \times F_{u,e}}{t_{u,x}} - \frac{RT_x \times F_{x,e}}{t_x} \right|    (4.2)
Information flux can also be a good metric for identifying active information sources in a social graph structure. In this sense, we believe that the network topology functions as a good indicator of the credibility of a user. Equation 4.3 computes a social credibility score as the deviation in the number of user u’s followers (information subscribers) from the average number of followers in the topic space. This is normalized by the number of tweets.
Cred_{social}(u) = \left| \frac{F_o(u)}{t_u} - \frac{\bar{F_o}}{\bar{t}} \right|    (4.3)
As we discussed at the beginning of this subsection, we can now also weight Equation 4.3 by factoring in the ratio of friends to followers as a difference from the norm for a given topic. For example, an information-gathering agent for a direct marketing company is likely to follow many profiles but have few followers. Equation 4.4 describes the social balance of a user u as the ratio of follower (F_o) to following (F_e) group size.
Balance_{social}(u) = \frac{F_o(u)}{F_e(u)} - \frac{\bar{F_o}}{\bar{F_e}}    (4.4)
We also take social connections within a specific topic space into account when identifying credibility. This can be expressed as follows, by taking the mean value of followers in a particular topic space.
Cred_{social}(u, x) = \left| \frac{F_o(u, x)}{t_{u,x}} - \frac{\bar{F}_{o,x}}{\bar{t}_x} \right|    (4.5)
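The five components above (Equations 4.1–4.5) can be sketched as plain functions; the per-user and per-topic aggregates (retweet rates, follower/followee counts, tweet counts) are assumed to be precomputed from the crawl, and all parameter names here are ours, not from the thesis implementation:

```python
def cred_rt(rt_u, rt_x):
    """Eq. 4.1: deviation of a user's retweet rate from the topic average."""
    return abs(rt_u - rt_x)

def utility_rt(rt_ux, fe_u, t_ux, rt_x, fe_x, t_x):
    """Eq. 4.2: dissemination utility as a user-space vs. topic-space deviation."""
    return abs((rt_ux * fe_u) / t_ux - (rt_x * fe_x) / t_x)

def cred_social(fo_u, t_u, fo_avg, t_avg):
    """Eq. 4.3: follower count per tweet vs. the average, absolute deviation."""
    return abs(fo_u / t_u - fo_avg / t_avg)

def balance_social(fo_u, fe_u, fo_avg, fe_avg):
    """Eq. 4.4: follower/followee ratio as a difference from the norm."""
    return fo_u / fe_u - fo_avg / fe_avg

def cred_social_topic(fo_ux, t_ux, fo_x, t_x):
    """Eq. 4.5: Eq. 4.3 restricted to a single topic space x."""
    return abs(fo_ux / t_ux - fo_x / t_x)

# Illustrative numbers only: a user with 69,663 followers and 108 followees
# (cf. the Tim Berners-Lee example) is heavily follower-dominant.
print(balance_social(69663, 108, 500, 500))
```

A strongly positive balance suggests a celebrity-like account, while a strongly negative one suggests an information-gathering agent.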
Content-based Model
Since Twitter only allows short content, it is especially difficult to make credibility judgements about tweets. However, we believe that this characteristic also helps us find certain patterns in the text, such as punctuation or words matching particular lexicons. One of our assumptions is that habitual expressions or particular vocabularies are highly likely to appear in specific types of information. For example, a personal conversation contains emotional words more frequently than a news article does. Accordingly, our second model focuses on the text of tweets in order to find linguistic patterns in Twitter content.
In the content-based model, we assigned 12 numeric and 7 binary features as indicators of information credibility. These feature values are later used to predict the manually annotated credibility scores from our real-world user survey. Approximately half of these features are taken from a 2011 study by Castillo et al. [5]. However, our features are applied to individual tweets, and this approach differs from their work: Castillo et al. defined a much larger set, in which 10 tweets are bound together to make a single input document, in (1) a pool of previously selected factual tweets and (2) a multiple-topic setting. In this model, we included content-specific features only, because we focus purely on a content-based setting.
Numeric Indicators:
1. Positive Sentiment Factor : Number of positive words (matching our lexicon)
2. Negative Sentiment Factor : Number of negative words
3. Sentiment Polarity : Sum of sentiment words with intensifier weighting (x2) (’very’, ’ex-
tremely’ etc)
4. Number of intensifiers: ’very’, ’extremely’ etc., based on our lexicon.
5. Number of swearwords: Simple count, based on lexicon.
6. Number of popular topic-specific terms: Simple count, based on lexicon.
7. Number of Uppercase Chars: Simple Count
8. Number of Urls: Simple Count
9. Number of Topics: Number of topics ’#’ (All have at least 1)
10. Number of Mentions: Number of users mentioned with ’@’
11. Length of Tweet (Chars): simple count.
12. Length of Tweet (Words): simple count.
Binary Indicators:
1. Is Only Urls: No text, only links.
2. Is a Retweet : From metadata
3. Has a Question Mark : ’?’ or contains any of Who/What/Where/Why/When/How
4. Has an Exclamation Mark : ’!’
5. Has multiple Questions/Exclamations: ’??’ ’???’ ’!!’ ’!!!’ etc.
6. Has a positive emoticon: :) :-) ;-) ;) etc.
7. Has a negative emoticon: :( :-( ;-( ;( etc.
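A sketch of how several of these indicators might be computed for a single tweet; the lexicons here are toy placeholders, and the tokenization rules of the actual implementation may differ:

```python
import re

POSITIVE = {"great", "good", "happy"}      # placeholder lexicons
NEGATIVE = {"bad", "awful", "sad"}
INTENSIFIERS = {"very", "extremely"}

def content_features(tweet):
    """Compute a subset of the numeric and binary content indicators."""
    words = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "sentiment_pos": pos,
        "sentiment_neg": neg,
        "sentiment_score": pos - neg,          # cf. footnote 6 below
        "num_intensifiers": sum(w in INTENSIFIERS for w in words),
        "num_uppercase": sum(c.isupper() for c in tweet),
        "num_urls": len(re.findall(r"https?://\S+", tweet)),
        "num_topics": tweet.count("#"),
        "num_mentions": tweet.count("@"),
        "len_chars": len(tweet),
        "len_words": len(tweet.split()),
        "is_retweet": tweet.startswith("RT @"),
        "has_question": "?" in tweet,
        "has_exclamation": "!" in tweet,
        "has_pos_emoticon": any(e in tweet for e in (":)", ":-)", ";)", ";-)")),
        "has_neg_emoticon": any(e in tweet for e in (":(", ":-(", ";(", ";-(")),
    }

f = content_features("RT @bbc: Very good news from #libya! http://t.co/x")
print(f["num_topics"], f["is_retweet"], f["sentiment_score"])
```

Each tweet thus becomes one feature vector, matching the tweet-level setting described above.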
Hybrid Model
We have discussed two different models: the “social model”, which harnesses user metadata in a social graph, and the “content-based model”, which discovers linguistic patterns in the tweet text itself. Our initial intuition was that the integrated set of all the features we have used would yield a better predictive outcome for identifying credible/non-credible provenance of information. Therefore, we propose this integrated feature set as a third, “hybrid” model to maximize the predictive ability of both social and content-based features. This exhaustive approach has a higher granularity of fingerprint information, in order to accurately predict desired credible information and its source.
4.3 Experiment Setup
So far, we have presented the definition of credibility, the domain of Twitter and three different credibility models. In this section, we incorporate these models into a real-world system to recommend credible sources of information in microblogs. To conduct our exploratory experiment, we first collect 8 different topic-specific data sets from Twitter and conduct a brief statistical analysis on the data to understand how each set of information varies across different topic spaces. Next, we describe the procedure of the online survey performed to collect manually annotated credibility scores from real-world users, which serve as “ground truth” for the evaluation of our models. In the following paragraphs, we describe how our crawler works and present our methodology in detail.
4.3.1 Data Mining and Preliminary Analysis
Figure 4.1: Illustration of the scope of crawled data for each of 7 current topics.
In order to evaluate our credibility models in a topic-specific setting, we first crawled tweet texts and the authors’ profile information, as well as their network structure in Twitter. To be specific, we developed our crawler program using both the Twitter REST and Streaming
APIs through the “python-twitter” wrapper1. The primary reason for using Python as the platform for our crawling task was that Python supports an environment well suited to natural language processing, together with brevity of programming expression.
1python-twitter : A Python wrapper around the Twitter API. http://code.google.com/p/python-twitter/
The major challenge in this task was the rate-limiting policy for tweets and
network information imposed by the Twitter API. For example, additional tweets or a list of followers/followings of a particular user can be fetched at a rate of 150 queries per hour, with 200 tweets per query. In contrast, seed tweets and their user profile information were easily obtainable through the Twitter Streaming API in real time2. To overcome the time constraint stated above, we ran our crawler for 2 months on a cluster of 12 machines using 14 different Twitter authentications. Through the Twitter API, tweet texts along with content and social metadata are returned in JSON format, and are stored in a relational database on our server after a series of simple parsing steps. For example, we can parse the location or language information of a user using parameters such as ‘geocode’, ‘lang’ or ‘locale’ from the ‘GET search’ method of the Twitter API. A conceptual diagram is depicted in Figure 4.1.
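The multi-credential workaround can be sketched as a round-robin token pool that spaces each credential’s queries to the documented rate; `fetch_followers` is a hypothetical stand-in for the real python-twitter call:

```python
import time
from itertools import cycle

QUERIES_PER_HOUR = 150   # REST API rate limit at the time, per credential

class TokenPool:
    """Rotate across several authenticated accounts to raise throughput."""
    def __init__(self, tokens):
        self._tokens = cycle(tokens)
        self._next_ok = {t: 0.0 for t in tokens}   # earliest reuse time

    def acquire(self):
        token = next(self._tokens)
        wait = self._next_ok[token] - time.monotonic()
        if wait > 0:
            time.sleep(wait)   # respect the per-token query rate
        self._next_ok[token] = time.monotonic() + 3600 / QUERIES_PER_HOUR
        return token

def crawl(user_ids, pool, fetch_followers):
    """fetch_followers(token, uid) is a placeholder for the real API call."""
    network = {}
    for uid in user_ids:
        network[uid] = fetch_followers(pool.acquire(), uid)
    return network

# Toy run with a fake fetcher (no network access):
pool = TokenPool(["auth1", "auth2"])
net = crawl(["u1", "u2"], pool, lambda tok, uid: [uid + "_f1"])
print(net)
```

With 14 credentials, the aggregate query rate scales roughly linearly, which is consistent with the cluster setup described above.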
We carefully chose 8 different topics for our experiment. The main criteria are as follows: (1) trending topics that yield a comparatively sufficient amount of postings within a given time (e.g. 100 statuses in a minute); (2) topics that allow us to obtain significant interconnections in the underlying social graph. This rich network structure will be utilized in the following chapter. After we collected the seed tweets and their author profiles, a further data collection process was performed for each topic to gather network structures (including followers and followings of the seed authors) originating from the original set of authors, as well as all the tweets posted by those authors. As illustrated in Table 4.2, the network information of the authors is significantly larger than the core authors and seed tweets from the first crawling pass. In this experiment, we primarily focus on the “Libya” dataset, since it has the highest interconnectivity in its social graph. In addition, a simple visualization of the physical information sources for the collected tweets in the Libya data set is shown in Figure 4.2, and a word cloud visualization of the same data is shown in Figure 4.3.
2The Streaming API roughly provides from 0.5% to 5% of real-time statuses. A larger amount of tweets can be obtained by registering for the “Firehose” service in Twitter. This stream of information is delivered through a UDP network connection.
Set Name      Core Authors   Core Tweets   Fo and Fe (with overlap)   Fo and Fe (distinct)
Libya         37k            126k          94m                        28m
Facebook      433k           217k          62m                        37m
Obama         162k           358k          24m                        5m
Japanquake    67k            131k          25m                        4m
LondonRiots   26k            52k           30m                        4m
Hurricane     32k            116k          35m                        5m
Egypt         49k            217k          73m                        36m

Table 4.2: Overview of 7 topic-specific data collections mined from both the Twitter REST and Streaming APIs.
Figure 4.2: Word cloud showing origin of tweets in the Libya data set.
Figure 4.3: Word cloud showing the distribution of popular terms in the Libya data set. (This visualization is generated using Wordle, http://www.wordle.net/)
4.4 User Survey
To collect credibility assessments on our dataset from real users and evaluate our prediction models, we conducted an online user survey. For this survey, a set of tweets from the “Libya” dataset is used. In total, 145 participants were asked to rate their impression of the credibility of 40 tweets on a Likert scale ranging from 1 to 5. An individual tweet is shown to only one participant and is not shown to others again. Our respondents also had the option to choose “can not answer”, which is scaled as ‘0’ alongside the other 5 ratings. The tweets marked with the “can not answer” option were all discarded in order to avoid unwanted bias in our results. The survey population comprises 39% female and 61% male participants, varying in age from 19 to 56 with an average age of 26. Detailed statistics of our participants can be seen in Figure 4.4.
In this experiment, we were also interested in the effects of various contexts, including those from Castillo et al. [5], on perceived credibility. Accordingly, we examined the effect on the perceived credibility scores of the respondents by providing four different source contexts (the independent variable). Each context is presented as an individual group in the survey, showing 10 tweet texts with or without additional contextual information. Group 1 provided only the tweet text in each question. Group 2 provided statistically poor social metadata for the tweeter (e.g. low numbers of followers, followees and retweets) along with the raw text of each tweet. Group 3 provided statistically good, and socially strong, properties of the tweeter (e.g. significantly higher numbers of followers,
Figure 4.4: Three graphs showing (a) gender distribution, (b) familiarity with microblogging services and (c) age distribution of the participants of the online user survey in this experiment. (from left to right, respectively)
Figure 4.5: Plot showing perceived credibility in each of four Twitter contexts (Negative,Positive, True, Null).
followees and retweets) with the raw tweet text in each question. Lastly, group 4 was shown the true context, i.e. the tweeter’s real numbers of followers, followees and retweets along with the tweet text. Except for the final group (true context), the other three groups are used only for evaluating the effect on perceived credibility scores. As can be seen in Figure 4.5, the average perceived credibility score in the positive context is ranked the highest. In this figure, the positive context (2.34) shows an increase of 26% in the average perceived credibility score over the negative context (1.58). Clearly, this result implies that context in Twitter does have an effect on perceived credibility.
4.5 Evaluation
So far, we have presented our experiment setup and described an online user survey to collect ground-truth credibility ratings. Now we evaluate our credibility models by predicting the manually annotated credibility ratings of tweets in the crawled dataset. The evaluation is done by comparing the prediction accuracy of each model. Since we could only collect a sufficient amount of data for the “Libya” dataset, due to the rate limitation of the Twitter API, we focus on that dataset for most of the following evaluation.
4.5.1 Data Analysis
Before we discuss the results of each prediction model, we analyze some interesting patterns that we found in our crawled dataset from a broader perspective. A thorough analysis of all the features we computed in this experiment is left for the next chapter as a separate experiment. Through this macroscopic analysis of the crawled dataset, we arrive at a high-level understanding of the Twitter domain, as well as additional insight into some anomalies in the feature distributions. We focus on a selection of the most influential features in the predictive models, such as Fo, Fe, links and retweets. These features were identified mainly using best-first feature analysis in WEKA3.
Followings and Followers Illustrations of ‘followings to followers’ patterns across five of our topic-specific datasets are shown in Figure 4.6. The first plot (a) in Figure 4.6 shows the number of followers Fo against the number of following users Fe over the 37,000 core users in the Libya dataset. In this graph, we labeled four typical areas with shaded boxes: ‘suspicious zone’, ‘cold start zone’, ‘low credibility zone’ and ‘celebrity zone’. For example, both the ‘celebrity zone’ and the middle area at the bottom of this plot, which represent accounts having a reasonable number of followings, presumably contain most of the credible users. Intuitively, celebrities in Twitter tend to have a reasonable number of followings with a significantly large number of followers, whereas accounts following an unreasonable number of users are highly likely to be automated agents (substantially large Fo and Fe) or information collectors (substantially large Fe and only a small Fo). The ‘cold start zone’, which has small numbers of Fe and Fo, represents new or irregular users; since users in this area do not have sufficient social information, they are considered low-credibility accounts. We found another interesting pattern along the threshold line on the following axis (2,000 followings) in all 5 graphs. In fact, this limitation is imposed by Twitter in order to prevent system strain and limit abuse4. However, this line on the log–log scale can also be seen as the long tail of a power-law distribution, as defined by Barabasi in his Nature paper [49]. In general, we found similar distributions across all of the topics in Table 4.2.
3www.cs.waikato.ac.nz/ml/weka/
Credibility Distribution across Followers Figure 4.7 shows a histogram of the average credibility score for tweets in the online user survey compared with the number of followers. In this plot, credibility scores for individual tweets are binned with regard to the corresponding number of followers. A pattern of monotonic increase in credibility rating is observed up to a follower count of approximately 1,500; however, there is a sharp decrease after 100,000 followers. This result is reflected in the “balance” component of our social model, and these outliers (followers ≥ 1,500) are penalized by the weighting of socially balanced users in Equation 4.4.
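The binning behind Figure 4.7 can be sketched as follows, averaging ratings within order-of-magnitude follower-count bins; the sample pairs are invented for illustration:

```python
import math
from collections import defaultdict

def bin_credibility(samples):
    """samples: (num_followers, rating) pairs -> {log10 bin: mean rating}."""
    sums = defaultdict(lambda: [0.0, 0])
    for followers, rating in samples:
        b = int(math.log10(max(followers, 1)))   # bin 0: 1-9, bin 1: 10-99, ...
        sums[b][0] += rating
        sums[b][1] += 1
    return {b: s / n for b, (s, n) in sorted(sums.items())}

data = [(5, 1), (120, 2), (900, 3), (1500, 4), (200000, 2)]
print(bin_credibility(data))
```

Logarithmic bins match the heavy-tailed follower distribution discussed above, so each bin holds a comparable share of accounts.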
Links and Other Credibility Features We now briefly discuss other interesting discoveries from a number of feature distribution analyses. The feature distribution graph in Figure 4.8 and the feature correlation graph in Figure 4.9 show an overview of a subset of the features that we used in this experiment (colored by credibility rating5). The external-link feature, which is equivalent to the presence of URLs in a tweet, exhibits a slight correlation with credibility, and occurred more frequently in younger profiles. The presence of links in tweets also shows an interesting correlation with sentiment (the number of occurrences of positive or negative sentiment keywords): if the sentiment score6 is polarized in either direction, links rarely occur, and users tend to assess credibility more highly. For example, retweets including links have an average credibility
4Twitter’s technical follow limits. https://support.twitter.com/articles/66885-follow-limits-i-can-t-follow-people
5red: credible tweets, blue: non-credible tweets (binarized)
6sentiment score = sentiment pos − sentiment neg
Figure 4.6: Friend-to-follower patterns across five of our topic-specific data sets: (a) #Libya, (b) #JapanQuake, (c) #Hurricane, (d) #EnoughIsEnough, (e) #Facebook. All other sets exhibited a similar distribution.
Figure 4.7: Histogram of average credibility rating from the online survey across the number of followers of the tweeters (binned).
rating of 1.4589, while retweets without a link have a score of 1.3702, showing a relative increase of 6.47%. Figure 4.10 shows the distribution of credibility in the two contexts of link existence across the four different groups in our online survey. A more detailed analysis across all of the features is covered in Chapter 5.
4.5.2 Predicting Credibility
To evaluate our credibility models (the social, content-based and hybrid models), we performed a total of 3 prediction tests with a set of manually evaluated tweets from the online survey. Each individual test represented one of our credibility models. After we computed all of the individual components of each credibility model, a set of weighted features was obtained. The three feature sets were loaded as input files to the WEKA7 machine learning toolkit. The objective of this evaluation was to accurately predict our manually labeled “ground-truth” data from the survey. First, we performed a preliminary evaluation by applying a variety of classification algorithms, such as Bayesian classifiers and a number of decision tree algorithms supported by WEKA. Afterwards, we chose the J48 (C4.5) decision tree algorithm since (1) this algorithm showed the best performance in our prediction task and (2) it allows us to compare our results with Castillo et al.’s similar evaluation in [5].
7www.cs.waikato.ac.nz/ml/weka/
Figure 4.8: Plot showing a sample of feature distributions for the content-based model on 3,000 labeled tweets from the Libya dataset.
Figure 4.9: Comparisons of each feature used in computing the social credibility model.
Figure 4.10: Comparisons of link and no-link contexts across the four different groups in the online user survey.
Figure 4.11: Comparison of predictive accuracy over the manually collected credibility ratings for each model.
After filtering out the tweets rated 0 (“can not answer”) or 3 (the median score, with possible ambiguity) from the ‘true’ context group, predictions were performed on a training set of 591 tweets with annotated credibility scores. 10-fold cross-validation was applied, and training sets were kept separate from test sets. The four remaining credibility ratings (1, 2, 4, 5) were split into two classes (1 and 2 for the ‘non-credible’ class, 4 and 5 for the ‘credible’ class), and the J48 algorithm classified each test instance into one of those two classes. In our evaluation, the prediction process is performed at the tweet level, and this is one of the major differences compared to the evaluation in Castillo et al. [5]. Class instances were evenly distributed in the training sets.
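The evaluation protocol can be approximated outside WEKA; the sketch below uses a depth-1 decision stump in place of J48 and synthetic feature/rating data, so it illustrates only the rating binarization and the 10-fold procedure, not the reported numbers:

```python
import random

random.seed(0)

# Made-up stand-in for the rated tweets: one social feature per tweet and a
# Likert rating in {1, 2, 4, 5} (ratings 0 and 3 already filtered out).
ratings = [random.choice([1, 2, 4, 5]) for _ in range(200)]
feature = [r + random.gauss(0, 1.0) for r in ratings]   # loosely informative
labels = [1 if r >= 4 else 0 for r in ratings]          # 1,2 -> 0; 4,5 -> 1

def train_stump(xs, ys):
    """Pick the threshold minimizing training error (a depth-1 'tree')."""
    best = (None, 0)
    for t in sorted(set(xs)):
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(ys)
        if acc > best[1]:
            best = (t, acc)
    return best[0]

def cross_validate(xs, ys, k=10):
    """Plain k-fold CV: train on k-1 folds, test on the held-out fold."""
    n = len(xs)
    correct = 0
    for i in range(k):
        test = set(range(i * n // k, (i + 1) * n // k))
        tr_x = [x for j, x in enumerate(xs) if j not in test]
        tr_y = [y for j, y in enumerate(ys) if j not in test]
        t = train_stump(tr_x, tr_y)
        correct += sum((xs[j] > t) == ys[j] for j in test)
    return correct / n

print(round(cross_validate(feature, labels), 3))
```

A full J48 tree generalizes the stump by splitting recursively on the feature with the highest gain ratio; the cross-validation bookkeeping is the same.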
The results of the evaluation are shown in Figure 4.11. In this experiment, both the content-based and hybrid models performed fairly at the prediction task; however, the social model showed significantly higher performance than the other two models, with an accuracy of 88.17%. In other words, there was a remarkable improvement of 11% over the next best performer (the hybrid model). At the beginning of this experiment, our expectation was that the more features used, the more accurate our predictions would be. The results therefore seem counter-intuitive, since the social model significantly outperforms the hybrid and content-based methods.
According to this result, the features from the content-based model presumably have a somewhat negative effect on the performance of the hybrid model. Since the content-based model displayed the lowest performance among the three credibility models, the fact that the hybrid model performed next best also supports this supposition. As we mentioned in the previous section, a closer analysis across all of the features of the content-based model is covered in the next chapter. We also assume that the short length of tweet text contributed to the relatively poor performance (67%) of the content-based model in this prediction task. An overview of the statistical output from the J48 learning process is provided in Figure 4.12 for our best performing method, showing a correct classification of 902 of the 1,023 instances, yielding 88.172% accuracy. In conclusion, our results in this experiment indicate that the underlying network and dynamics of information flow are better indicators of credibility than text content.
Figure 4.12: Statistical results from a J48 tree learner for the best performing credibility prediction strategy (Feature Hybrid)
4.6 Discussion
In this experiment, we conducted an evaluation of our three computational models for predicting topic-specific credibility on a set of crawled tweets. However, we believe that this is not an exhaustive study of credibility models and, therefore, that better prediction strategies remain to be found. According to the results of this study, the content-based model seems to have a rather negative effect, compared with the features from the social model, when predicting credible information. Because of this, we need to find other underlying content-based features in microblogs that offset the interfering noisy features found here. As we will discuss further in the next experiment, we need to scrutinize the various complex correlations between different features and align them with our desired patterns in order to maximize predictive performance. When this goal is achieved, we will be able to generalize our model to the microblog domain with stable and robust performance in predicting credibility.
Chapter 5
Experiment II: Feature Analytics
An Analysis of Feature Distributions in Twitter
In Chapter 4, we discussed credibility assessment models built from two pre-defined feature groups, in addition to a combined model, in terms of predictive accuracy over a set of manually collected credibility ratings on crawled tweets. Now, we take a closer look at feature distributions in Twitter in order to determine the utility of individual features for predicting the credibility of an information source. For example, the utility of an individual feature can be measured in terms of predictive accuracy, and also in terms of the presence of that feature in a specific topic domain. This experiment is based on (1) the context of credibility, (2) multiple topics, (3) dyadic pairs of tweets (conversation tweets) and (4) retweet chain lengths. We start by providing a brief motivation for this project and describing the dataset used in this experiment. Next, a context-based evaluation of a set of features is presented for predicting manually provided credibility ratings on a collection of tweets, followed by an evaluation of the feature distribution across 8 diverse topics crawled from Twitter. Finally, an analysis of feature distributions across dyadic pairs of tweets and retweet chains of various lengths is described. For ease of comparison, we use the previously evaluated 5,025 tweets (on page 26) from the user survey.
# of
Class            Description                                        Contexts

Diverse Topics   Diverse topics in Twitter; e.g. #Romney,           8 different topics (see Table 5.2)
                 #Facebook
Credibility      Manually provided assessments of tweets            Credible or non-credible
Chain length     Mined retweet chains, classified based on length   Long, short, no chain
Dyadic pairs     Mined interpersonal interactions, classified       Dyadic or non-dyadic

Table 5.1: Context classes in our evaluation.
5.1 Motivation
Credibility models can leverage many features for predicting the trustworthiness of an information source. However, important issues arise when it comes to specific contexts, circumstances or scenarios: each context requires a different set of features in order to make the optimal credibility assessment.
To answer the question “which features are best for determining content credibility?”, we propose a method to study credibility-related features along various dimensions. The objective of this preliminary study is to analyze the distributions of the features across multiple contexts, and to confirm whether a cluster of features would improve the performance of credibility assessment. We also study which features tend to provide similar or distinct types of information. We can use these categories to reduce the dimensionality of the feature space and study the contribution of each category to the ground truth on credibility.
5.2 Data Gathering and Processing
We gathered 8 different datasets based on specific topics. These sampled tweets represent several categories of topic that are frequently discussed in Twitter, such as “revolution” (#Libya, #Egypt), “natural disaster” (#Earthquake), “movement” (#EnoughIsEnough), “politics” (#Romney), “sports: big match” (#Superbowl) and “daily chatter” (#Love and #Facebook). This set was chosen because our study focuses on evaluating the distribution of multiple credibility features in various contexts (see Table 5.1).
These eight topic-specific datasets are listed in Table 5.2; the table gives the size of each subset of each dataset.

Set              Core       Core     Fo and Fe      Fo and Fe
Name             Tweeters   Tweets   (overlapped)   (distinct)
Libya            37K        126K     94M            28M
EnoughIsEnough   85K        129K     13M            4M
Egypt            49K        217K     73M            36M
Earthquake       67K        131K     15M            5M
Superbowl        191K       227K     N/A            N/A
Romney           226K       705K     N/A            N/A
Facebook         433K       217K     62M            37M
Love             312K       227K     21M            7M

Table 5.2: Overview of 8 topic-specific data collections mined from the Twitter streaming API.

Table 5.3 shows a set of sample tweets from each topic, illustrating the content diversity of our sampled dataset. Our crawling process is also described as pseudocode in Figure 5.1. We first evaluate feature distributions in the diverse-topic context as a general analysis, and then focus on the “Libya” dataset in three specific contexts in Section 5.4.
5.2.1 Online Evaluation of Credibility
Our first specific context on the “Libya” dataset is “credibility”. In order to classify and evaluate this context, we conducted an online evaluation in two phases and obtained a set of crowd-sourced reference credibility ratings on our sampled tweets. This online evaluation was performed on Amazon Mechanical Turk. 700 participants were asked to rate their impression of the credibility of 30 tweets on a Likert scale; tweets shown to one participant were not shown again to another respondent. To maintain the objectivity of our evaluation, we let participants select “0”, representing “cannot answer”, in addition to the scores ranging from 1 to 5, and discarded the tweets marked 0. (This step was not visible to the respondents and was processed after closing the online evaluation.) We also asked each participant 4 general questions and 7 pre-test questions. The general questions are used to understand the demographics of our respondents and their familiarity with microblogging services; the pre-test questions are used to filter out unreliable respondents and thereby validate the results of the evaluation.
Demographic statistics of our evaluation show that participants were 61% male and 39% female, ranging in age from 19 to 56 (median 28). Participants were generally familiar with Twitter (4 out of 5 rating on average). In total, 10,851
Set Name        Tweet

Libya           #Libya: Muammar #Gaddafi’s base taken http://t.co/UKvSn7Jk #drumit
Superbowl       RT @mashable: The Giants may have won the #SuperBowl, but Madonna won the Google search competition - http://t.co/YRqErdkg
Romney          #Romney outlines economic plan - cut taxes across the board to boost economy, but won’t add to the deficit. #GOP2012 http://t.co/AjbDWLKy
Love            You deserve better friends #love you.
Facebook        I posted 15 photos on Facebook in the album “21.09.2011. MC at UCLA Royce Hall” http://t.co/9WQQcfRS
Enoughisenough  i Mean come on Now #EnoughIsEnough
Egypt           #Egypt #July8 Egypt Mubarak-era minister jailed for corruption Albany Times Union http://t.co/5ucF7kDA #Feb17
Earthquake      A light intensity #earthquake, of magnitude 4.3 on the Richter Scale, occurred here at 8.51 p.m.

Table 5.3: Examples of 8 topic-specific tweets from the crawled sets.
procedure crawl(topicsList, tweetsList)
    store = ∅
    for all topic ∈ topicsList do
        store ← topic
        topicTag = topic.getTopicTag()
        for all tweet ∈ tweetsList do
            if tweetContains(topicTag) then
                store ← getRelevantTweet()
            end if
        end for
        for all tweet ∈ store.getTweets() do
            store ← getUsers()
        end for
        for all user ∈ store.getUsers() do
            store ← getTweets()
        end for
        for all user ∈ store.getUsers() do
            store ← getFollowers()
            store ← getFollowings()
        end for
    end for
end procedure

Figure 5.1: Data crawling procedure used for gathering 8 datasets from the Twitter social graph, consisting of over 20 million users and 200 million tweets in total.
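The procedure in Figure 5.1 can be sketched in Python. Note that `TwitterClient` and all of its fields below are hypothetical in-memory stand-ins for the real Twitter API, chosen only to mirror the calls named in the pseudocode; this is a sketch, not the crawler actually used.

```python
class TwitterClient:
    """Hypothetical stand-in for the Twitter API, serving tweets,
    timelines and social links from in-memory dictionaries."""
    def __init__(self, tweets, timelines, followers, followings):
        self.tweets = tweets          # list of {"text": ..., "user": ...}
        self.timelines = timelines    # user -> list of that user's tweets
        self.followers = followers    # user -> list of follower names
        self.followings = followings  # user -> list of followed names

def crawl(topics, client):
    """For each topic tag, store the matching tweets, their authors,
    each author's timeline, and each author's social links,
    mirroring the four loops of Figure 5.1."""
    store = {}
    for tag in topics:
        matching = [t for t in client.tweets if tag in t["text"]]
        users = {t["user"] for t in matching}
        store[tag] = {
            "tweets": matching,
            "users": users,
            "timelines": {u: client.timelines.get(u, []) for u in users},
            "followers": {u: client.followers.get(u, []) for u in users},
            "followings": {u: client.followings.get(u, []) for u in users},
        }
    return store
```

Given a populated client, `crawl(["#Libya"], client)` returns one store entry per topic, holding the tweets, the core tweeters and their social neighborhoods.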
tweets were rated with an annotated credibility score on the Libya topic collection.
5.2.2 Retweet Chains
Additional crawling process was performed on the Libya dataset from Table 5.2 to
understand the influence of retweet activity on feature distribution. In this process, we
collected retweet chains (up to 15 hops away) from our core tweets. In total 36,768
chains were obtained, ranging from 1 to 15 hops in length with an average length of
3.5492. We classified the chains into three contexts shown in Table 5.1, (1) long chains,
having 4 or greater than 4 hops, (2) short chains having less than 4, and (3) no chains.
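The classification above reduces to a single cutoff on hop count; a minimal sketch (the function name is ours):

```python
def classify_chain(hops):
    """Classify a retweet chain by its length in hops, using the
    cutoff of 4 described above: 0 hops means the tweet was never
    retweeted, 1-3 hops is a short chain, 4 or more is a long chain."""
    if hops == 0:
        return "no chain"
    return "long" if hops >= 4 else "short"
```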
5.2.3 Dyadic Pairs
Conversational tweets can be distinguished from general tweets or retweets in the Twitter domain; this type of tweet is marked by a mention tag (@) followed by a screen name. In order to analyze whether dyadic communications have distinctive feature distribution patterns in Twitter, this context is computed on our “Libya” dataset by tracking the “@” mention tag. Groups of tweets forming a pairwise conversation with at least two messages are selected from the entire set of tweets. As shown in Table 5.1, this context contains two simple classes: (1) dyadic, and (2) non-dyadic.
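The mention-based selection above can be sketched with a regular expression. The grouping key (an unordered pair of screen names) and the function name are our own assumptions, and the screen-name pattern is simplified.

```python
import re
from collections import defaultdict

# Twitter screen names are up to 15 word characters (simplified pattern).
MENTION = re.compile(r"@(\w{1,15})")

def dyadic_pairs(tweets):
    """Group conversational tweets by the unordered pair of users
    involved, keeping only pairs that exchanged at least two messages.
    `tweets` is an iterable of (author, text) pairs."""
    conversations = defaultdict(list)
    for author, text in tweets:
        m = MENTION.search(text)
        if m:
            key = frozenset((author, m.group(1)))
            conversations[key].append((author, text))
    return {k: v for k, v in conversations.items() if len(v) >= 2}
```

Tweets without a mention tag, and one-off mentions that never receive a reply, fall into the non-dyadic class.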
5.3 Feature Set
In the Twitter domain, a variety of features for predicting the credibility of a user or a piece of information can be extracted, depending on one’s criteria for defining “features”. For example, features can be derived from semantic, psychological or social perspectives in various contexts. Recent studies [2, 5, 39, 50] investigate features in Twitter by classifying them into social and content-based classes; features can also be categorized into content-based, network-based or social types. In this experiment, we group features into three classes: social, content and behavioral. The set of features used in this study is shown in Table 5.4.
Social Features The social class represents features derived from properties of users or the social network structure in the microblog. For instance, the age of an
Name                                                   %Present   Average score   Class
age 100.00 610.64 Social
listed count 100.00 11.82 Social
status count 100.00 554.49 Social
status rt count 100.00 10.17 Social
favourites count 100.00 57.96 Social
followers 100.00 295.15 Social
followings 100.00 315.03 Social
fofe ratio 100.00 5.81 Social
char 100.00 120.55 Content
word 100.00 18.69 Content
question 7.95 0.10 Content
excl 10.10 0.15 Content
uppercase 10.23 11.27 Content
pronoun 92.84 4.22 Content
smile 42.24 0.02 Content
frown 1.81 0.43 Content
url 14.17 0.42 Content
retweet 8.71 0.74 Content
sentiment pos 71.51 1.53 Content
sentiment neg 59.07 1.23 Content
sentiment 74.20 0.29 Content
num hashtag 42.09 0.83 Content
num mention 19.25 0.25 Content
tweet type 100.00 1.10 Content
ellipsis 2.11 0.29 Content
news 5.13 2.03 Content
average balance of conversation 100.00 0.32 Behavioral
average number of friends in timeline 100.00 2086.28 Behavioral
average spacing between statuses in seconds in timeline 100.00 21959.07 Behavioral
average text length in timeline 100.00 104.52 Behavioral
average general response time 100.00 3.27 Behavioral
average number of messages per conversation 100.00 4.34 Behavioral
average trust value in conversation 100.00 0.10 Behavioral
fraction of statuses in timeline that are retweets 100.00 0.55 Behavioral
Table 5.4: The set of Twitter features analyzed in our evaluation. Features are grouped into three classes: a) Social, b) Content-based and c) Behavioral. Each is a representative subset of the larger feature set in each class.
account represents profile information, and the numbers of followers and followings contain social information about a user. These features can be obtained through the Twitter API, since plentiful social metadata exist alongside the original content.
Content-based Features Content-based features are extracted by applying simple natural language processing algorithms. In Twitter, content-based features can be less rich than those of web pages or other information sources, since Twitter limits each message to 140 characters. We use a number of complex content-based features as well as basic ones such as the numbers of exclamation marks, question marks and words; the numbers of URLs, hashtags and mention tags are also content-based features. The sentiment score is an example of a more advanced feature in the content class: positive and negative sentiment factors are computed by comparing each word with lexicons of keywords. Our content-based feature set also includes a news feature, computed through comparison with a popular news archive. Features in the content class are independent of other factors such as social or profile information; they depend only on the content itself.
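Many of the basic content features in Table 5.4 can be extracted with a few regular expressions. The sketch below covers a subset; the exact extraction rules used in the study are not specified, so the patterns here are illustrative assumptions.

```python
import re

def content_features(text):
    """Extract a subset of the content-based features in Table 5.4
    from a single tweet's text. The rules are simplified sketches."""
    return {
        "char": len(text),
        "word": len(text.split()),
        "question": text.count("?"),
        "excl": text.count("!"),
        "uppercase": sum(c.isupper() for c in text),
        "url": len(re.findall(r"https?://\S+", text)),
        "retweet": int(text.startswith("RT ")),
        "num_hashtag": len(re.findall(r"#\w+", text)),
        "num_mention": len(re.findall(r"@\w+", text)),
        "smile": len(re.findall(r"[:;]-?\)", text)),
        "frown": len(re.findall(r":-?\(", text)),
        "ellipsis": int(text.rstrip().endswith("...")),
    }
```

Applied to the sample tweets of Table 5.3, such a function yields one feature vector per tweet, ready for the per-topic aggregation described below.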
Behavioral Features Behavioral features represent the dynamics of information flow in the Twitter domain. In Table 5.4, we provide a subset of a large class of features that focus on dissemination and interpersonal communication in the microblog. These behavioral features expose a number of factors affecting the dynamics of information, such as a user’s propagation energy, the balance of a conversation and a rich set of communication statistics. The behavioral features used in this experiment are from Adali et al. [50].
5.4 Feature Analysis
So far, we have discussed our data collection process, the dataset used in this experiment and the features, grouped into three categories. In this section, we present the results of our feature analysis in four different contexts. We begin with an evaluation in the context of credibility by comparing two different classes: credible and non-credible sets of tweets.
5.4.1 Credibility Analysis
As an extension of our first experiment in Chapter 4, we now provide an analysis of a set of features in the context of credibility. Since the distribution of content-based features for identifying credible information sources was not fully described in our first experiment, we provide the distribution in two different contexts: credible tweets and non-credible tweets. Before discussing the results, let us describe how we evaluate this context.
Credible and Non-credible Tweets To analyze feature distributions in the context of credible information in Twitter, we need to categorize a set of tweets into two classes. To obtain these two separate datasets (credible and non-credible tweets), we followed the same process used in the first experiment in Chapter 4, except that this time the online evaluation on Amazon Mechanical Turk was performed in two phases. This data gathering process was discussed earlier in Section 5.2.
Figure 5.2: Analysis of the distribution of selected content-based features based on two credibility contexts.
Figure 5.2 shows the mean distributions of features per tweet in the credible and non-credible contexts. From the online evaluation, we collected 100K tweets rated from 1 to 5 on a Likert scale. Tweets with scores of 1 or 2 were considered not credible and tweets with scores of 4 or 5 credible; tweets with a score of 3 were discarded in order to reduce the possibility of ambiguity.
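The binarization of Likert ratings just described can be sketched as follows (the function name is ours):

```python
def binarize(rating):
    """Map a 1-5 Likert credibility rating to a class label,
    discarding the ambiguous midpoint as described above."""
    if rating in (1, 2):
        return "not credible"
    if rating in (4, 5):
        return "credible"
    return None  # score 3 (or 0, "cannot answer") is discarded
```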
In Figure 5.2, the distributions of 18 content-based feature occurrences are shown across the two classes. Overall, none of the features in this figure presents a significant difference between the credible and non-credible classes. However,
the features that occurred less frequently, such as ‘question’, ‘exclamation’ and ‘smile’, show higher frequency in the credible class. The ‘ellipsis’ feature also occurred more frequently in the credible class. Presumably, an ellipsis (“...” at the end of the text) indicates that the tweet has been truncated and that more content exists in the external information source linked through the included URL.
We found another interesting result in the distribution of the sentiment features (‘sentiment’, ‘sentiment pos’ and ‘sentiment neg’). The ‘sentiment’ feature is computed by subtracting ‘sentiment neg’ (the negative sentiment score) from ‘sentiment pos’ (the positive sentiment score), and both ‘sentiment pos’ and ‘sentiment neg’ are computed using correlations with positive and negative lexicons. In the distribution of the ‘sentiment’ feature (the overall sentiment score), the credible class appears more frequently. However, for both ‘sentiment pos’ and ‘sentiment neg’ it can be clearly seen that the non-credible class (the negative credibility context) has more frequent occurrences of either kind of sentiment. This seems to imply that positive sentiment is not necessarily associated with credibility.
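The lexicon-based scoring just described can be sketched directly; the word lists below are tiny placeholders for whatever lexicons the real feature extraction used.

```python
# Illustrative lexicons only; the study's actual word lists are not given.
POSITIVE = {"won", "love", "great", "good"}
NEGATIVE = {"jailed", "corruption", "bad", "spam"}

def sentiment_scores(text):
    """Compute sentiment_pos and sentiment_neg as lexicon match counts,
    and the overall sentiment as their difference, as described above."""
    words = [w.strip(".,!?:;#@").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {"sentiment_pos": pos, "sentiment_neg": neg,
            "sentiment": pos - neg}
```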
The ‘news’ feature is computed using correlations with the top 300 news accounts (by number of followers) listed on the Twibes1 website. As expected, this feature occurs more frequently in the credible context. All in all, the total score for the credible context was 0.145. Since the total score of 0.148 for the non-credible context is very close to that of the credible context, the simple presence of features does not necessarily increase credibility.
5.4.2 Analysis across Multiple Corpora
In the previous evaluation, we analyzed feature distributions in the credible and non-credible contexts. We now examine possible variation of the feature distribution across multiple topics in Twitter. Since the context of multiple topics must be distinguished from the context of credibility, we treat the type of topic as an independent variable and do not consider the credibility context simultaneously. The same approach is applied to the two following contexts as well.
1 www.twibes.com

Figure 5.3: Analysis of the distribution of selected content-based features across a diverse set of Twitter topics.

(a) A plot of the average occurrence of each feature per tweet across all collected data. (b) Overview of the distribution of all features per topic.

Figure 5.4: Two charts showing the average occurrence score of (a) each feature across all datasets and (b) all features per topic.

To analyze the distribution of content-based features across a number of topics in Twitter, we use the 8 topic-specific datasets from Table 5.2. The distribution across our topic collection is illustrated in Figure 5.3, which presents the average occurrence score of each feature per tweet on the x-axis. All average occurrence scores are normalized by the maximum number of occurrences of each feature, so that they range from 0 to 1; this allows us to compare the relative frequency of occurrence between features. Clearly, this graph exhibits significant variance in occurrence for most features across different topics.
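The normalization described above is a simple division by the per-feature maximum; a minimal sketch:

```python
def normalize(occurrences):
    """Normalize one feature's average occurrence scores across topics
    by the maximum observed value, mapping them into [0, 1]."""
    peak = max(occurrences.values())
    return {topic: score / peak for topic, score in occurrences.items()}
```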
We found interesting distribution patterns across multiple topics. As can be seen in Figure 5.4 (b), which represents the average distribution of all features across the independent topics, the topics concerning crisis or emergency (#Earthquake, #Egypt and #Libya)2 display a significant increase in average occurrence score compared to the other topics. For example, the #Libya topic has a score of 0.14 while #Superbowl, a non-crisis event, scored 0.09: a relative increase of roughly 55% in feature occurrence for the crisis situation. On the contrary, run-of-the-mill topics such as #Facebook (0.06) and #Love (0.07) have the lowest scores in frequency of feature occurrence. This result implies that we can expect better performance when applying feature-based predictive models to emergency or crisis topics, since those topics tend to have a richer set of features with more frequent occurrence. However, this tendency cannot be assumed for every feature, since some features show the opposite pattern in frequency of occurrence; for example,

2 These datasets were crawled during the 2011 revolutions in Libya and Egypt.
‘num mention’ in Figure 5.3 shows significantly lower frequency in the #Libya, #Egypt and #Earthquake topics than in #Love and #Superbowl. In general, we found significantly different feature distributions across different topics. From this result, we conclude that the type of topic has a more significant effect on the identification of credible information than any individual feature.
5.4.3 Feature Distribution in Retweet Chains
Figure 5.5: Distribution of predictive features in three contexts formed around the length of retweet chains: a) non-retweeted content, b) short chains (1-3 hops), c) long chains (≥4 hops).
The third context in this experiment is designed to examine the behavioral aspects of the Twitter domain by looking closely at the feature distribution across retweet chains of different lengths in the social graph. In fact, this is a combined evaluation of social and content features rather than one relying on content alone.
Why do we want to observe this specific context? Suppose a user has multiple tweets retweeted by a few users she follows. Can we say all of those tweets are retweeted because their content is credible? We cannot, since there are counterexamples: it is easy to find spam, and tweets bearing noisy content without any newsworthy information, that also get retweeted in Twitter. To find possible evidence toward this question, we designed and explored this context. In particular, we focus on the feature distribution across the set of tweets found at the ends of ‘long chains’, and those that had been propagated, but only in ‘short chains’. The method used to distinguish long and short chains is described in Section 5.2 “Retweet Chains”.
A set of non-propagated tweets was randomly collected as the ‘no chain’ class for this comparison. Figure 5.5 shows the feature distribution for this context. Notably, the URL feature occurs 50% more frequently in the longer chains than in the short-chain or no-chain contexts. This may imply that tweets with links (URLs) pointing to external information sources tend to propagate through longer chains. Similarly, tweets involved in long retweet chains tend to contain longer content in terms of words and characters.
5.4.4 Feature Distribution in Dyadic Pairs
Figure 5.6: Distribution of predictive features in dyadic pairs. Tweets were selected for this group if they occurred in a pairwise conversation between two users in which more than two messages were exchanged, as measured by the mention and retweet metadata in the Twitter API.
The final context in this experiment also explores the behavioral side of Twitter. We call this context “dyadic pairs of tweets”, since conversational tweets in Twitter are exchanged between a pair of users by means of the ‘@mention’ or ‘@reply’ tags. These two tags are designed to let users address another user, and are distinct from the ‘retweet’ mechanism: where ‘retweet’ activity broadcasts information to a set of users, ‘mention’ activity carries dyadic communication between two users. The details of the dyadic dataset are explained in Section 5.2 “Dyadic Pairs”. Figure 5.6 shows the feature distributions across the two classes of this context. In the context of dyadic communication, the feature distribution shows lower variance than in the other three contexts. We can see a slight increase in both the ‘char’ and ‘word’ features for the non-dyadic class. A
significant decrease is shown in the ‘uppercase’ feature for the dyadic context, presumably reflecting a distinctive text pattern in conversational tweets.
5.5 Discussion
In this chapter, we provided an in-depth analysis of the distribution of the salient features in Twitter that can be used to find newsworthy [39] and credible [2] information. The experiment covered four individual contexts: credibility, diverse corpora, retweet chain length and dyadic communication. In this analysis, we have shown that, in general, feature distributions vary significantly across topics, and that the frequency of occurrence of most features tends to increase in emergency or unrest situations. We carefully conclude that while some features may help in predicting credible information, the degree of utility of each feature can vary significantly with context, both in terms of the frequency with which an individual feature occurs and the way in which it is used. Furthermore, it remains extremely difficult to predict credible information in Twitter because of the complex behavior of the features and their time-variant characteristics in microblogs. Since the pattern of information in the microblog domain changes rapidly with contemporary trends and issues, our next study will focus on classifying the topic space with more advanced natural language processing methodology, to address the challenge of finding credible content in this rapidly changing domain.
Chapter 6
Conclusion
6.1 Conclusion
As with most information consumption and evaluation on social networks and the contemporary web, the rapid information flow in Twitter limits our ability to accurately identify credibility. With the information overload phenomenon on the Internet, finding reliable and newsworthy information has become an essential task. In this project, as this forum grows ever more popular, we focused on the credibility of information provenance and information flow in Twitter. Through two different experiments, we provided and evaluated three computational models for finding credible information, and explored feature distributions in four different contexts across social, content-based and behavioral features.
In the first experiment, we provided credibility assessment models (social, content and hybrid models) derived from 8 collections of tweets about trending topics, including the associated user profile information for each tweeter as provided by the Twitter API. Next, an automated evaluation of the predictive accuracy of each model was performed, predicting over a collection of manually assessed tweets from the “Libya” dataset; this set of annotated tweets was collected in an online user survey. Results showed that the hybrid feature combination model outperformed both the social and content-based models, achieving a predictive accuracy of 89%, compared with 69% for the social model and 78% for the next best performing hybrid (weighted strategy).
The first experiment was followed by an in-depth analysis of the distribution of the salient features in Twitter. The goal of this experiment was to find the best set of features for finding interesting, newsworthy and credible information. The analysis focused on feature distributions in four distinct contexts: diverse topics, credibility levels, retweet chains and dyadic interactions. The second result showed that, in general (across 8 datasets), feature usage tends to increase in emergency situations or situations of unrest. From this result, we concluded that the utility of each feature can vary significantly with context, both in terms of the occurrence of a particular feature and the manner in which it is used. Due to the size and rapid evolution of microblogs such as Twitter, it was exceedingly challenging to fully understand the subtle links between feature presence/usage and truly credible information.
Follow-up research will focus on combining the predictive ability of different features with both their distributions and particular usages, to gain a more in-depth knowledge of the complex interactions that occur in the Twitter space and to provide insight for future credibility prediction models. We will also focus on extracting and classifying semantics from both social and content features for topic-specific domains in microblogs.
Bibliography
[1] P. Analytics, “2009 twitter study at http://pearanalytics.com/,” Mar. 2009.
[2] B. Kang, J. O’Donovan, and T. Hollerer, “Modeling topic specific credibility on
twitter,” in Proceedings of the 2012 ACM international conference on Intelligent
User Interfaces, IUI ’12, (New York, NY, USA), pp. 179–188, ACM, 2012.
[3] Twitter, “Twitter blog. “#numbers,” march 14, 2011 at http://blog.twitter.com/,”
Mar. 2011.
[4] K. Canini, B. Suh, and P. Pirolli, “Finding credible information sources in social
networks based on content and social structure,” in Privacy, security, risk and trust
(passat), 2011 ieee third international conference on and 2011 ieee third interna-
tional conference on social computing (socialcom), pp. 1 –8, oct. 2011.
[5] C. Castillo, M. Mendoza, and B. Poblete, “Information credibility on twitter,” in
Proceedings of the 20th international conference on World wide web, WWW ’11,
(New York, NY, USA), pp. 675–684, ACM, 2011.
[6] L. Yang, T. Sun, M. Zhang, and Q. Mei, “We know what you #tag: does the dual
role affect hashtag adoption?,” in Proceedings of the 21st international conference
on World Wide Web, WWW ’12, (New York, NY, USA), pp. 261–270, ACM, 2012.
[7] D. Artz and Y. Gil, “A survey of trust in computer science and the semantic web,”
Web Semant., vol. 5, pp. 58–71, June 2007.
[8] D. Olmedilla, O. F. Rana, B. Matthews, and W. Nejdl, “Security and trust issues
in semantic grids,” in In Proceedings of the Dagstuhl Seminar, Semantic Grid:
The Convergence of Technologies, Volume 05271. 2005. [PD05] [PPI04] Panteli,
pp. 191–200.
iii
iv BIBLIOGRAPHY
[9] B. J. Fogg, J. Marshall, O. Laraki, A. Osipovich, C. Varma, N. Fang, J. Paul,
A. Rangnekar, J. Shon, P. Swani, and M. Treinen, “What makes web sites credible?:
a report on a large quantitative study,” in Proceedings of the SIGCHI conference on
Human factors in computing systems, CHI ’01, (New York, NY, USA), pp. 61–68,
ACM, 2001.
[10] K. McNally, M. P. O’Mahony, B. Smyth, M. Coyle, and P. Briggs, “Towards a
reputation-based model of social web search,” in Proceedings of the 15th interna-
tional conference on Intelligent user interfaces, IUI ’10, (New York, NY, USA),
pp. 179–188, ACM, 2010.
[11] D. Houser and J. C. Wooders, “Reputation in Auctions: Theory, and Evidence
from eBay,” Journal of Economics and Management Strategy, Vol. 15, pp. 353-
369, Summer 2006.
[12] P. Resnick and R. Zeckhauser, “Trust among strangers in Internet transactions:
Empirical analysis of eBay’s reputation system,” in The Economics of the Internet
and E-Commerce (M. R. Baye, ed.), vol. 11 of Advances in Applied Microeconomics,
pp. 127–157, Elsevier Science, 2002.
[13] I. de Sola Pool and M. Kochen, “Contacts and influence,” Social Networks, vol. 1,
no. 1, pp. 5–51, 1978.
[14] S. Milgram, “The small world problem,” Psychology Today, vol. 1, pp. 61–67, May
1967.
[15] J. Golbeck, Computing with Social Trust. Human-Computer Interaction Series,
Springer, 2008.
[16] H. Zhao, W. Kallander, T. Gbedema, H. Johnson, and F. Wu, “Read what you
trust: An open wiki model enhanced by social context,” in Privacy, security, risk
and trust (passat), 2011 ieee third international conference on and 2011 ieee third
international conference on social computing (socialcom), pp. 370 –379, oct. 2011.
[17] J. O’Donovan, B. Smyth, V. Evrim, and D. McLeod, “Extracting and visualizing
trust relationships from online auction feedback comments,” in Proceedings of the
20th international joint conference on Artifical intelligence, IJCAI’07, (San Fran-
cisco, CA, USA), pp. 2826–2831, Morgan Kaufmann Publishers Inc., 2007.
BIBLIOGRAPHY v
[18] J. Golbeck, C. Robles, M. Edmondson, and K. Turner, “Predicting personality from
twitter,” in Privacy, security, risk and trust (passat), 2011 ieee third international
conference on and 2011 ieee third international conference on social computing
(socialcom), pp. 149 –156, oct. 2011.
[19] S. Adali, R. Escriva, M. K. Goldberg, M. Hayvanovych, M. Magdon-ismail, B. K.
Szymanski, W. A. Wallace, and G. T. Williams, “Measuring behavioral trust in
social networks.”
[20] J. Golbeck and D. Hansen, “Computing political preference among twitter follow-
ers,” in Proceedings of the 2011 annual conference on Human factors in computing
systems, CHI ’11, (New York, NY, USA), pp. 1105–1108, ACM, 2011.
[21] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a
news media?,” in WWW ’10: Proceedings of the 19th international conference on
World wide web, (New York, NY, USA), pp. 591–600, ACM, 2010.
[22] M. Naaman, J. Boase, and C.-H. Lai, “Is it really about me?: message content in
social awareness streams,” in Proceedings of the 2010 ACM conference on Computer
supported cooperative work, CSCW ’10, (New York, NY, USA), pp. 189–192, ACM,
2010.
[23] B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury, “Twitter power: Tweets as
electronic word of mouth,” J. Am. Soc. Inf. Sci. Technol., vol. 60, pp. 2169–2188,
Nov. 2009.
[24] M. R. Morris, S. Counts, A. Roseway, A. Hoff, and J. Schwarz, “Tweeting is believ-
ing?: understanding microblog credibility perceptions,” in Proceedings of the ACM
2012 conference on Computer Supported Cooperative Work, CSCW ’12, (New York,
NY, USA), pp. 441–450, ACM, 2012.
[25] B. Suh, L. Hong, P. Pirolli, and E. H. Chi, “Want to be retweeted? large scale
analytics on factors impacting retweet in twitter network,” in Proceedings of the
2010 IEEE Second International Conference on Social Computing, SOCIALCOM
’10, (Washington, DC, USA), pp. 177–184, IEEE Computer Society, 2010.
[26] D. Zarrella, “The science of retweets,” http://danzarrella.com/the-science-of-
retweets-report.html, http://danzarrella.com/science-of-retweets.pdf, Sept. 2009.
[27] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, “Who says what to whom
on twitter,” in Proceedings of the 20th international conference on World wide web,
WWW ’11, (New York, NY, USA), pp. 705–714, ACM, 2011.
[28] D. Zhao and M. B. Rosson, “How and why people twitter: the role that micro-
blogging plays in informal communication at work,” in Proceedings of the ACM
2009 international conference on Supporting group work, GROUP ’09, (New York,
NY, USA), pp. 243–252, ACM, 2009.
[29] G. Convertino, S. Kairam, L. Hong, B. Suh, and E. H. Chi, “Designing a cross-
channel information management tool for workers in enterprise task forces,” in
Proceedings of the International Conference on Advanced Visual Interfaces, AVI
’10, (New York, NY, USA), pp. 103–110, ACM, 2010.
[30] A. Java, X. Song, T. Finin, and B. Tseng, “Why we twitter: understand-
ing microblogging usage and communities,” in Proceedings of the 9th WebKDD
and 1st SNA-KDD 2007 workshop on Web mining and social network analysis,
WebKDD/SNA-KDD ’07, (New York, NY, USA), pp. 56–65, ACM, 2007.
[31] J. Caverlee and L. Liu, “Countering web spam with credibility-based link analy-
sis,” in Proceedings of the twenty-sixth annual ACM symposium on Principles of
distributed computing, PODC ’07, (New York, NY, USA), pp. 157–166, ACM, 2007.
[32] U. Sridhar and S. Mandyam, “Information sources driving social influences: A new
model for belief learning in social networks,” in International Conference on Ad-
vances in Social Network Analysis and Mining (ASONAM), pp. 321–327, 2011.
[33] W. Wei, J. Lee, and I. King, “Measuring credibility of users in an e-learning envi-
ronment,” in Proceedings of the 16th international conference on World Wide Web,
WWW ’07, (New York, NY, USA), pp. 1279–1280, ACM, 2007.
[34] Y. Suzuki and A. Nadamoto, “Credibility assessment using wikipedia for messages
on social network services,” in Proceedings of the 2011 IEEE Ninth International
Conference on Dependable, Autonomic and Secure Computing, DASC ’11, (Wash-
ington, DC, USA), pp. 887–894, IEEE Computer Society, 2011.
[35] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, “Finding high-
quality content in social media,” in Proceedings of the international conference on
Web search and web data mining, WSDM ’08, (New York, NY, USA), pp. 183–194,
ACM, 2008.
[36] A. Juffinger, M. Granitzer, and E. Lex, “Blog credibility ranking by exploiting
verified content,” in Proceedings of the 3rd workshop on Information credibility on
the web, WICOW ’09, (New York, NY, USA), pp. 51–58, ACM, 2009.
[37] G. Lotan, E. Graeff, M. Ananny, D. Gaffney, I. Pearce, and D. Boyd, “The revo-
lutions were Tweeted: Information flows during the 2011 Tunisian and Egyptian
revolutions,” International Journal of Communication, vol. 5, 2011.
[38] A. Gupta and P. Kumaraguru, “Credibility ranking of tweets during high impact
events,” in Proceedings of the 1st Workshop on Privacy and Security in Online
Social Media, PSOSM ’12, (New York, NY, USA), pp. 2:2–2:8, ACM, 2012.
[39] M. Mendoza, B. Poblete, and C. Castillo, “Twitter under crisis: can we trust what
we RT?,” in Proceedings of the First Workshop on Social Media Analytics, SOMA
’10, (New York, NY, USA), pp. 71–79, ACM, 2010.
[40] A. Iyengar, T. Finin, and A. Joshi, “Content-based prediction of temporal bound-
aries for events in Twitter,” in Proceedings of the Third IEEE International Con-
ference on Social Computing, IEEE Computer Society, October 2011.
[41] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users: real-
time event detection by social sensors,” in Proceedings of the 19th international
conference on World wide web, WWW ’10, (New York, NY, USA), pp. 851–860,
ACM, 2010.
[42] S. Westman and L. Freund, “Information interaction in 140 characters or less:
genres on twitter,” in Proceedings of the third symposium on Information interaction
in context, IIiX ’10, (New York, NY, USA), pp. 323–328, ACM, 2010.
[43] A.-M. Popescu and A. Jain, “Understanding the functions of business accounts on
twitter,” in Proceedings of the 20th international conference companion on World
wide web, WWW ’11, (New York, NY, USA), pp. 107–108, ACM, 2011.
[44] R. K. Garrett, “Troubling consequences of online political rumoring,” Human Com-
munication Research, vol. 37, no. 2, pp. 255–274, 2011.
[45] D. Gayo-Avello, “‘I wanted to predict elections with Twitter and all I got was this
lousy paper’: a balanced survey on election prediction using Twitter data,” CoRR,
vol. abs/1204.6441, 2012.
[46] P. Melville, R. J. Mooney, and R. Nagarajan, “Content-boosted collaborative filter-
ing for improved recommendations,” in Eighteenth national conference on Artificial
intelligence, (Menlo Park, CA, USA), pp. 187–192, American Association for Arti-
ficial Intelligence, 2002.
[47] J. L. Herlocker, J. A. Konstan, and J. Riedl, “Explaining collaborative filtering
recommendations,” in Proceedings of the 2000 ACM conference on Computer sup-
ported cooperative work, CSCW ’00, (New York, NY, USA), pp. 241–250, ACM,
2000.
[48] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “Grouplens: an
open architecture for collaborative filtering of netnews,” in Proceedings of the 1994
ACM conference on Computer supported cooperative work, CSCW ’94, (New York,
NY, USA), pp. 175–186, ACM, 1994.
[49] A.-L. Barabasi, “The origin of bursts and heavy tails in human dynamics,” Nature,
vol. 435, p. 207, 2005.
[50] S. Adali, F. Sisenda, and M. Magdon-Ismail, “Actions speak as loud as words:
predicting relationships from social behavior data,” in WWW, pp. 689–698, 2012.
[51] S. Y. Rieh and D. R. Danielson, “Credibility: A multidisciplinary framework,”
Annual Review of Information Science and Technology, vol. 41, no. 1, pp. 307–364,
2007.
[52] K. Ericsson, The Cambridge Handbook of Expertise And Expert Performance. Cam-
bridge Handbooks in Psychology, Cambridge University Press, 2006.
[53] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring user influence
in twitter: The million follower fallacy,” 4th International AAAI Conference on
Weblogs and Social Media (ICWSM), 2010.
[54] Y. Duan, L. Jiang, T. Qin, M. Zhou, and H.-Y. Shum, “An empirical study on
learning to rank of tweets,” in COLING’10, pp. 295–303, 2010.
[55] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi, “Short and tweet: experi-
ments on recommending content from information streams,” in Proceedings of the
28th international conference on Human factors in computing systems, CHI ’10,
(New York, NY, USA), pp. 1185–1194, ACM, 2010.
[56] M. S. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. H. Chi, “Eddi:
interactive topic-based browsing of social status streams,” in Proceedings of the
23rd annual ACM symposium on User interface software and technology, UIST
’10, (New York, NY, USA), pp. 303–312, ACM, 2010.
[57] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” J. Mach.
Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.
[58] X.-H. Phan, L.-M. Nguyen, and S. Horiguchi, “Learning to classify short and sparse
text & web with hidden topics from large-scale data collections,” in Proceedings of
the 17th international conference on World Wide Web, WWW ’08, (New York,
NY, USA), pp. 91–100, ACM, 2008.
[59] J. Weng, E.-P. Lim, J. Jiang, and Q. He, “Twitterrank: finding topic-sensitive
influential twitterers,” in Proceedings of the third ACM international conference on
Web search and data mining, WSDM ’10, (New York, NY, USA), pp. 261–270,
ACM, 2010.
[60] A. Ritter, C. Cherry, and B. Dolan, “Unsupervised modeling of twitter conver-
sations,” in Human Language Technologies: The 2010 Annual Conference of the
North American Chapter of the Association for Computational Linguistics, HLT
’10, (Stroudsburg, PA, USA), pp. 172–180, Association for Computational Linguis-
tics, 2010.
[61] D. Ramage, S. T. Dumais, and D. J. Liebling, “Characterizing microblogs with
topic models,” in ICWSM, 2010.
[62] D. Gayo-Avello, “Nepotistic relationships in twitter and their impact on rank pres-
tige algorithms,” CoRR, vol. abs/1004.0816, 2010.
[63] D. Talbot, “How Google ranks tweets,” http://www.technologyreview.com/web/24353/,
Jan. 2010.
[64] A. Ghosh and P. McAfee, “Incentivizing high-quality user-generated content,” in
Proceedings of the 20th international conference on World wide web, WWW ’11,
(New York, NY, USA), pp. 137–146, ACM, 2011.
[65] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social
networks,” in Proceedings of the 19th international conference on World wide web,
WWW ’10, (New York, NY, USA), pp. 981–990, ACM, 2010.
[66] X. Wang, F. Wei, X. Liu, M. Zhou, and M. Zhang, “Topic sentiment analysis in
twitter: a graph-based hashtag sentiment classification approach,” in Proceedings
of the 20th ACM international conference on Information and knowledge manage-
ment, CIKM ’11, (New York, NY, USA), pp. 1031–1040, ACM, 2011.
[67] A. Bifet and E. Frank, “Sentiment knowledge discovery in twitter streaming data,”
in Proceedings of the 13th international conference on Discovery science, DS’10,
(Berlin, Heidelberg), pp. 1–15, Springer-Verlag, 2010.
[68] O. Phelan, K. McCarthy, M. Bennett, and B. Smyth, “Terms of a feather: Content-
based news recommendation and discovery using twitter,” in Clough et al. [73],
pp. 448–459.
[69] G. Jeh and J. Widom, “Simrank: a measure of structural-context similarity,” in
Proceedings of the eighth ACM SIGKDD international conference on Knowledge
discovery and data mining, KDD ’02, (New York, NY, USA), pp. 538–543, ACM,
2002.
[70] S. Vargas and P. Castells, “Rank and relevance in novelty and diversity metrics
for recommender systems,” in Proceedings of the 2011 ACM Conference on Recom-
mender Systems, RecSys 2011, Chicago, IL, USA, October 23-27, 2011, pp. 109–
116, 2011.
[71] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, “Diversifying search results,”
in Proceedings of the Second ACM International Conference on Web Search and
Data Mining, WSDM ’09, (New York, NY, USA), pp. 5–14, ACM, 2009.
[72] J. Hannon, K. McCarthy, and B. Smyth, “Finding useful users on twitter: Twit-
tomender the followee recommender,” in Clough et al. [73], pp. 784–787.
[73] P. Clough, C. Foley, C. Gurrin, G. J. F. Jones, W. Kraaij, H. Lee, and V. Mur-
dock, eds., Advances in Information Retrieval - 33rd European Conference on IR
Research, ECIR 2011, Dublin, Ireland, April 18-21, 2011. Proceedings, vol. 6611 of
Lecture Notes in Computer Science, Springer, 2011.
[74] K. Puniyani, J. Eisenstein, S. Cohen, and E. P. Xing, “Social links from latent
topics in microblogs,” in Proceedings of the NAACL HLT 2010 Workshop on Com-
putational Linguistics in a World of Social Media, WSA ’10, (Stroudsburg, PA,
USA), pp. 19–20, Association for Computational Linguistics, 2010.
[75] D. Ramage, S. Dumais, and D. Liebling, “Characterizing microblogs with topic
models,” in ICWSM, 2010.
[76] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” J. Mach.
Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.
[77] O. Phelan, K. McCarthy, M. Bennett, and B. Smyth, “Terms of a feather: content-
based news recommendation and discovery using twitter,” in Proceedings of the
33rd European conference on Advances in information retrieval, ECIR’11, (Berlin,
Heidelberg), pp. 448–459, Springer-Verlag, 2011.
[78] B. Smyth, P. Briggs, M. Coyle, and M. O’Mahony, “Google shared. a case-study in
social search,” in Proceedings of the 17th International Conference on User Model-
ing, Adaptation, and Personalization: formerly UM and AH, UMAP ’09, (Berlin,
Heidelberg), pp. 283–294, Springer-Verlag, 2009.
[79] K. McNally, M. P. O’Mahony, B. Smyth, M. Coyle, and P. Briggs, “Social and
collaborative web search: an evaluation study,” in Proceedings of the 16th inter-
national conference on Intelligent user interfaces, IUI ’11, (New York, NY, USA),
pp. 387–390, ACM, 2011.
[80] J. O’Donovan, M. Schaal, B. Kang, T. Hollerer, and S. Barry, “A network-based
analysis of topic similarity in microblogs,” in Proceedings of the N-th international
conference on Recommender Systems, RecSys ’12, (New York, NY, USA), pp. 0–
0, ACM, 2012.
[81] D. H. McKnight and C. J. Kacmar, “Factors and effects of information credibility,”
in Proceedings of the ninth international conference on Electronic commerce, ICEC
’07, (New York, NY, USA), pp. 423–432, ACM, 2007.
[82] K. Tanaka, H. Ohshima, A. Jatowt, S. Nakamura, Y. Yamamoto, K. Sumiya,
R. Lee, D. Kitayama, T. Yumoto, Y. Kawai, J. Zhang, S. Nakajima, and Y. Inagaki,
“Evaluating credibility of web information,” in Proceedings of the 4th International
Conference on Ubiquitous Information Management and Communication, ICUIMC
’10, (New York, NY, USA), pp. 23:1–23:10, ACM, 2010.
[83] K. Puniyani, J. Eisenstein, S. Cohen, and E. P. Xing, “Social links from latent
topics in microblogs,” in Proceedings of the NAACL HLT 2010 Workshop on Com-
putational Linguistics in a World of Social Media, WSA ’10, (Stroudsburg, PA,
USA), pp. 19–20, Association for Computational Linguistics, 2010.
[84] B. Gretarsson, J. O’Donovan, S. Bostandjiev, T. Hollerer, A. Asuncion, D. New-
man, and P. Smyth, “Topicnets: Visual analysis of large text corpora with topic
modeling,” ACM Trans. Intell. Syst. Technol., vol. 3, pp. 23:1–23:26, Feb. 2012.