


Competence Modeling in Twitter: Mapping Theory to Practice

John O’Donovan
Dept. of Computer Science, University of California, Santa Barbara
Email: [email protected]

Byungkyu Kang
Dept. of Computer Science, University of California, Santa Barbara
Email: [email protected]

Tobias Höllerer
Dept. of Computer Science, University of California, Santa Barbara
Email: [email protected]

ABSTRACT

Availability of “big data” from the Social Web provides a unique opportunity for synergy between the computational and social sciences. On one hand, psychologists and social scientists have developed and established models of human competence, credibility, trust and skill over many years. On the other, much research is being conducted by computer scientists to evaluate these human-behavioral aspects using real-world data from Twitter and other sources. However, many of these algorithms are formulated in an ad-hoc way, without much reference to established theory from the existing literature. This paper presents a framework for mapping existing models of human competence and skill onto real-world streaming data from a social network. An example mapping is described using the Dreyfus model of skill acquisition, and an analysis and discussion of the resulting feature distributions is presented on four topic-specific data collections from Twitter, including one on the 2014 Winter Olympics in Sochi, Russia. The mapping is evaluated against human assessments of competence gathered through a crowdsourced study of 150 participants.

I INTRODUCTION

At the beginning of 2014 the Twitter social network generated over 9,000 new messages every second [1]. The volume and geographic diversity of these messages easily establish Twitter as a major information channel for news, media and conversation. Wherever user-generated content exists, however, there is always a large amount of noisy or otherwise useless data. A key challenge to harnessing Twitter as an information source is the ability to find relevant, reliable and trustworthy users to follow. Computer scientists in the fields of search and information retrieval (e.g., recommender systems) have attempted to address this problem in other domains for several decades [2], while behavioral scientists (psychologists, cognitive scientists, social scientists) have studied the concepts of trust, reliability and competence for far longer, and have developed established theory for identifying and classifying these characteristics at both the human level and the information level [3,4]. While many studies of Twitter in the computer science literature attempt to model and mine for these characteristics [5–7], their models and algorithms tend to be formulated in an ad-hoc manner, without strong grounding in established theory from the human behavioral sciences.

This paper describes an experimental framework to map and validate established models of human behavior against the Twitter network and the information that flows within it. If applied successfully, such a framework has three clear benefits. First, it can serve as a form of validation for existing theoretical models by applying them at scales that were previously unattainable. Second, it can help analysts to constructively reason about observed phenomena in real-world data. Finally, it can be used to improve the design of search and recommendation applications that attempt to relieve the information overload problem.

Mapping complex theoretical models of human behavior to observed behaviors in Twitter is clearly not a trivial task. The examples shown in the following sections all require a level of interpretation and common-sense reasoning about the links between factors in the model and features and indicators in the Twitter information network. For the purpose of generalization, we highlight the following steps for integrating an arbitrary human behavioral model with the network and associated data from the Twitter API, and follow this with an example implementation of the general process.

• Task Identification and Analysis What are the information requirements? What data elements from the Twitter API can provide insight?

• Model Selection Is there a model in the behavioral/social science literature that is relevant to the task?

• Feature Selection What are the best features in the social network that may be useful indicators to the model?

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

©ASE 2014 ISBN: 978-1-62561-000-3 1


• Interpretation and Mapping How should the features be related to the model itself?

• Model Building and Validation Train a prediction model using the mapped feature set and validate against a test set of annotations, or other available ground truth data.

In the remainder of this paper we detail the above mapping procedure using an example task and an established theoretical model over four large current-event data sets crawled from Twitter. Since identification of reliable information is such a critical aspect of today’s social web, we have chosen the following as an example task: can we predict that a Twitter user will provide information about a target topic in a competent way? Since Twitter is still a relatively young platform, and many users are still unfamiliar with the full scope of its operation and use, we have borrowed a model of competence from educational psychology known as the “Dreyfus Model of Skill Acquisition” [3] as a working example that, to our knowledge, has not previously been applied to social web data.

II RELATED WORK

In this section, we introduce state-of-the-art techniques from the literature that identify unique features for social media analytics and build models to predict various facets of human behavior. Twitter has a unique combination of text content and underlying social link structure, in addition to a variety of dynamic or ad-hoc structures, making it ideal for the study of information credibility and the competence of an information provider. Common methods for data mining in Twitter can be loosely classified by the type of data that they operate on.

• Content-based Methods generally rely on the text and other metadata in a message to make assertions about information or users, for example, the trust, credibility or competence of the author. These methods can be quite scalable, since they require only a single API query per assertion. Examples include Canini et al. [8], Kang et al. [9] and Castillo et al. [10].

• Network-based Methods generally rely on analysis of the underlying network structure to make decisions about information quality. Examples include Zamal et al. [11]. Network-based methods can be slower and less scalable, since they potentially require many API queries to make assertions about a single user or message. Dynamic network analysis methods, such as retweet analysis, can be even more computationally expensive and less scalable, since they focus on information flowing through an already complex network.

• Hybrid Methods combine facets from content- and network-based approaches. Examples include Sikdar et al. [12], O’Donovan et al. [7] and Kang et al. [9].

Canini et al. [8] present a good example of content-based analysis of messages in Twitter; they concentrate on modeling topic-specific credibility, defining a ranking strategy for users based on their relevance and expertise within a target topic, using Latent Dirichlet Allocation. Based on user evaluations they conclude that there is “a great potential for automatically identifying and ranking credible users for any given topic”. Canini et al. also evaluate the effect of context variance on perceived credibility.

Twitter has been studied extensively from a media perspective as a news distribution mechanism, both for regular news and for emergency situations such as natural disasters and other high-impact events [5, 13, 14]. For example, Thomson et al. [14] model the credibility of different tweet sources during the Fukushima Daiichi nuclear disaster in Japan. They found that proximity to the crisis seemed to moderate an increased tendency to share information from highly credible sources, which is further evidence for our earlier argument that credibility models in Twitter need to account for and adapt to changes in context. Castillo et al. [5] describe a study of information credibility, with a particular focus on news content, which they define as a statistically mined topic based on word co-occurrence from crawled “bursts” (short peaks in tweeting about specific topics). They define a complex set of features over messages, topics, propagation and users, with which they trained a classifier that achieved precision/recall in the 70–80% range against manually labeled credibility data. While the models presented in this paper differ, our evaluation mechanism is similar to that in [5], and we add a brief comparison of findings in our result analysis. Mendoza et al. [13] also evaluate trust in news dissemination on Twitter, focusing on the Chilean earthquake of 2010. They statistically evaluate data from the emergency situation and show that rumors can be successfully detected using aggregate analysis of tweets.



Attribute             | Feature                                                                                                  | Examples
gender                | language use (stylistic features: pronouns, determiners, prepositions, quantifiers, conjunctions, etc.)  | traditional text [15,16], blog [17], email [18], user search query [19,20], review [21], Twitter [11,22], Facebook [23]
message location      | message/web content, search query                                                                        | [19,24,25]
regional origin       | message text, user behavior, network structure                                                           | [22]
profile age           | search query, profile description                                                                        | [11,19,22]
political orientation | message text                                                                                             | [11,22,26]

Table 1: Common demographic attributes used in Twitter mining algorithms.

While identification of indicators of human-behavioral features such as competence and credibility is an important task, it is also important to consider the end-user’s perception of them. Morris et al. [27] performed a study to address users’ perceptions of the credibility of individual tweets in a variety of contexts, for example, from socially connected and unconnected sources, e.g., in blogs [17], email [18] and search [19,20]. From the results, Morris et al. derive a set of design recommendations for the visual representation of social search results.

Demographics play an important role in understanding information quality in Twitter. Table 1 presents an overview of key user-based attributes that researchers tend to rely on. In this table, attributes are shown on the left, example features for each are shown in the middle column, and the research papers that employ the features/attributes are given in the right column. For example, [28] conducted a simple survey on the application of features which can be used for analyzing people’s profiles based on the style, patterns and content of their communication streams. Herring [15] investigates the language/gender/genre relationship in web blogs and shows gender-related stylistic features from diary and filter entries. Incorporating the occurrence of words and special characters based on pre-defined corpora is another type of feature selection. For example, [29] use simple nominal or binary features to classify tweets into different categories such as news, temporal events, opinions, deals or conversations. [24] propose a probabilistic framework for content-based location estimation using microblog messages. The framework estimates each user’s city-level location based purely on the message text without any geospatial coordinates, while [22] apply stacked-SVM-based classification algorithms for their classification task on a Twitter dataset. Since we are interested in creating mappings between existing models of human behavior and the Twitter network, understanding these different features, methods and their performance is a critical first step.

III SETUP AND DATA COLLECTION

In this section, we describe the experimental setup for our evaluation, particularly the crawling process and the collected data. Table 2 shows a summary of all data used in our evaluation, and Figure 1 shows an overview of the crawling process for users and topics. The larger circle denotes a set of messages gathered during a retroactive crawl using keywords that emerged after a period of time had elapsed since the initial crawl, but were still deemed to be a part of the core topic.

Figure 1: Overview of the crawled set of users and topics. Set Sseed represents the initial seed crawl from a key hashtag. Set S represents an expanded topic crawl to incorporate additional hashtags that evolve over the course of the event. Set u represents the set of all tweets from users who exist in S.

1 DATA COLLECTION

To allow for comparison of feature and model behavior, three different data sets are used in our evaluation. The first data collection is centered around the 2014 Winter Olympic Games in Sochi, Russia. Data was crawled for approximately three weeks using a variety of keywords shown in Table 2. Sochi was chosen as a potentially interesting data set because of the diversity of cultures involved, and because of the associated excitement, politics and availability of concrete ground truth data in the form of event results.



Collection: boston — 357,152 users, 460,945 messages
Keywords: marathon, pray, suspect, victims, bomb, police, hit, shrapnel, doctor, pellet, running, die, affected, rip, explosion, swat, blood, bombings, fbi, tragedy, donate, watertown, arrest, kill, injured, runner, hurt, donors, dead, identified
Hashtags: #bostonmarathon, #prayforboston, #boston, #prayersforboston, #watertown, #bruins
Example tweet: RT @Channel4News: There have been no arrests made yet after the bombings at the #BostonMarathon - US sources. #c4news

Collection: bostonstrong — 62,461 users, 120,442 messages
Keywords: affected, bostonisback, bostonstrong, boylston, charitymiles, donate, fbi, flyers, fund, help, honor, hope, marathon, memorial, oneboston, onefundboston, police, silence, spell, strength, strong, support, donors, tribute, victims, blood, bomb, doctor, tragedy, dead, rip, pray, hurt
Hashtags: #bostonstrong, #oneboston, #copley, #bostonisback, #prayforboston
Example tweet: @Nicolette O Thank you for your support of the original #BOSTONSTRONG campaign, Nicolette! Nearing $400K raised for The One Fund Boston! xxx

Collection: sochi — 4,305,508 users, 9,521,089 messages
Keywords: sochi, olympic, winter, femaleolympians, games, gold, team, russia, hockey, medal, opening, usa, athletes, figure, canada, win, men's, ceremony, skating, ice, stray, putin, women's, gay, sport, won, ski, live, slope, skater, world
Hashtags: #sochi, #olympics, #sochi2014, #sochiproblems, #wearewinter, #sougofollow, #olympics2014
Example tweet: RT @Bobby Brown1: In air shot on the #Olympic slope course. Jumps are huge. Gonna be fun http:t.coXCQz90k1Eb

Table 2: Overview of three data collections used to evaluate the mapping framework.

Figure 2: Overview of the Dreyfus model of skill acquisition. A component mental function is represented on each row and associated skill levels are shown on the columns. The horizontal arrows on each row represent the change in an observed mental function that facilitates an increase in the skill level represented in the model.

Our second and third data sets are related to the terrorist attack that occurred during the 2013 Boston Marathon. The larger of the two collections was gathered about the event itself, using the popular hashtag “#boston”. In this case, the data crawling began an hour after the event occurred and continued for two weeks. The second data collection was about the aftermath and recovery movement, crawled using the keyword “#bostonstrong”. This was also crawled for approximately two weeks.

2 THEORETICAL FOUNDATION

To exemplify the mapping process, we have chosen to borrow a model from the field of educational psychology known as “the Dreyfus model of skill acquisition” [3]. Since Twitter is a relatively new phenomenon, many of its users are still learning about the complex information, information flow, and network structure that Twitter supports, so we deemed this competence-based model of skill acquisition to be a reasonable example. Ideally, the generalizable framework we are describing will support many other established models of credibility, competence, trust or

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

©ASE 2014 ISBN: 978-1-62561-000-3 4

Page 5: Competence Modeling in Twitter: Mapping Theory to Practiceholl/pubs/ODonovan-2014-SocialCom.pdf · Canini et al. [8] present a good example of content-based analysis of messages in

Function    | Non-competent State | Competent State | Corresponding features     | Other possibilities for features              | Description
Recollection | Non-situational     | Situational     | S[(u, t0)] − avg(S)        | specific #ht ↔ non-specific writing of content | Adaptation to context (time-specific)
Recognition  | Decomposed          | Holistic        | Fraction of T that is in u | –                                             | Coverage of topic T by user u
Decision     | Analytical          | Intuitive       | Opin(u, T)/UOpin           | –                                             | Opinion and sentiment of u in T
Awareness    | Monitoring          | Absorbed        | Fraction of u that is in t | –                                             | Involvement/immersion in a topic T

Table 3: Interpreted mappings between the Dreyfus model and a set of Twitter features.

other factors that influence human decision-making, provided that appropriate mapping steps can be performed.

2.1 DREYFUS MODEL OF SKILL ACQUISITION

The Dreyfus model of skill acquisition describes the process of human skill acquisition in five different levels. The model was first introduced by the brothers Stuart and Hubert Dreyfus [3], and is established in the fields of education and operations research. It is based on the four transitions that define boundaries between five binary states of mental function during human learning. The original model, as can be seen in Table 3, is based on the three scenarios that show progression of a learner through each of the transitions, respectively. Table 3 suggests one of many possible mappings to a set of observable features in Twitter, based on expert interpretation of both.

3 MAPPING

Now that we have selected a model, the next step is to study the meaning of each component within it, and to formulate a reasonable analog in the behavior of an available set of Twitter features. A discussion of all such features is not possible here; the feature sets that we consider are discussed in Sikdar et al. [12], especially in Tables I and II of [12]. First it is necessary to define the network, topic, users and associated features more concretely. For the following discussion, we view the Twitter domain as a triple (S, U, T), where S = (s1, s2, ..., sn) is a set of tweets crawled about a target topic. U is the set of users (u1, u2, ..., un) who have at least one tweet in S. Additionally, we define T as a vector of event timestamps representing when messages in S were posted, given by T = (t1, t2, ..., tn). Furthermore, each topic S can be represented by its component hashtags, Shash = (h1, h2, ..., hn). A notable property of Shash is that the vector emerges over the values in T. Last, we define Sseed as the subset of S gathered from the earliest emergent hashtags in Shash.
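As a concrete, minimal sketch, the (S, U, T) triple and its derived sets can be represented directly in code; the type and function names below are our own illustrative choices, not notation from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tweet:
    """One element s_i of the topic crawl S."""
    user: str             # author u_i
    timestamp: int        # event timestamp t_i (epoch seconds)
    hashtags: frozenset   # hashtags appearing in the message

def build_topic(tweets):
    """Derive U, T and S_hash from a crawled topic S."""
    S = list(tweets)
    U = {s.user for s in S}        # users with at least one tweet in S
    T = [s.timestamp for s in S]   # event timestamps
    S_hash = set().union(*(s.hashtags for s in S)) if S else set()
    return S, U, T, S_hash

def seed_subset(S, earliest_hashtags):
    """S_seed: the subset of S carrying the earliest emergent hashtags."""
    return [s for s in S if s.hashtags & set(earliest_hashtags)]
```

Here Sseed is simply a filter of S over whichever hashtags were observed first in T.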

Importantly, the mapping procedure we discuss here is simply an example to demonstrate the process. Mappings between a complex network and a complex behavioral model obviously require a degree of manual interpretation. Figure 2 illustrates a general form of the Dreyfus model, highlighting four key mental functions and the related competence levels. Table 3 shows the mental function in the leftmost column, followed by the associated indicators of competence or non-competence. The next column is the critical component, showing the analog feature combinations in Twitter. This is followed by other notable analogs and a text description of each feature. Our approach first looks at behavioral features in Twitter that could potentially serve as an indicator of each state. First we describe the reasoning behind each mapping, and in the following section we present an evaluation of the behavior of each mapped feature, further indicating its potential to measure competence.

To recap, we are interested in evaluating the competence of information providers in Twitter with respect to a target topic. This covers both authorship and information propagation alike. Within this context, we interpret recollection in a topic as the ability to think back into the topic history, in the sense of maximizing one's posterity in the target topic. To approach this computationally, we consider the sequence of event times T of topic S from our earlier definition, and attempt to gauge where individual users reside with respect to the norm for the topic. For example, if Alice's history goes farther back than Bob's, she has a greater degree of posterity, and perhaps this can be an indicator of competence. We compute this for every user simply as the earliest timestamp of a tweet that they have made in topic S. This is compared against the average timestamp of all users' first tweets (s0) within the topic, as shown in Equation 1 below. In a perfect mapping, we could simply examine the distribution graph of this feature over all users and segment it using a threshold value to determine the boundary between the competent and non-competent state; in this case, the boundary between non-situational (general) and situational (specific, detailed) recollection. The following section describes evaluations of this type for all features on all three data collections.

recollection(u, S) = T[(s0, u)] − (1/n) Σ_{i=1}^{n} T[(s0, u_i)]    (1)
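Equation 1 can be sketched in a few lines of Python; here tweets is a list of (user, timestamp) pairs for topic S, and the function names are ours:

```python
def first_tweet_time(user, tweets):
    """T[(s0, u)]: timestamp of user u's earliest tweet in topic S."""
    return min(t for u, t in tweets if u == user)

def recollection(user, tweets):
    """Equation 1: u's earliest tweet time minus the mean of all users'
    earliest tweet times. Negative values indicate early adopters."""
    users = {u for u, _ in tweets}
    mean_first = sum(first_tweet_time(u, tweets) for u in users) / len(users)
    return first_tweet_time(user, tweets) - mean_first
```

A negative score marks a user whose topic history predates the topic average, i.e. an early adopter in the sense discussed above.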

The next function of the Dreyfus model in Table 3 is “recognition”. Assessing whether a human’s recognition of a topic is in a decomposed or holistic state can be very difficult, depending on the complexity of the topic being analyzed. For our simple computational model, we treat recognition of a topic S by user u as the degree of coverage of S by u. This could be computed simply as the number of messages by u that are related to S, divided by the total number of messages in S. However, sparsity, irrelevant messages and other noise in the topic can weaken the link to the user profile. A better way to approach this mapping could leverage a) the set of hashtags in Shash that describe the topic, or b) the set of most frequently occurring terms as a more well-defined descriptor of the topic. We compute the hashtag-based coverage as Equation 2 below.

recognition(u, S) = Shash(u) / Shash(all)    (2)
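A sketch of Equation 2, reading Shash(u) as the set of topic hashtags that user u has actually used (our interpretation of the ratio):

```python
def recognition(user_hashtags, topic_hashtags):
    """Equation 2: hashtag-based coverage of topic S by user u, i.e. the
    fraction of the topic's hashtag vocabulary S_hash used by u."""
    topic = set(topic_hashtags)
    if not topic:
        return 0.0
    return len(set(user_hashtags) & topic) / len(topic)
```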

The “decision” function in the Dreyfus model is treated differently in our mapping. Dreyfus categorizes this into analytical decision-making and intuitive decision-making, with the latter being an indicator of expertise within the topic (see Figure 2). Deciding whether an individual is making analytical or intuitive choices has been the subject of many research papers in itself, e.g., [30], so again, we will need to simplify here for the purposes of discussion. Our computational model looks to sentiment as an indicator of decision-making potential. This approach has been studied and validated by many researchers. For example, O’Connor et al. [31] found that decisions to purchase products (consumer confidence) and decisions about elections [31, 32] can be predicted by examining the frequency of sentiment-related word usage in Twitter posts.

In particular, we examine three aspects of sentiment:

• Degree of Subjectivity If a user demonstrates the ability to form a subjective opinion on a given topic, it *may* point towards a higher level of competence. To assess this, we borrow a subjectivity lexicon from the OpinionFinder tool described by Wilson et al. in [33]. Each user u is represented as a bag of terms and a count is performed for terms that occur in the lexicon. The resulting value is our subjectivity score for that user. At a finer-grained level, we focus on words that imply personal preference (e.g., cool, excellent, awesome, etc.), and on expressions/idioms that imply opinion (e.g., I think, I suppose, I believe, etc.).

• Sentiment Intensity Intensity of sentiment is a good indicator of knowledge about a topic [31]. In our model, this is measured as a simple count against the sentiment lexicon from NLTK [34].

• Sentiment Polarity Our third sentiment metric examines the sentiment of user u as a polarized scalar sp ∈ [−1, 1] by comparison against negative and positive sentiment lexicons from NLTK.

While the Dreyfus model from Figure 2 shows a single factor for “Decision”, we choose to analyze the three sentiment factors separately in the analysis that follows, in case varying behaviors can be observed. After the initial feature behavior analysis they can be pruned or combined in some way to produce a single attribute.
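The three sentiment features above can be sketched as simple lexicon counts. The tiny inline word lists are placeholders standing in for the OpinionFinder subjectivity lexicon [33] and the NLTK sentiment lexicons [34]; the exact scoring scheme is our reading of the description, not the authors' implementation:

```python
# Placeholder lexicons; the paper uses OpinionFinder [33] and NLTK [34].
SUBJECTIVE = {"think", "believe", "suppose", "cool", "excellent", "awesome"}
POSITIVE = {"cool", "excellent", "awesome", "win", "gold"}
NEGATIVE = {"terrible", "awful", "lose", "fail"}

def sentiment_features(user_terms):
    """Return (subjectivity, intensity, polarity) for a user's bag of terms.
    Subjectivity and intensity are raw lexicon counts; polarity is a
    scalar in [-1, 1]."""
    terms = [t.lower() for t in user_terms]
    subjectivity = sum(t in SUBJECTIVE for t in terms)
    pos = sum(t in POSITIVE for t in terms)
    neg = sum(t in NEGATIVE for t in terms)
    intensity = pos + neg
    polarity = (pos - neg) / intensity if intensity else 0.0
    return subjectivity, intensity, polarity
```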

The final function listed in Table 3 is the concept of awareness. According to the model shown in Figure 2, when a human’s awareness transitions from persistent monitoring to an absorbed level, it is an indication of mastery of a particular skill. Put another way, this transition occurs when actions become “second nature” instead of the result of careful fine-grained analysis of rules and inputs. Again, this is a potentially difficult concept to map onto a simple computational model, since one essentially needs to be at the mastery level in a given topic to recognize such intuitive actions. In this example, our goal is to evaluate the competence of an information provider in a target topic. As a simple proxy for detecting the transition in awareness between monitoring and absorbed, our computational model focuses on the degree of immersion of a user in a topic; that is, the percentage of user u's profile that is dedicated to a topic S. One problem with this proxy is that it does not facilitate fair comparison between users – a property that is required for the feature behavior analysis that follows. Consider our Sochi Olympics dataset, for example: if the official Winter Olympic feed has 1,000 tweets all about the event, and a random user (Joe) has 10 tweets that are also about the event, this metric would produce the same score for both profiles. To control for this, we introduce a weight w based on the number of tweets in the profile, shown here as Equation 3:

awareness(u, S) = (u(S_hash) / u(all)) × w.    (3)
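The awareness proxy of Equation 3 can be sketched in a few lines of Python. This is an illustrative implementation, not the paper's: the logarithmic form of the weight w and the max_profile_size parameter are our own assumptions, since the exact form of w is not specified here.

```python
# Hypothetical sketch of the awareness metric in Equation 3: the fraction of a
# user's tweets containing the topic's seed hashtag, weighted by profile size.
# The log-based weight w is an assumption, not the paper's definition.
import math

def awareness(user_tweets, seed_hashtag, max_profile_size=1000):
    """Score a user's immersion in topic S, weighted by tweet volume."""
    total = len(user_tweets)
    if total == 0:
        return 0.0
    on_topic = sum(1 for t in user_tweets if seed_hashtag in t.lower())
    # Weight grows with profile size, so a 1,000-tweet dedicated feed
    # outscores a 10-tweet profile with the same on-topic fraction.
    w = math.log(1 + total) / math.log(1 + max_profile_size)
    return (on_topic / total) * w
```

Under this weighting, the official feed and Joe from the example above no longer tie: both have a 100% on-topic fraction, but Joe's 10-tweet profile receives a much smaller weight.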

This concludes the interpretation and mapping phase of the framework. We now arrive at a computational model in the form of a set of observable features that maps, albeit loosely, to the theoretical model in Figure 2. The next step in the procedure is to evaluate the behavior of these features to determine distribution curves and see if we can identify reasonable thresholds that correspond with the phase transitions of the Dreyfus model, shown in Figure 2.

III FEATURE ANALYSIS

Now that we have described the computational model, we must assess its potential to predict human behavior in real-world Twitter data. To achieve this we compute the 6 individual features described in the previous section on each of the three data collections (Boston, BostonStrong and Sochi). All of the features described can be considered user-based features, that is, they are attached to a single user, as opposed to a single message (see [7, 9, 12] for a discussion of user- and message-based features). In order to examine the potential of a feature for predicting the competence of a user as a provider of information about a topic, we take the following approach: first we compute the individual feature value f ∈ F for each user u ∈ U on each data set S. Next we plot a distribution dist(f, U, S) for all features in F and all three of our topics. Results of this analysis are shown in Figure 3, arranged as follows: each row represents a computed feature, identified by the title on the left side. Each column represents a data collection, identified by the seed hashtag in the header row. This arrangement of distributions is useful since it allows us to quickly compare across data collections and across features. All values are shown in percentages with the exception of the first row, which is a time-based value (seconds).
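As a minimal sketch of this procedure, the loop below computes dist(f, U, S) for a dictionary of feature functions and bins the values into a histogram. The user representation, feature functions and bin count are hypothetical stand-ins for the actual pipeline.

```python
# A minimal sketch of the distribution analysis dist(f, U, S): compute each
# feature f in F for every user u in U on a data set S, then bin the values
# into an equal-width histogram. Feature functions and n_bins are illustrative.
from collections import Counter

def feature_distributions(users, features, n_bins=20):
    """Return {feature_name: histogram counts} over all users' feature values."""
    dists = {}
    for name, f in features.items():
        values = [f(u) for u in users]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        # Map each value into one of n_bins equal-width buckets.
        hist = Counter(min(int((v - lo) / span * n_bins), n_bins - 1)
                       for v in values)
        dists[name] = [hist.get(b, 0) for b in range(n_bins)]
    return dists
```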

Let us first discuss the behavior of individual features, with a view to locating thresholds that may yield information about the competence of users as information providers on the topic. The recollection feature shows the distribution of users as a deviation from the mean time that the topic was discussed on Twitter, meaning that the leftmost group are early adopters, those at the peak are discussing the event as it is happening (or close to it in time), while the users to the right are talking about it after the fact. The users on the right of the peaks have the important benefit of hindsight. Note that for the Sochi data set, the Gaussian curve is cut off because the data runs up to the time of writing of this article. Table 2 shows the crawl times for each plot. Both the Sochi and BostonStrong data sets show clusters of early adopters on the negative slope, an interesting subset for further analysis.
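The recollection feature described above reduces to a simple deviation from the topic's mean discussion time; a sketch, assuming tweet times are given as Unix timestamps:

```python
# Sketch of the recollection feature: a user's deviation (in seconds) from the
# mean time the topic was discussed. Negative values mark early adopters;
# positive values mark users tweeting after the fact, with the benefit of hindsight.
def recollection(user_first_tweet_time, all_tweet_times):
    """Deviation of a user's first on-topic tweet from the topic's mean time."""
    mean_time = sum(all_tweet_times) / len(all_tweet_times)
    return user_first_tweet_time - mean_time
```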

For the recognition/coverage feature, all three collections show clusters of accounts with relatively high coverage. Manual inspection showed that these were official, government, media or other accounts dedicated to monitoring the event during the crawling time, and they are therefore a potentially useful information source. The decision feature shows the most interesting result across the three collections. Clearly there is a large amount of sentiment and opinion expressed in the Boston and BostonStrong collections, and the dedicated account clusters are clearly visible on the right. Looking at the sentiment polarity gives a more detailed account of the public feeling at the time. During the event, the sentiment was all negative, relating to the bombing incident, but when we look at the polarity score for the aftermath movement BostonStrong, we see clear signs of positive sentiment relating to the topic. These are likely tributes and other encouraging, hopeful messages stemming from the tragic event. For the Olympics data, there is a more even distribution, which is intuitive given the winners and losers at the games.
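To make the three "decision" sub-features concrete, a toy lexicon-based illustration follows. The paper derives these scores from OpinionFinder [33]; the tiny word lists and the exact normalizations here are invented purely for demonstration.

```python
# Toy sketch of the three "decision" sub-features (opinionatedness, sentiment
# intensity, sentiment polarity) using a tiny hand-made lexicon. The real
# system uses OpinionFinder; these word lists are purely illustrative.
POSITIVE = {"hope", "tribute", "win", "proud", "strong"}
NEGATIVE = {"tragic", "bomb", "loss", "fear", "attack"}

def decision_features(tweets):
    """Return (opinionatedness, intensity, polarity) for a user's tweets."""
    pos = neg = total_words = 0
    for t in tweets:
        words = t.lower().split()
        total_words += len(words)
        pos += sum(w in POSITIVE for w in words)
        neg += sum(w in NEGATIVE for w in words)
    sentiment_words = pos + neg
    opinionatedness = sentiment_words / max(total_words, 1)
    intensity = sentiment_words / max(len(tweets), 1)
    polarity = (pos - neg) / max(sentiment_words, 1)  # -1 (negative) .. +1 (positive)
    return opinionatedness, intensity, polarity
```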

Last, the awareness metric examines the immersion of a user in a topic, but weights the score based on the number of tweets in the profile. These plots (bottom row of Figure 3) show a few accounts that are far more dedicated than the others. These accounts are, again, likely to be dedicated to covering the topic for one reason or another.


Figure 3: Analysis of behavior for the mapped feature set (Dreyfus model representation). Each row represents an individual feature, and each column represents a data set. The "decision" feature has been broken into three sub-features: opinion, sentiment intensity and sentiment polarity, shown on rows 3-5. All values are shown in percentages with the exception of the first row, which is a time-based value (seconds).

In summary, the best thresholds on these graphs for identifying the transitions from Figure 2 are likely to lie in the areas that segment small clusters from the remainder of the users. The following section outlines an experiment to evaluate the competence of users that lie within the extremities of each of the feature distribution plots from Figure 3.

Figure 4: Procedure for sampling user profiles from each of the 6 feature distribution graphs for evaluation in the crowd-sourced experiment.

IV EVALUATION

Thus far we have described a mapping process between an abstract behavioral model from the field of educational psychology and a measurable set of features in the Twitter network, and have performed an analysis of the behavior of each individual feature. The next step in our general framework is to evaluate data samples from the distributions in an effort to find useful thresholds for building a prediction model. Figure 4 illustrates the process on a sample distribution. m messages were sampled from n users at the extremities of each distribution plot. In this experiment, we chose m = 2 and n = 3 for each of the 6 features on the Sochi data collection and gauged perceived levels of competence, newsworthiness and topic-relevance in a crowd-sourced study.
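The sampling procedure of Figure 4 can be sketched as follows, assuming feature scores and messages per user are already computed. The function and parameter names are our own; the defaults mirror the m = 2, n = 3 choice above.

```python
# Illustrative sketch of the Figure 4 sampling procedure: take n users from
# each extremity of a feature's score distribution, then m messages from each.
# Names and the fixed seed are assumptions made for reproducibility.
import random

def sample_extremities(user_scores, user_messages, n=3, m=2, seed=0):
    """Return sampled messages from the n lowest- and n highest-scoring users."""
    rng = random.Random(seed)
    ranked = sorted(user_scores, key=user_scores.get)
    samples = {}
    for u in ranked[:n] + ranked[-n:]:  # both tails of the distribution
        msgs = user_messages[u]
        samples[u] = rng.sample(msgs, min(m, len(msgs)))
    return samples
```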


1 FEATURE-BASED COMPETENCE ASSESSMENT

Figure 5: Distribution of ratings in the AMT study for Competence, Newsworthiness and Relevance on the Sochi Winter Olympics data collection.

Figure 6: Comparison of ratings for each feature, grouped by the users sampled from the COMP+ and COMP− areas of the feature distribution curves. This graph was computed on the Sochi data collection.

A study was run using Amazon's Mechanical Turk crowdsourcing tool. In total, 150 participants completed the study. Participants were 62% male and 38% female, ranged in age from 18 to 58, and took an average of 12 minutes to complete the study. Most participants reported that they had strong reading ability and at least a Bachelor-level college education. A small payment of 50 cents was provided for completed studies. Sampled messages were presented to AMT evaluators in a simple web form. Participants were asked to read groups of three messages

(coming from an individual user), and to evaluate that user's competence as an information provider in the target topic. Competence ratings were provided on the 5-point Dreyfus scale from Novice to Expert. In addition to competence, newsworthiness and topic-relevance were also assessed. Table 4 lists all of the metrics that were recorded in the study. Here we focus only on the competence annotations (COMP+ and COMP−). Figure 6 shows the mean competence score (y-axis) on the Sochi data set for each feature in our mapped model (x-axis). The x-axis is grouped by COMP+ and COMP−, reflecting the users and messages sampled from the right and left sides of each feature distribution curve in Figure 3, as also illustrated in Figure 4.

Figure 6 shows some interesting results for each feature. The only instance where COMP+ is lower than COMP− is on the recollection feature. In other words, the users selected from the left side of this feature distribution, i.e. the early adopters of the topic, received higher competence scores than those who began tweeting about the topic later in its evolution. This is a good indication that recollection is a useful feature for measuring competence in Twitter. The second group in Figure 6 (recognition) shows us that those users who covered a greater portion of the topic were considered more credible. The largest difference between competence ratings is for the opinionatedness feature. Here we can see that users in COMP+ (right side of the distribution curve, and highly opinionated) were rated as more competent than those in the COMP− group (left side of the distribution, less opinionated), with a relative increase of 35.5%. The smallest difference was shown for the sentiment polarity group (12% relative increase for the COMP+ group), meaning that polarity of sentiment was less correlated with the competence annotations than intensity of sentiment, coverage of a topic, or opinionatedness.

Figure 5 shows the general distribution of the ratings from the study, for each of the metrics in Table 4. This trend was evident across all data sets and features evaluated in the study, with mean ratings between 3 and 4 on the 5-point rating scale. Figure 7 shows a different perspective on the AMT data. Here, we focus on the trend in the difference between COMP+ and COMP− across the rating bins from novice to expert. The upper chart shows the differences for the recollection feature. This tells us that there are far more early adopters of the topic in the proficient and expert bins than in the novice and beginner bins. Interestingly, this was a significant trend for the competence annotations, but not for the newsworthiness annotations. The lower chart in Figure 7 shows the opposite trend for the opinionatedness feature: more highly opinionated users exist in the proficient and expert bins than in the beginner and novice bins. These trends show that opinion and adoption time (time of first tweet about the topic) are strong indicators of competence, but less so of newsworthiness.

Figure 7: Differences between AMT competence ratings for the Recollection and Opinionatedness features. Differences are shown for COMP+, COMP−, and NEWS+, NEWS−. The x-axis shows each rating bin from novice to expert.

Series   Description

COMP+    Competence score for tweets on the right side of the feature distribution
COMP−    Competence score for tweets on the left side of the feature distribution
NEWS+    Newsworthiness score for tweets on the right side of the feature distribution
NEWS−    Newsworthiness score for tweets on the left side of the feature distribution
REL+     Relevance score for tweets on the right side of the feature distribution
REL−     Relevance score for tweets on the left side of the feature distribution

Table 4: Description of recorded results from the AMT study.

V CONCLUSIONS AND FUTURE WORK

This paper has presented a step-by-step, generalizable framework for linking existing models of human behavior from the social and cognitive sciences with real-world measurable features from the Twitter social network. Specifically, the research proposed 5 integration steps and provided a worked example using the Dreyfus model of skill acquisition as a representative model. Features were mapped to a computational model over the Twitter network, and the behavior of each feature was analyzed over three large data collections. A study of 150 participants evaluated the competence levels of users sourced from both poles of the feature distributions. Results and manual analysis indicate that there is potential in the distribution plots to identify useful (competent) information sources related to a particular topic. A feature-by-feature comparison outlined a range of interesting effects between competence ratings for users selected from the poles of the feature distribution plots for the Sochi data collection. As a follow-up study, the authors propose to compare against a range of other models from the behavioral sciences, to combine the resultant features into a predictive model, and to run accuracy-based evaluations over multiple ground-truth metrics. In conclusion, while there are many assumptions in the mapping stages of the approach, the authors believe that the methodology can help both algorithm designers for the social web and researchers in the behavioral sciences to better understand complex data interactions in Twitter.

Acknowledgments. This work was partially supported by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARL or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

[1] “Twitter statistics.” [Online]. Available: http://www.statisticbrain.com/twitter-statistics/

[2] O. Phelan, K. McCarthy, and B. Smyth, "Using twitter to recommend real-time topical news," in Proceedings of the third ACM conference on Recommender systems. ACM, 2009, pp. 385–388.

[3] S. E. Dreyfus and H. L. Dreyfus, "A five-stage model of the mental activities involved in directed skill acquisition," DTIC Document, Tech. Rep., 1980.

[4] J. C. McCroskey, "Scales for the measurement of ethos," 1966.

[5] C. Castillo, M. Mendoza, and B. Poblete, "Information credibility on twitter," in WWW, 2011, pp. 675–684.

[6] S. K. Sikdar, B. Kang, J. O'Donovan, T. Hollerer, and S. Adali, "Cutting through the noise: Defining ground truth in information credibility on twitter," HUMAN, vol. 2, no. 3, pp. pp–151, 2013.

[7] J. O'Donovan, B. Kang, G. Meyer, T. Hollerer, and S. Adali, "Credibility in context: An analysis of feature distributions in twitter," in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE, 2012, pp. 293–301.

[8] K. R. Canini, B. Suh, and P. L. Pirolli, "Finding credible information sources in social networks based on content and social structure," in Proceedings of the third IEEE International Conference on Social Computing (SocialCom), 2011.

[9] B. Kang, J. O'Donovan, and T. Hollerer, "Modeling topic specific credibility on twitter," in Proceedings of the 2012 ACM international conference on Intelligent User Interfaces. ACM, 2012, pp. 179–188.

[10] C. Castillo, M. Mendoza, and B. Poblete, "Information credibility on twitter," in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 675–684.

[11] F. Al Zamal, W. Liu, and D. Ruths, "Homophily and latent attribute inference: Inferring latent attributes of twitter users from neighbors," in ICWSM, 2012.

[12] S. Sikdar, B. Kang, J. O'Donovan, T. Hollerer, and S. Adali, "Understanding information credibility on twitter," in Social Computing (SocialCom), 2013 International Conference on. IEEE, 2013, pp. 19–24.

[13] M. Mendoza, B. Poblete, and C. Castillo, "Twitter Under Crisis: Can we trust what we RT?" in 1st Workshop on Social Media Analytics (SOMA '10). ACM Press, Jul. 2010. [Online]. Available: http://chato.cl/papers/mendoza_poblete_castillo_2010_twitter_terremoto.pdf

[14] R. Thomson, N. Ito, H. Suda, F. Lin, Y. Liu, R. Hayasaka, R. Isochi, and Z. Wang, "Trusting tweets: The fukushima disaster and information source credibility on twitter," in Proceedings of the 9th International ISCRAM Conference, 2012.

[15] S. C. Herring and J. C. Paolillo, "Gender and genre variation in weblogs," Journal of Sociolinguistics, vol. 10, no. 4, pp. 439–459, 2006.

[16] S. Singh, "A pilot study on gender differences in conversational speech on lexical richness measures," Literary and Linguistic Computing, vol. 16, no. 3, pp. 251–264, 2001.

[17] J. D. Burger and J. C. Henderson, "An exploration of observable features related to blogger age," in AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, 2006, pp. 15–20.

[18] N. Garera and D. Yarowsky, "Modeling latent biographic attributes in conversational genres," in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2. Association for Computational Linguistics, 2009, pp. 710–718.

[19] R. Jones, R. Kumar, B. Pang, and A. Tomkins, "I know what you did last summer: query logs and user privacy," in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 2007, pp. 909–914.

[20] I. Weber and C. Castillo, "The demographics of web search," in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010, pp. 523–530.

[21] J. Otterbacher, "Inferring gender of movie reviewers: exploiting writing style, content and metadata," in Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, 2010, pp. 369–378.


[22] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta, "Classifying latent user attributes in twitter," in Proceedings of the 2nd international workshop on Search and mining user-generated contents. ACM, 2010, pp. 37–44.

[23] D. Rao, M. J. Paul, C. Fink, D. Yarowsky, T. Oates, and G. Coppersmith, "Hierarchical bayesian models for latent attribute detection in social media," in ICWSM, 2011.

[24] Z. Cheng, J. Caverlee, and K. Lee, "You are where you tweet: a content-based approach to geo-locating twitter users," in Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, 2010, pp. 759–768.

[25] C. Fink, C. D. Piatko, J. Mayfield, T. Finin, and J. Martineau, "Geolocating blogs from their textual content," in AAAI Spring Symposium: Social Semantic Web: Where Web 2.0 Meets Web 3.0, 2009, pp. 25–26.

[26] M. Thomas, B. Pang, and L. Lee, "Get out the vote: Determining support or opposition from congressional floor-debate transcripts," in Proceedings of the 2006 conference on empirical methods in natural language processing. Association for Computational Linguistics, 2006, pp. 327–335.

[27] M. R. Morris, S. Counts, A. Roseway, A. Hoff, and J. Schwarz, "Tweeting is believing?: understanding microblog credibility perceptions," in Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, ser. CSCW '12. New York, NY, USA: ACM, 2012, pp. 441–450. [Online]. Available: http://doi.acm.org/10.1145/2145204.2145274

[28] M. Pennacchiotti and A.-M. Popescu, "A machine learning approach to twitter user classification," in ICWSM, 2011.

[29] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, "Short text classification in twitter to improve information filtering," in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010, pp. 841–842.

[30] S. Epstein, R. Pacini, V. Denes-Raj, and H. Heier, "Individual differences in intuitive-experiential and analytical-rational thinking styles," J Pers Soc Psychol, vol. 71, no. 2, pp. 390–405, 1996. [Online]. Available: http://www.biomedsearch.com/nih/Individual-differences-in-intuitive-experiential/8765488.html

[31] M. O'Connor, D. Cosley, J. A. Konstan, and J. Riedl, "Polylens: a recommender system for groups of users," in ECSCW 2001. Springer, 2001, pp. 199–218.

[32] P. T. Metaxas and E. Mustafaraj, "From obscurity to prominence in minutes: Political speech and real-time search," in Web Science Conference, 2010.

[33] T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, "Opinionfinder: A system for subjectivity analysis," in Proceedings of HLT/EMNLP on Interactive Demonstrations. Association for Computational Linguistics, 2005, pp. 34–35.

[34] "Natural Language Toolkit," Jan. 2014. [Online]. Available: http://nltk.org
