Exploiting Social Relations for Sentiment Analysis in Microblogging

Xia Hu, Lei Tang, Jiliang Tang, Huan Liu
Arizona State University, Tempe, AZ 85287, USA
@WalmartLabs, San Bruno, CA 94066, USA
{xiahu, jiliang.tang, huan.liu}@asu.edu, [email protected]

ABSTRACT

Microblogging, like Twitter^1, has become a popular platform of human expressions, through which users can easily produce content on breaking news, public events, or products. The massive amount of microblogging data is a useful and timely source that carries mass sentiment and opinions on various topics. Existing sentiment analysis approaches often assume that texts are independent and identically distributed (i.i.d.), usually focusing on building a sophisticated feature space to handle noisy and short messages, without taking advantage of the fact that the microblogs are networked data. Inspired by the social sciences findings that sentiment consistency and emotional contagion are observed in social networks, we investigate whether social relations can help sentiment analysis by proposing a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification. In particular, we present a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process, and utilize sparse learning to tackle noisy texts in microblogging. An empirical study of two real-world Twitter datasets shows the superior performance of our framework in handling noisy and short tweets.
Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Classification; I.2.7 [Artificial Intelligence]: Natural Language Processing

General Terms

Algorithm, Performance, Experimentation

Keywords

Sentiment Classification, Microblogging, Twitter, Noisy and Short Texts, Social Context, Social Correlation

^1 https://twitter.com/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WSDM '13, February 4–8, 2013, Rome, Italy.
Copyright 2013 ACM 978-1-4503-1869-3/13/02 ...$10.00.

1. INTRODUCTION

Microblogging services are extensively used to share information or opinions in various domains. With the growing availability of such an opinion-rich resource, it attracts much attention from those who seek to understand the opinions of individuals, or to gauge aggregated sentiment of mass populations. For example, advertisers may want to target users who are enthusiastic about a brand or a product in order to launch a successful social media campaign. Aid agencies from around the world would like to monitor sentiment evolutions before, during, and after a crisis to assist recovery and provide disaster relief. The sheer volumes of microblogging data present opportunities and challenges for sentiment analysis of these noisy and short texts.

Sentiment analysis has been extensively studied for product and movie reviews [32], which differ substantially from microblogging data. Unlike standard texts with many words that help gather sufficient statistics, the texts in microblogging only consist of a few phrases or 1-2 sentences.
Also, when composing a microblogging message, users may use or coin new abbreviations or acronyms that seldom appear in conventional text documents. For example, messages like "It is coooooool", "OMG :-(", are intuitive and popular in microblogging, but some are not formal words. It is difficult for machines to accurately identify the semantic meanings of these messages, though they provide convenience in quick and instant communications for human beings. Existing methods [2, 8] rely on pre-defined sentiment vocabularies [39], which are highly domain-specific.

Meanwhile, microblogging platforms often provide additional information other than text. For example, in Figure 1, we depict two kinds of data available in microblogging. Figure 1(a) shows the content of messages, in the form of a message-feature matrix. Traditional methods measure the similarity between text documents (messages) purely based on content information. A distinct feature of microblogging messages is that they are potentially networked through user connections, which may contain useful semantic clues that are not available in purely text-based methods. Besides content information, relations between messages can be represented via a user-message matrix and a user-user interaction matrix, as shown in Figure 1(b). Traditional methods, if applied directly to the microblogging data, do not utilize the social relation information.

In social sciences, it is well-established that emotions and sentiments play a distinct role in our social life and correlate with our social connections. When experiencing emotions, people do not generally keep the emotions to themselves,



[Figure 1: Data Representation of Text and Social Relation Information in Microblogging. (a) Data Representation of Message Content; (b) Data Representation of Social Relations.]

but rather, they tend to show them [19]. Also, people tend to catch others' emotions as a consequence of facial, vocal, and postural feedback, which has been recognized as emotional contagion [10] in social sciences. Emotional contagion may be important in personal relationships because it fosters behavioral synchrony and the tracking of the feelings of others moment-to-moment, even when individuals are not explicitly attending to this information [10]. As a consequence of emotional contagion, Fowler and Christakis [6] reported the spread of happiness in a social network. Two social processes, selection and influence, are proposed to explain the phenomenon [22]: people befriend others who are similar to them (Homophily [26]), or they become more similar to their friends over time (Social Influence [25]). Both explanations suggest that connected individuals are more likely to have similar behaviors or hold similar opinions. Inspired by this sociological observation, we explore the utilization of social relation information to facilitate sentiment analysis in the context of microblogging.

In this paper, we aim to provide a supervised approach to sentiment analysis in microblogging by taking advantage of social relation information in tackling the noisy nature of the messages. In particular, we first investigate whether the social theories exist in microblogging data. Then we discuss how the social relations could be modeled and utilized for supervised sentiment analysis. Finally, we conduct extensive experiments to verify the proposed model. The main contributions of this paper are as follows:

• We formally define the problem of sentiment analysis in microblogging to enable the utilization of social relations for sentiment analysis;

• By verifying the existence of two social theories in microblogging, we build sentiment relations between messages via social relations;

• We present a novel supervised method to tackle the noisy and short texts by integrating sentiment relations between the texts; and

• We empirically evaluate the proposed SANT framework on real-world Twitter datasets and elaborate the effects of social relationships on sentiment analysis.

The remainder of this paper is organized as follows. In Section 2, we review related work. In Section 3, we formally define the problem we study. In Section 4, we conduct a study to verify the social theories. In Section 5, we propose a novel framework SANT for supervised sentiment analysis. In Section 7, we report empirical results. We conclude and present the future work in Section 8.

2. RELATED WORK

Recently, sentiment analysis on microblogging, which is considered to be an opinion-rich resource, has gained huge popularity and attracted researchers from many disciplines [3, 18, 20, 30]. Bollen et al. [3] proposed to measure the sentiments on Twitter over time, and compared the correlation between sentiments and major events, including the stock market, crude oil prices, elections and Thanksgiving. Also, Kim et al. [20] examined a tweet dataset about Michael Jackson's death to gain insight into how emotion is expressed on Twitter. O'Connor et al. [30] used sentiment analysis to automatically label the sentiments of tweets about politicians, and found strong correlation between the aggregated sentiment and the manually collected poll ratings.

Sentiment classification has been studied for years on various text corpora, like newspaper articles [33], movie reviews [31], and product reviews [5, 11, 23]. The basic idea of the methods is to build a sophisticated feature space, which can effectively represent the sentiment status of the texts. Existing methods, which are designed for traditional i.i.d. text data, cannot effectively make use of the abundant social relation information contained in microblogging. Following the methods for traditional texts, there are some existing efforts in the community on the microblogging data. Alec et al. [8] presented the results of machine learning algorithms for classifying the sentiments of Twitter messages using distant supervision. Barbosa and Feng [2] explored the linguistic characteristics of how tweets are written and the meta-information of words for sentiment classification. The ideas of the methods are consistent with traditional ones, ignoring the social relation information.

Some efforts have been made to explore the effect of external information sources [37, 41], especially social network information [35, 36], on sentiment analysis. Speriosu et al. [35] proposed to incorporate labels from a maximum entropy classifier, in combination with the Twitter follower graph. They simply used a user's followers as separate features and combined them with the content matrix. Tan et al. [36] proposed to improve user-level sentiment analysis of different topics [16] by incorporating social network information. However, our task is document-level sentiment classification, which has finer granularity than their work. In addition, our method simultaneously utilizes social relation information and handles noisy, short texts in microblogging.

3. PROBLEM STATEMENT

The problem we study in this paper is different from traditional sentiment classification since the latter normally only considers the content information. In this section, we first present the notations and then formally define the problem of sentiment classification on microblogging messages.

We use boldface uppercase letters (e.g., $\mathbf{A}$) to denote matrices, boldface lowercase letters (e.g., $\mathbf{a}$) to denote vectors, and lowercase letters (e.g., $a$) to denote scalars. The entry at the $i$th row and $j$th column of a matrix $\mathbf{A}$ is denoted as $A_{ij}$. $\mathbf{A}_{i*}$ and $\mathbf{A}_{*j}$ denote the $i$th row and $j$th column of a matrix $\mathbf{A}$, respectively. $\|\mathbf{A}\|_1$ is the $\ell_1$-norm and $\|\mathbf{A}\|_F$ is the Frobenius norm of matrix $\mathbf{A}$. Specifically, $\|\mathbf{A}\|_1 = \sum_{i=1}^{m}\sum_{j=1}^{n} |A_{ij}|$ and $\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^2}$.

Given a corpus $\mathcal{T} = [\mathbf{X}, \mathbf{Y}]$, where $\mathbf{X} \in \mathbb{R}^{m \times n}$ is the content matrix, $\mathbf{Y} \in \mathbb{R}^{n \times c}$ is the sentiment label matrix, $m$ is the number of features, $n$ is the number of messages and $c$ is the number of sentiments, as shown in Figure 1(a). For each message in the corpus $\mathcal{T} = \{t_1, t_2, \ldots, t_n\}$, $t_i = (\mathbf{x}_i, \mathbf{y}_i) \in \mathbb{R}^{m+c}$ consists of microblogging message content and sentiment label, where $\mathbf{x}_i \in \mathbb{R}^m$ is the message feature vector and $\mathbf{y}_i \in \mathbb{R}^c$ is the sentiment label vector. Following previous work [8, 35], in this paper, we focus on polarity sentiment classification, i.e., $c = 2$. It is practical to extend this setting to a multi-class sentiment classification task. $u = \{u_1, u_2, \ldots, u_d\}$ is the user set, where $d$ is the number of distinct users in the corpus. $\mathbf{U} \in \mathbb{R}^{d \times n}$ is a user-message matrix, shown as the left matrix in Figure 1(b). In the user-message matrix, $U_{ij} = 1$ (red frame in the figure) denotes that message $t_j$ is posted by user $u_i$. $\mathbf{F} \in \mathbb{R}^{d \times d}$ is the user-user matrix, as shown in Figure 1(b). In the matrix, $F_{ij} = 1$ (red frame in the figure) indicates that user $u_i$ is connected by user $u_j$.

With the notations above, we formally define sentiment classification of microblogging messages as:

Given a corpus of microblogging messages $\mathcal{T}$ with their content $\mathbf{X}$ and corresponding sentiment labels $\mathbf{Y}$, social relations for this corpus including the user-message relation $\mathbf{U}$, and user-user following relation $\mathbf{F}$, we aim to learn a classifier $\mathbf{W}$ to automatically assign sentiment labels for unseen messages (i.e., test data).
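As a purely illustrative sketch, not taken from the paper, the matrices above can be mocked up in NumPy with hypothetical toy dimensions to make the shapes concrete:

```python
import numpy as np

# Toy dimensions (hypothetical, for illustration only):
m, n, c, d = 5, 4, 2, 3   # features, messages, sentiment classes, users

rng = np.random.default_rng(0)

# X: content matrix, one column per message (term-presence features).
X = (rng.random((m, n)) > 0.5).astype(float)

# Y: sentiment label matrix, one one-hot row per message (c = 2 polarities).
labels = rng.integers(0, c, size=n)
Y = np.eye(c)[labels]                      # shape (n, c)

# U: user-message matrix, U[i, j] = 1 iff user i posted message j.
U = np.zeros((d, n))
U[rng.integers(0, d, size=n), np.arange(n)] = 1

# F: user-user matrix, F[i, j] = 1 iff user i is connected by user j.
F = (rng.random((d, d)) > 0.5).astype(float)
np.fill_diagonal(F, 0)

assert X.shape == (m, n) and Y.shape == (n, c)
assert U.shape == (d, n) and F.shape == (d, d)
assert U.sum(axis=0).min() == 1            # every message has exactly one author
```

The toy values are random stand-ins; only the shapes and 0/1 conventions mirror the definitions above.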

4. DATA AND OBSERVATIONS

Before we proceed to our solution, in this section, we first introduce the real-world data used in this work and present some explorations of whether social theories have any impact on sentiment analysis.

4.1 Data

Subsets of two publicly available Twitter datasets are employed: Stanford Twitter Sentiment (STS) and Obama-McCain Debate (OMD). Both datasets consist of raw tweets with their corresponding sentiment labels.

Table 1: Statistics of the Datasets

                               STS      OMD
# of Tweets                 22,262    1,827
# of Users                   8,467      735
Max Degree of the Users        897      138
Min Degree of the Users          1        1
Avg. Tweets per User          2.63     2.49

The first dataset is the Stanford Twitter Sentiment (STS)^2. Go et al. [8] created a collection of 40,216 tweets with polarity sentiment labels to train a sentiment classifier. However, this dataset lacks social network information among users. We further refined the Twitter dataset according to the authors' social relation information, which is the complete follower graph^3 crawled by Kwak et al. [21] during July 2009. According to the social network, we filter tweets whose authors have no friends or have published fewer than two tweets. This leaves a corpus of 22,262 tweets that consists of 11,959 positive tweets and 10,303 negative ones.

The second dataset is the Obama-McCain Debate (OMD)^4. This dataset consists of 3,269 tweets posted during the presidential debate on September 26, 2008 between Barack Obama and John McCain [34]. The sentiment label of each tweet was annotated through Amazon Mechanical Turk^5. Each tweet was manually labeled by at least three Turkers. In our experiment of polarity sentiment classification, we use tweets with sentiment labels. The majority of votes for a tweet is taken as a gold standard. In order to obtain the social relation information, we use the same social network as in refining the STS dataset. All the tweets whose authors have no friends or have published fewer than two tweets are filtered. This results in a corpus of 1,827 tweets, in which 747 have positive labels and 1,080 have negative labels.

The statistics of the datasets are summarized in Table 1.

4.2 Social Theories in Microblogging

In social sciences, Fowler and Christakis [6] found that the spread of happiness appears to reach up to three degrees of separation in a social network. Recently, researchers reported the phenomenon of sentiment diffusion [27] in online social networks based on the theory of emotional contagion [10] between friends. The analysis indicates that, in terms of sentiment, social theories such as Sentiment Consistency [1] and Emotional Contagion [10] could be helpful for sentiment analysis. Sentiment Consistency suggests that the sentiments of two messages posted by the same user are more likely to be consistent than those of two randomly selected messages. Emotional Contagion reveals that the sentiments of two messages posted by friends are more likely to be similar than those of two randomly selected messages.

The two theories are derived from offline surveys and conversations. We would like to validate whether the two social theories hold true in microblogging data. For each of the theories, we form a null hypothesis: in terms of sentiment, there is no difference between relational data and random data. We test the hypotheses on each of the two datasets.

^2 http://www.stanford.edu/~alecmgo/cs224n/
^3 http://an.kaist.ac.kr/traces/WWW2010.html/
^4 https://bitbucket.org/speriosu/updown/src/5de483437466/data/
^5 https://www.mturk.com/

The sentiment difference score between two messages is defined as $T_{ij} = \|\mathbf{y}_i - \mathbf{y}_j\|_2$, where $\mathbf{y}_i$ is the sentiment label of message $\mathbf{x}_i$. To verify the existence of Sentiment Consistency, we construct two vectors $\mathbf{s}_{ct}$ and $\mathbf{s}_{cr}$ with an equal number of elements. Each element of the first vector $\mathbf{s}_{ct}$ is obtained by calculating the sentiment difference score between $\mathbf{x}_i$ and $\mathbf{x}_j$, which are posted by the same user. Each element in the vector corresponds to a pair of related messages. The element of the second vector represents the sentiment difference score between $\mathbf{x}_i$ and another random message $\mathbf{x}_r$ in the corpus. We perform a two-sample t-test on the two vectors $\mathbf{s}_{ct}$ and $\mathbf{s}_{cr}$. The null hypothesis is that there is no difference between the two vectors, $H_0: \mathbf{s}_{ct} = \mathbf{s}_{cr}$; the alternative hypothesis is that the sentiment difference between messages with the Sentiment Consistency relation is less than those without, $H_1: \mathbf{s}_{ct} < \mathbf{s}_{cr}$. Similarly, we construct another two vectors $\mathbf{e}_{ct}$ and $\mathbf{e}_{cr}$, and perform a two-sample t-test on the two vectors for verifying Emotional Contagion. The null hypothesis is $H_0: \mathbf{e}_{ct} = \mathbf{e}_{cr}$ and the alternative hypothesis is $H_1: \mathbf{e}_{ct} < \mathbf{e}_{cr}$. The t-test results, p-values, show that there is strong evidence (with the significance level $\alpha = 0.01$) to reject the null hypothesis in both tests on the two datasets. In other words, we observe the existence of what Sentiment Consistency and Emotional Contagion suggest in microblogging data. This preliminary study paves the way for our next study: how to explicitly model and utilize these social theories for the sentiment classification task.
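The one-sided two-sample t-test described above can be sketched with SciPy as follows; the data here are synthetic stand-ins for the real difference-score vectors, which we do not have:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins (our assumption, for illustration) for the paper's
# difference-score vectors:
#   s_ct -- sentiment differences between same-user message pairs,
#   s_cr -- sentiment differences between random message pairs.
s_ct = rng.normal(loc=0.4, scale=0.3, size=1000).clip(0)
s_cr = rng.normal(loc=0.9, scale=0.3, size=1000).clip(0)

# One-sided two-sample t-test: H0: s_ct = s_cr vs. H1: s_ct < s_cr.
t_stat, p_value = stats.ttest_ind(s_ct, s_cr, alternative='less')

alpha = 0.01
if p_value < alpha:
    print(f"p = {p_value:.3g} < {alpha}: reject H0")
```

The same call with $\mathbf{e}_{ct}$ and $\mathbf{e}_{cr}$ covers the Emotional Contagion test.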

5. A SOCIOLOGICAL APPROACH: SANT

In this section, we first introduce data representation and modeling for message content, and then discuss how we model relations between messages based on the social theories. Finally, we present how to employ sparse learning to handle noisy and high-dimensional data.

5.1 Modeling Message Content

To find a better text representation for sentiment analysis, Pang and Lee [33] conducted experiments to investigate the effectiveness of different features on sentiment classification. Their two major findings are (1) although different feature construction methods, like N-grams, Part of Speech, adjectives, and sentiment vocabulary, have comparable performance, the unigram model with term presence (but not frequency) as feature weight achieves the best results; and (2) no stemming or stop-word lists are used because some of them may carry sentiment information. Thus, we employ the unigram model to construct our feature space, use term presence as the feature weight, and do not perform stemming or remove stop-words. It is noted that our framework is not confined to the unigram model. We can also use other text representation methods for specific sentiment classification tasks.

The widely used Least Squares method is employed to fit the learned model to message content. In terms of multi-class classification problems, Least Squares aims to learn $c$ classifiers by solving the following optimization problem:

$$\min_{\mathbf{W}} \; \frac{1}{2}\|\mathbf{X}^T\mathbf{W} - \mathbf{Y}\|_F^2, \qquad (1)$$

where $\mathbf{W}$ represents the learned classifiers. This formulation is a traditional supervised classification method, where the messages are assumed to be independent and identically distributed. This method has been well studied, and it has a closed-form solution.
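As a minimal sketch (ours, not the paper's code), the closed-form solution of Eq. (1) can be computed via the normal equations in NumPy; the tiny ridge term `eps` is our addition for numerical stability, not part of Eq. (1):

```python
import numpy as np

def least_squares_fit(X, Y, eps=1e-8):
    """Solve min_W 0.5 * ||X^T W - Y||_F^2 for W (shape m x c).

    X: m x n content matrix (features x messages).
    Y: n x c sentiment label matrix.
    eps: tiny ridge term for numerical stability (our assumption).
    """
    m = X.shape[0]
    # Normal equations: (X X^T) W = X Y.
    return np.linalg.solve(X @ X.T + eps * np.eye(m), X @ Y)

# Tiny check with noiseless synthetic data.
rng = np.random.default_rng(1)
X = rng.random((6, 20))          # m = 6 features, n = 20 messages
W_true = rng.random((6, 2))      # c = 2 polarity classes
Y = X.T @ W_true                 # labels generated without noise
W = least_squares_fit(X, Y)
print(np.allclose(X.T @ W, Y, atol=1e-4))
```

With noiseless labels the fitted values recover $\mathbf{Y}$, which is what the closed form guarantees.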

5.2 Modeling Message-Message Relations

In this subsection, we introduce our formulation to utilize social relations for sentiment analysis, and answer the question: How can social relations be explicitly integrated into the sentiment classification framework?

In order to transform user-centric social relations into sentiment relations between messages, we employ the social theories discussed in Section 4.2. Given the user-message matrix $\mathbf{U}$ and user-user matrix $\mathbf{F}$, the message-message sentiment relation matrix for Sentiment Consistency ($\mathbf{A}^{sc}$) is defined as $\mathbf{A}^{sc} = \mathbf{U}^T\mathbf{U}$, where $A^{sc}_{ij} = 1$ indicates that $t_i$ and $t_j$ are posted by the same user, and the sentiments of the two messages are similar. The message-message sentiment relation matrix for Emotional Contagion ($\mathbf{A}^{ec}$) is defined as $\mathbf{A}^{ec} = \mathbf{U}^T\mathbf{F}\mathbf{U}$, where $A^{ec}_{ij} = 1$ indicates that the author of $t_i$ is a friend of the author who wrote $t_j$, and the sentiments of the two messages are similar. The following derivations in this paper are based on the message-message sentiment relation matrix $\mathbf{A}$, which can be obtained as either the sentiment relation $\mathbf{A}^{sc}$, $\mathbf{A}^{ec}$, or the combination $\mathbf{A} = \mathbf{A}^{sc} + \eta\,\mathbf{A}^{ec}$, where $\eta$ controls the weight of the two different sentiment relations in the model. In this paper, we focus on studying the effects of different sentiment relations on the sentiment classification performance, but not ways to combine them. We can simply combine these two relations with equal weight $\eta = 1$ to construct a relation matrix.

Based on the discussion above, to integrate sentiment relations between messages in sentiment classification, the basic idea is to build a latent connection to make two messages as close as possible if they are posted by the same user (Sentiment Consistency) or two users are follower/friend with each other (Emotional Contagion). Under this scenario, it can be mathematically formulated as solving the following objective function:

$$\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\,\|\tilde{\mathbf{Y}}_{i*} - \tilde{\mathbf{Y}}_{j*}\|^2 = \sum_{k=1}^{c} \tilde{\mathbf{Y}}_{*k}^T(\mathbf{D}-\mathbf{A})\tilde{\mathbf{Y}}_{*k} = \mathrm{tr}(\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W}), \qquad (2)$$

where $\mathrm{tr}(\cdot)$ is the trace of a matrix. $\tilde{\mathbf{Y}} = \mathbf{X}^T\mathbf{W}$ is the fitted value of the sentiment label $\mathbf{Y}$. $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the Laplacian matrix [4], where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is a message-message sentiment relation matrix representing a directed graph. $A_{ij} = 1$ indicates that message $t_i$ has a relation with message $t_j$, and $A_{ij} = 0$ otherwise. $\mathbf{D} \in \mathbb{R}^{n \times n}$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{n} A_{ij}$, i.e., its diagonal element is the degree of a message in the relation matrix $\mathbf{A}$.

As the Laplacian matrix $\mathbf{L}$ is positive semi-definite, Eq. (2)

can be rewritten as:

$$\mathrm{tr}(\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W}) = \|\mathbf{W}^T\mathbf{X}\mathbf{L}^{\frac{1}{2}}\|_F^2. \qquad (3)$$
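To make the construction concrete, here is an illustrative sketch (our own, not the paper's code) of building a relation matrix and its Laplacian from $\mathbf{U}$ and $\mathbf{F}$ with the equal-weight combination; binarizing the matrix products into 0/1 entries and symmetrizing before forming the Laplacian are our assumptions:

```python
import numpy as np

def build_laplacian(U, F, weight=1.0):
    """Build a message-message relation matrix A and Laplacian L = D - A.

    U: d x n user-message matrix (U[i, j] = 1 iff user i posted message j).
    F: d x d user-user matrix (F[i, j] = 1 iff user i is connected by user j).
    weight: weight on the Emotional Contagion relation (1 = equal weight).
    """
    A_sc = (U.T @ U > 0).astype(float)        # same-author message pairs
    A_ec = (U.T @ F @ U > 0).astype(float)    # friend-author message pairs
    A = A_sc + weight * A_ec
    np.fill_diagonal(A, 0)                    # no self-relations
    A = (A + A.T) / 2                         # symmetrize for the Laplacian
    D = np.diag(A.sum(axis=1))
    return A, D - A

# Toy example: 3 users, 4 messages; users 0 and 1 are connected.
U = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
F = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
A, L = build_laplacian(U, F)
# L is positive semi-definite: smallest eigenvalue >= 0 (up to round-off).
print(np.linalg.eigvalsh(L).min() >= -1e-9)
```

The positive semi-definiteness checked at the end is exactly the property that justifies the rewrite in Eq. (3).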

For different message-message relations, we use different sentiment relation matrices $\mathbf{A}$ to obtain the Laplacian matrices $\mathbf{L}$. The optimization formulation, which integrates sentiment relations into the learning process, is defined as:

$$\min_{\mathbf{W}} \; \frac{1}{2}\|\mathbf{X}^T\mathbf{W} - \mathbf{Y}\|_F^2 + \frac{\alpha}{2}\|\mathbf{W}^T\mathbf{X}\mathbf{L}^{\frac{1}{2}}\|_F^2, \qquad (4)$$

where $\alpha$ is the regularization parameter to control the contribution of sentiment relation information.
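Eq. (4) is still smooth, so it also admits a closed-form solution via its normal equations. The following sketch is our own illustration (the value of the regularization parameter and the stabilizing `eps` term are assumptions), with the gradient matching the form used later in the algorithmic section:

```python
import numpy as np

def fit_with_relations(X, Y, L, alpha=0.1, eps=1e-8):
    """Closed-form solution of the Eq.-(4)-style objective (our sketch).

    Setting the gradient  X X^T W - X Y + alpha * X L X^T W  to zero gives
    (X X^T + alpha * X L X^T) W = X Y; eps adds numerical stability.
    """
    m = X.shape[0]
    G = X @ X.T + alpha * (X @ L @ X.T) + eps * np.eye(m)
    return np.linalg.solve(G, X @ Y)

# Toy run with a random relation graph.
rng = np.random.default_rng(2)
m, n, c = 5, 12, 2
X = rng.random((m, n))
Y = rng.random((n, c))
A = (rng.random((n, n)) > 0.7).astype(float)
A = ((A + A.T) > 0).astype(float)
np.fill_diagonal(A, 0)
L = np.diag(A.sum(axis=1)) - A
W = fit_with_relations(X, Y, L)
print(W.shape)   # classifier weights, one column per sentiment class
```

This smooth solver is only a baseline; the sparse term introduced next removes the closed form.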

5.3 Handling the Noisy & Short Texts: A Sparse Formulation

Compared with texts in traditional media, another distinct feature of texts in microblogging is that they are noisy and short [12, 14, 38], which leads to two problems. First, text representation models, like bag of words or n-gram, often lead to a high-dimension feature space because of the large-scale size of the dataset and vocabulary. Second, the short and noisy texts make the data representation very sparse. This high-dimension sparse representation poses significant challenges to building an interpretable model with high prediction accuracy.

To retain original information in the texts, we do not filter the terms according to any domain-specific sentiment vocabularies. When people speed-read through a text, they may not fully parse the sentence but instead seek a sparse representation for the incoming text using a few phrases or words [13]. Thus, we propose to provide a sparse reconstruction for the classification feature space. Recently, sparse regularization has been widely used in many data mining applications to obtain more stable and interpretable models. A natural approach for our problem is the lasso [7], the penalization of the $\ell_1$-norm of the estimator. The $\ell_1$-norm based linear reconstruction error minimization can lead to a sparse representation for the texts, which is robust to the noise in features. The multi-class classifier can be learned by solving the following optimization problem:

$$\min_{\mathbf{W}} \; \frac{1}{2}\|\mathbf{X}^T\mathbf{W} - \mathbf{Y}\|_F^2 + \lambda\|\mathbf{W}\|_1, \qquad (5)$$

where $\lambda$ is the sparse regularization parameter. In the objective function, the first term is the least squares loss. The second term is the $\ell_1$-norm regularization on the weight matrix $\mathbf{W}$, which causes some of the coefficients to be exactly zero. Thus the lasso does a kind of continuous subset selection and also controls the complexity of the model. Further, we introduce the Laplacian regularization discussed in Section 5.2 to Eq. (5). The sentiment classification of microblogging data can be formulated as the following optimization problem:

$$\min_{\mathbf{W}} \; \frac{1}{2}\|\mathbf{X}^T\mathbf{W} - \mathbf{Y}\|_F^2 + \frac{\alpha}{2}\|\mathbf{W}^T\mathbf{X}\mathbf{L}^{\frac{1}{2}}\|_F^2 + \lambda\|\mathbf{W}\|_1, \qquad (6)$$

where $\alpha$ and $\lambda$ are positive regularization parameters. By solving Eq. (6), the sentiment label of each message can be predicted by

$$\arg\max_{i \in \{p, n\}} \; \mathbf{x}^T\mathbf{w}_i. \qquad (7)$$
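The decision rule in Eq. (7) is simply an argmax over the per-class scores; a minimal sketch (with our own variable names and toy numbers):

```python
import numpy as np

def predict(X_test, W, classes=("positive", "negative")):
    """Eq.-(7)-style prediction: for each test message (a column of X_test),
    pick the class i whose weight column w_i gives the highest score x^T w_i."""
    scores = X_test.T @ W              # shape: (n_test, c)
    return [classes[i] for i in scores.argmax(axis=1)]

# Toy check: class 0 scores via feature 0, class 1 via feature 1.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
X_test = np.array([[3.0, 0.0],        # message 1 loads on feature 0
                   [0.0, 2.0],        # message 2 loads on feature 1
                   [1.0, 1.0]])
print(predict(X_test, W))             # → ['positive', 'negative']
```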

Next, we introduce an efficient algorithm to solve the optimization problem in Eq. (6).

6. ALGORITHMIC DETAILS

Since $\|\mathbf{W}\|_1$ is non-differentiable, the proposed objective function in Eq. (6) is non-smooth. In this section, we introduce an efficient algorithm to solve the optimization problem, and discuss its convergence rate and time complexity.

6.1 Optimization Algorithm for SANT

Motivated by [24, 29, 9], we propose to solve the non-smooth optimization problem in Eq. (6) by optimizing its equivalent smooth convex reformulations.

Theorem 1. Eq. (6) can be reformulated as a constrained smooth convex optimization problem:

$$\min_{\mathbf{W} \in \mathcal{Z}} \; f(\mathbf{W}) = \frac{1}{2}\|\mathbf{X}^T\mathbf{W} - \mathbf{Y}\|_F^2 + \frac{\alpha}{2}\|\mathbf{W}^T\mathbf{X}\mathbf{L}^{\frac{1}{2}}\|_F^2, \qquad (8)$$

where

$$\mathcal{Z} = \{\mathbf{W} \,|\, \|\mathbf{W}\|_1 \le z\}, \qquad (9)$$

$z \ge 0$ is the radius of the $\ell_1$-ball, and there is a one-to-one correspondence between $\lambda$ and $z$.

Remark 1. The relationship between $\lambda$ and $z$ is not critical because the optimal values of both are unknown. The two parameters are usually tuned using cross-validation. A sufficiently small $z$ will cause some of the coefficients to be exactly zero, thus it also does a kind of continuous subset selection.

Proof. The Hessian matrix of the objective function in Eq. (8) is positive semi-definite, thus the objective function $f(\mathbf{W})$ is convex and differentiable. It is easy to verify that $\|\mathbf{W}\|_1$ is a valid norm because it satisfies the three norm conditions, including the triangle inequality $\|\mathbf{A}\|_1 + \|\mathbf{B}\|_1 \ge \|\mathbf{A} + \mathbf{B}\|_1$. Since any norm defines a convex set, $\mathcal{Z}$ is a closed and convex set.

As we can see, our problem defines a convex and differentiable function $f(\mathbf{W})$ in a closed and convex set $\mathcal{Z}$. Thus this problem is a constrained smooth convex optimization problem, which completes the proof. ∎

We first consider the optimization problem in Eq. (8) without the constraint part $\mathbf{W} \in \mathcal{Z}$; it is defined as:

$$\min_{\mathbf{W}} \; f(\mathbf{W}). \qquad (10)$$

In the gradient descent method, W_{t+1} is updated at each step as

W_{t+1} = W_t − (1/γ_t) ∇f(W_t),    (11)

where γ_t is the step size, determined by line search according to the Armijo-Goldstein rule. The smooth part of the optimization problem can be reformulated equivalently as a proximal regularization [17] of the linearized function f(W) at W_t, formally defined as

W_{t+1} = arg min_W  G_{γ_t, W_t}(W),    (12)

where

G_{γ_t, W_t}(W) = f(W_t) + ⟨∇f(W_t), W − W_t⟩ + (γ_t/2)||W − W_t||_F^2.    (13)

Considering this equivalence and the constraint set Z, we propose to solve Eq. (8) through the following iterative step:

W_{t+1} = arg min_{W ∈ Z}  G_{γ_t, W_t}(W).    (14)

Ignoring terms that are independent of W in Eq. (13), the objective function in Eq. (14) boils down to

W_{t+1} = arg min_{W ∈ Z}  ||W − U_t||_F^2,    (15)

where U_t = W_t − (1/γ_t) ∇f(W_t); the solution W is thus the Euclidean projection of U_t onto Z. ∇f(W_t) is the gradient of f at W_t, which in our paper is defined as

∇f(W_t) = X X^T W_t − X Y + α X L X^T W_t.    (16)

Eq. (14) can be decomposed into n subproblems:

w^j_{t+1} = arg min_{w^j ∈ Z_j}  ||w^j − u^j_t||_2^2,    (17)

where u^j_t, w^j, and w^j_t are the j-th rows of U_t, W, and W_t, respectively, and Z_j is defined on ||w^j||_1. Given β, the Euclidean projection has a closed-form solution:

w^j_{t+1} = (1 − β/(γ_t ||u^j_t||_2)) u^j_t    if ||u^j_t||_2 ≥ β/γ_t,
            0                                   otherwise.    (18)
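The closed-form projection above is a row-wise shrinkage operator: each row of U_t is scaled toward zero and zeroed out entirely when its norm falls below the threshold. A minimal sketch, where the symbol names beta and gamma_t follow the reconstructed notation and are assumptions about the original:

```python
import numpy as np

# Sketch of the closed-form row-wise projection in Eq. (18): each row u_j of
# U is shrunk toward zero, and set exactly to zero when its norm falls below
# the threshold beta / gamma_t.
def shrink_rows(U, beta, gamma_t):
    W = np.zeros_like(U)
    thresh = beta / gamma_t
    for j, u in enumerate(U):
        norm = np.linalg.norm(u)
        if norm >= thresh:
            W[j] = (1.0 - thresh / norm) * u
    return W

U = np.array([[3.0, 4.0],    # norm 5    -> kept, shrunk by factor 0.8
              [0.1, 0.1]])   # norm ~0.14 -> zeroed
W = shrink_rows(U, beta=1.0, gamma_t=1.0)
print(W)  # -> [[2.4, 3.2], [0.0, 0.0]]
```

This is what drives the sparsity of the solution: rows of W corresponding to weak (noisy) features are set exactly to zero.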

The above method has a convergence rate of O(1/t). As discussed in [24], our constrained smooth convex optimization problem can be further accelerated to achieve the optimal convergence rate of O(1/t^2). In particular, the accelerated algorithm is based on two sequences, W_t and V_t, in which W_t is the sequence of approximate solutions and V_t is the sequence of search points, an affine combination of W_t and W_{t−1}:

V_t = W_t + η_t (W_t − W_{t−1}),    (19)

where η_t is the combination coefficient. The approximate solution W_{t+1} is computed as a gradient step from V_t through G_{γ_t, V_t}. The detailed SANT procedure with this accelerated optimization is shown in Algorithm 1. In the algorithm, we use Nesterov's method [29] to solve the optimization problem in Eq. (6) in lines 5 to 22; lines 8 to 15 perform the line search for γ_t according to the Armijo-Goldstein rule, and line 20 sets η_{t+1} according to [24]. With the resulting solution to the convex optimization problem, we obtain the sentiment class label via Eq. (7).

6.2 Convergence and Complexity Analysis

In this subsection, we discuss the convergence rate and time complexity of the proposed Algorithm 1. The convergence rate of Algorithm 1 is given by the following theorem.

Theorem 2. [17] Assume that {W_t} is the sequence obtained by Algorithm 1. Then for any t we have

f(W_{t+1}) − f(W*) ≤ 2γ_{L_f} ||W* − W_1||_F^2 / (t + 1)^2,    (20)

where γ_{L_f} = max(2L_f, L_0), L_0 is an initial guess of the Lipschitz constant L_f of the gradient of f(W), and W* is the optimal solution of f(W).

Proof. The detailed proof of the above theorem can be found in [17]. The theorem shows that the convergence rate of Algorithm 1 is O(1/√ε), where ε is the desired accuracy.

Based on the theorem, the algorithm converges within O(1/√ε) iterations. We now discuss the time complexity of the proposed

Algorithm 1: SANT: Sentiment Analysis for Noisy Texts with Social Relations

Input: {X, Y, U, F, W_0, γ_1, α, β}
Output: W

1:  Initialize η_0 = 0, η_1 = 1, W_1 = W_0, t = 1
2:  A_sc = U^T U, A_ec = U^T F U
3:  A = A_sc + λ A_ec
4:  Construct Laplacian matrix L from A
5:  while not convergent do
6:    Set V_t = W_t + ((η_{t−1} − 1)/η_t)(W_t − W_{t−1})
7:    Set ∇f(W_t) = X X^T W_t − X Y + α X L X^T W_t
8:    loop
9:      Set U_t = V_t − (1/γ_t) ∇f(W_t)
10:     Compute W_{t+1} according to Eq. (18)
11:     if f(W_{t+1}) ≤ G_{γ_t, V_t}(W_{t+1}) then
12:       γ_{t+1} = γ_t, break
13:     end if
14:     γ_t = 2γ_t
15:   end loop
16:   W = W_{t+1}
17:   if stopping criteria satisfied then
18:     break
19:   end if
20:   Set η_{t+1} = (1 + sqrt(1 + 4η_t^2)) / 2
21:   Set t = t + 1
22: end while
23: W = W_{t+1}

method SANT at each iteration. Given a collection of n messages with an m-dimensional feature space, the objective function in Eq. (6) consists of three components. First, the least squares loss function costs O(mn) floating point operations to calculate the function value and gradient of the objective. Second, the ℓ1-norm regularization part has a time complexity of O(mn) based on the Euclidean projection algorithm [24]. Third, the Laplacian regularization part also has a time complexity of O(mn). Therefore, we can solve the objective function in Eq. (6) with a time complexity of O((1/√ε)(mn + mn + mn)) = O(mn/√ε).

7. EXPERIMENTS

In this section, we present empirical evaluation results to assess the effectiveness of our proposed framework, and answer the question: Can social relations improve sentiment classification of microblogging messages? In particular, we evaluate the proposed method on the two datasets introduced in Section 4. We further discuss the impact of training-set size, different relations, and other factors that affect the experiments.

7.1 Performance Evaluation

Below we first present the findings from comparing SANT with classical text-based sentiment classification methods, and then with models that incorporate social relations in sentiment classification.

7.1.1 Comparison with Text-based Methods

In the first set of experiments, we use classification accuracy as the performance metric and compare the proposed framework SANT with the following text-based methods:

Table 2: Sentiment Classification Accuracy on STS Dataset

              D10% (gain)       D25% (gain)       D50% (gain)       D100% (gain)
    LS        0.670 (N.A.)      0.704 (N.A.)      0.720 (N.A.)      0.713 (N.A.)
    Lasso     0.699 (+4.22%)    0.722 (+2.56%)    0.746 (+3.50%)    0.759 (+6.38%)
    MinCuts   0.677 (+0.93%)    0.705 (+0.27%)    0.727 (+0.89%)    0.757 (+6.10%)
    LexRatio  0.699 (+4.25%)    0.746 (+5.97%)    0.753 (+4.55%)    0.763 (+6.94%)
    SANT      0.764 (+13.90%)   0.778 (+10.56%)   0.793 (+10.02%)   0.796 (+11.52%)

Table 3: Sentiment Classification Accuracy on OMD Dataset

              D10% (gain)       D25% (gain)       D50% (gain)       D100% (gain)
    LS        0.615 (N.A.)      0.634 (N.A.)      0.654 (N.A.)      0.660 (N.A.)
    Lasso     0.626 (+1.81%)    0.663 (+4.62%)    0.698 (+6.76%)    0.709 (+7.49%)
    MinCuts   0.659 (+7.16%)    0.664 (+4.80%)    0.674 (+3.06%)    0.697 (+5.64%)
    LexRatio  0.613 (-0.18%)    0.633 (-0.22%)    0.655 (+0.18%)    0.659 (-0.06%)
    SANT      0.685 (+11.51%)   0.717 (+13.04%)   0.748 (+14.47%)   0.763 (+15.72%)

LS: Least squares [7] is a widely used supervised classification method for i.i.d. data.

Lasso: Lasso [7] on tweet content only. This is one of the most popular sparse learning methods.

MinCuts: Pang and Lee [31] utilized contextual information via the minimum-cut framework to improve polarity-classification accuracy. In the experiments, we use the MinCuts package provided by the freely available software LingPipe6.

LexRatio: These methods [30, 40] count the ratio of sentiment words, drawn from the OpinionFinder subjectivity lexicon7, in a tweet to determine its sentiment orientation. Due to its unsupervised setting, LS is used for tweets that do not contain any sentiment words.
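A LexRatio-style baseline is straightforward to sketch. The tiny lexicon below is made up for illustration; the paper uses the full OpinionFinder subjectivity lexicon and falls back to the supervised LS classifier when a tweet contains no sentiment words:

```python
# Illustrative sketch of a LexRatio-style baseline: count positive and
# negative lexicon words in a tweet and label by the majority. The tiny
# word lists here are hypothetical stand-ins for a real lexicon.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}

def lex_ratio(tweet):
    words = tweet.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg == 0:
        return None  # no sentiment words: defer to the supervised fallback
    return "positive" if pos >= neg else "negative"

print(lex_ratio("I love this great phone"))  # -> positive
```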

There are three important parameters in our experiments: α and β in Eq. (6), and λ in Section 5.2. All three parameters are positive. α controls the contribution of the sentiment relation information, β is the sparse regularization parameter, and λ combines the two sentiment relations. As a common practice, α and β are tuned via cross-validation; in the experiments, we set α = 0.05 and β = 0.1 for general experimental purposes. We set λ = 1, which means the two sentiment relation matrices are simply combined with equal weight.

Experimental results of the methods on the two datasets, STS and OMD, are reported in Tables 2 and 3, respectively. In the experiments, we used five-fold cross validation. To test the sensitivity of SANT to different sizes of training data, D_percentage in the tables denotes the amount of data used for training as a percentage of the whole training set. In each round of the experiment, 80% of the whole dataset is used for training; D50%, for example, means we chose 50% of that 80%, thus using 40% of the whole dataset as training data. The test set is always 20% of the whole dataset. In the tables, gain represents the percentage improvement of each method over the traditional LS method. Each result denotes the average

6 http://alias-i.com/lingpipe/
7 http://www.cs.pitt.edu/mpqa/opinionfinder_1.html

score of 10 test runs. By comparing the results of the different methods, we draw the following observations:

(1) Compared with the text-based methods, SANT achieves consistently better performance on both datasets across all sizes of training data. The highest improvement with respect to LS is obtained on the OMD dataset with training set D100%. We applied the t-test to compare SANT with the best baseline methods, MinCuts and LexRatio. The results demonstrate that, by exploiting social relations, our proposed model achieves a significant improvement (significance level 0.01) over the state-of-the-art methods.

(2) The sparse-learning-based method Lasso achieves better performance than the least-squares-based method LS. This shows that a sparse solution over the feature space is an effective way to tackle noisy microblogging data, and that the sparse regularization has a positive impact on the proposed sentiment classification method.

(3) The proposed SANT also achieves significant improvement over the baseline methods when using a small training set. It outperforms the LS baseline by 13.90% and 11.51% on the two datasets using only 10% of the training data, which demonstrates that our method is robust to training sets with a small number of samples. In addition, SANT with only 10% of the training data outperforms LS with all of the training data, indicating that, by integrating social relation information, our model can significantly reduce labeling cost. We discuss the sensitivity of our method to training-set size in Section 7.2.

In summary, the proposed model consistently achieves better performance than the state-of-the-art methods based on text alone. This suggests that social relation information helps improve sentiment classification. In the next subsection, we compare SANT with models that make use of social relations.

7.1.2 Incorporating Social Relations

To further evaluate SANT, in the second set of experiments we compare it against the following methods, all of which utilize social relation information in classification.

[Figure 2: Sentiment Classification on STS Dataset — classification accuracy of LS_N, LS_TN, Lasso_N, Lasso_TN, and SANT for training-data sizes of 10%, 25%, 50%, and 100%.]

[Figure 3: Sentiment Classification on OMD Dataset — classification accuracy of the same five methods for the same training-data sizes.]

LS_N: Least squares applied to sentiment relation information only. Following previous work [35], tweet-tweet relation information is used as a feature expansion for each tweet: if a tweet t_i is related to another tweet t_j, we add the tweet id t_j as a feature in t_i's feature vector. The LS_TN, Lasso_N, and Lasso_TN methods below use the same technique to integrate sentiment relation information between tweets.

LS_TN: Least squares on tweet content and sentiment relation information together. The sentiment relation matrix is utilized as a feature augmentation of the content feature space; the final feature space is the equally weighted combination of the content matrix and the tweet-tweet sentiment relation matrix.

Lasso_N: Lasso on sentiment relation information only. This is the sparse version of LS_N.

Lasso_TN: Lasso on tweet content and sentiment relation information together. This is the sparse version of LS_TN. The relation information between tweets is employed as a simple feature expansion, which differs from our proposed method.
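The tweet-id feature expansion used by these baselines can be sketched as follows. The data and function name are illustrative, not from the paper:

```python
# Sketch of the tweet-id feature expansion used by the LS_N / Lasso_N style
# baselines: each related tweet's id is appended to the tweet's feature
# vector as an extra binary feature.
def expand_features(content_features, relations, n_tweets):
    expanded = []
    for i, feats in enumerate(content_features):
        relation_feats = [1.0 if (i, j) in relations or (j, i) in relations
                          else 0.0
                          for j in range(n_tweets)]
        expanded.append(list(feats) + relation_feats)
    return expanded

content = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy bag-of-words rows
relations = {(0, 1)}                             # tweet 0 relates to tweet 1
X = expand_features(content, relations, n_tweets=3)
print(X[0])  # -> [1.0, 0.0, 0.0, 1.0, 0.0]
```

Any off-the-shelf classifier (least squares, Lasso) can then be trained on the expanded vectors, which is how these baselines differ from SANT's Laplacian regularization.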

Following the parameter and experimental settings of the first experiment, we run the baseline methods on the two Twitter datasets with different percentages of training data. The classification performance of the methods is

[Figure 4: Sensitivity of SANT to Training Data Sizes on STS — sentiment classification accuracy of MinCuts and SANT for training-data sizes from 10% to 100%.]

plotted in Figures 2 and 3, from which we draw the following observations:

(1) Among the five methods, SANT achieves the best performance on both datasets across all training-data sizes. This indicates that, compared with other ways of incorporating social relations, our proposed model successfully utilizes the social relations for sentiment analysis.

(2) LS_TN and Lasso_TN achieve better performance than LS_N and Lasso_N. In addition, the purely relation-based methods LS_N and Lasso_N are only slightly better than randomly assigning sentiment labels (accuracy = 0.5). These results demonstrate that it is inaccurate to rely on sentiment relation information alone to determine the sentiment of a microblogging message.

In summary, the existing methods perform differently in sentiment classification, and in some cases the use of social relation information does not improve performance. This suggests that how social relations are used also matters. The superior performance of the proposed method SANT validates its effective use of social relation information in sentiment analysis.

7.2 Sensitivity to Training Data Sizes

One difficulty in sentiment analysis is the lack of manually labeled training data; the publicly available datasets are often insufficient for training supervised learning methods. Tables 2 and 3 showed that our proposed method is robust to small training sets. To further investigate the sensitivity of the proposed SANT framework to training-set size, Figure 4 plots the sentiment classification accuracy with training data from 10% to 100% on the STS dataset.

The figure compares the performance of SANT and MinCuts. SANT consistently outperforms MinCuts across training-data sizes, and compared with the results from using the whole training set, SANT's improvement over MinCuts is greater when the training set is small. SANT's performance does not change significantly as the training-set size varies, which demonstrates that our proposed method is not sensitive to training-data size. This property is significant given the widespread lack of manually labeled training data in sentiment classification tasks.

[Figure 5: Performance Variation of SANT on (a) the STS dataset and (b) the OMD dataset — sentiment classification accuracy of Sentiment Consistency and Emotional Contagion as the relation-weight parameter varies over {0, 1e-6, 1e-5, 1e-4, 1e-3, 0.01, 0.1, 1, 10}.]

7.3 Effect of Sentiment Relations

In the experiments above, we combined the two sentiment relations with equal weight. To further understand the effect of each relation on sentiment classification performance, we conducted experiments with each sentiment relation separately on the two datasets. The parameter α controls the contribution of the sentiment relation to the model; we varied α from 0 to 10. The results are presented in Figures 5(a) and 5(b). The red curve shows the performance of Sentiment Consistency (SC), the blue dotted curve depicts the performance of Emotional Contagion (EC), and the green line is the baseline without sentiment relation information.

In Figures 5(a) and 5(b), the SC curves peak at α = 0.01 and α = 0.001, respectively. For most parameter settings, the classification accuracy with SC is higher than the baseline. This demonstrates that SANT can improve sentiment classification performance even with a single sentiment relation. When α is not too extreme, SC is not sensitive to the parameter setting. This is an appealing property of the proposed method: it consistently achieves good performance over a large range of parameter settings, so little effort is needed to tune the parameter. The EC curve follows a similar trend to SC in the figures.

Across parameter settings, the two relations achieve comparable results. Intuitively, the first sentiment relation, SC, should have a stronger impact on the model than EC. A potential reason it does not is that the constructed matrix for SC is much sparser than that for EC.

7.4 Multi-Class Sentiment Classification

As discussed in Section 3, following [8, 35], this paper focuses on the polarity sentiment classification task. In real-world applications, however, many tweets do not express clear emotions: users may post objective statements about entities and events. For example, the OMD dataset used in our experiments contains tweets in categories beyond positive and negative sentiment. Our proposed model can easily be applied to this setting, classifying tweets as positive, negative, or neutral. We next present some preliminary results.

We added the tweets labeled neutral and other in the OMD data to construct a three-class dataset of 3,269 tweets labeled positive, negative, or neutral. We compared the performance of our proposed model SANT with the baseline methods LS and LexRatio. Among the three methods, our proposed method has the best performance at 58.3% accuracy, an 11.66% improvement over LS. Although the focus of this paper is polarity sentiment classification, our method is general enough to apply to real-world multi-class (> 2) sentiment analysis applications.

8. CONCLUSIONS AND FUTURE WORK

Unlike texts in traditional media, microblogging texts are noisy, short, and embedded with social relations, which presents challenges for sentiment analysis. In this paper, we propose a novel sociological approach (SANT) to handle networked texts in microblogging. In particular, we extract sentiment relations between tweets based on social theories and model the relations using a graph Laplacian, which is employed as a regularization term in a sparse learning formulation. The proposed method can thus utilize sentiment relations between messages to facilitate sentiment classification and effectively handle noisy Twitter data. We further develop an efficient optimization algorithm for SANT. Experimental results show that user-centric social relations are helpful for sentiment classification of microblogging messages. Empirical evaluations demonstrate that our framework significantly outperforms representative sentiment classification methods on two real-world datasets, and that SANT achieves consistent performance across different sizes of training data, a useful property for sentiment classification.

This work suggests some interesting directions for future work. For example, it would be interesting to investigate the contributions of different sentiment relations to sentiment classification. Other information, such as spatio-temporal patterns, could also be useful for measuring the sentiment consistency of people [28]; for example, people in Miami might be happier about the temperature than people in Chicago during winter. We can further explore how sentiments diffuse in the social network and how people's sentiments correlate with internal (their friends) and external (public events [15]) factors. With such analysis, it becomes possible to understand the differences in sentiment between the online world and the physical world.

Acknowledgments

We thank the anonymous reviewers for their comments. This work is, in part, supported by ONR grants N000141110527 and N000141010091.

9. REFERENCES

[1] R. Abelson. Whatever became of consistency theory? Personality and Social Psychology Bulletin, 1983.

    [2] L. Barbosa and J. Feng. Robust sentiment detectionon twitter from biased and noisy data. In Proceedingsof ACL, 2010.

    [3] J. Bollen, A. Pepe, and H. Mao. Modeling public moodand emotion: Twitter sentiment and socio-economicphenomena. CoRR, abs/0911.1583, 2009.

    [4] F. Chung. Spectral graph theory. Number 92. AmerMathematical Society, 1997.

    [5] X. Ding and B. Liu. The utility of linguistic rules inopinion mining. In Proceedings of SIGIR, 2007.

    [6] J. Fowler and N. Christakis. The dynamic spread ofhappiness in a large social network. BMJ: Britishmedical journal, 2008.

    [7] J. Friedman, T. Hastie, and R. Tibshirani. Theelements of statistical learning, 2008.

    [8] A. Go, R. Bhayani, and L. Huang. Twitter sentimentclassification using distant supervision. TechnicalReport, Stanford, 2009.

    [9] Q. Gu and J. Han. Towards feature selection innetwork. In Proceedings of CIKM, 2011.

    [10] E. Hatfield, J. Cacioppo, and R. Rapson. Emotionalcontagion. Current Directions in PsychologicalScience, 1993.

    [11] M. Hu and B. Liu. Mining and summarizing customerreviews. In Proceedings of SIGKDD, 2004.

[12] X. Hu and H. Liu. Text analytics in social media. Mining Text Data, pages 385–414, 2012.

[13] X. Hu, N. Sun, C. Zhang, and T.-S. Chua. Exploiting internal and external semantics for the clustering of short texts using world knowledge. In Proceedings of CIKM, pages 919–928, 2009.

    [14] X. Hu, L. Tang, and H. Liu. Enhancing accessibility ofmicroblogging messages using semantic knowledge. InProceedings of the 20th CIKM, 2011.

    [15] Y. Hu, A. John, D. Seligmann, and F. Wang. Whatwere the tweets about? topical associations betweenpublic events and twitter feeds. ICWSM, 2012.

    [16] Y. Hu, A. John, F. Wang, and S. Kambhampati.Et-lda: Joint topic modeling for aligning events andtheir twitter feedback. In AAAI, 2012.

    [17] S. Ji and J. Ye. An accelerated gradient method fortrace norm minimization. In ICML, 2009.

[18] S. Kamvar and J. Harris. We feel fine and searching the emotional web. In Proceedings of WSDM, pages 117–126, 2011.

    [19] D. Keltner and G. Bonanno. A study of laughter anddissociation: Distinct correlates of laughter andsmiling during bereavement. Journal of Personalityand Social Psychology, page 687, 1997.

    [20] E. Kim, S. Gilbert, M. Edwards, and E. Graeff.Detecting sadness in 140 characters: Sentimentanalysis of mourning michael jackson on twitter. 2009.

    [21] H. Kwak, C. Lee, H. Park, and S. Moon. What istwitter, a social network or a news media? InProceedings of WWW, 2010.

[22] K. Lewis, M. Gonzalez, and J. Kaufman. Social selection and peer influence in an online social network. PNAS, pages 68–72, 2012.

[23] B. Liu. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, pages 1–167, 2012.

[24] J. Liu, S. Ji, and J. Ye. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Proceedings of UAI, 2009.

    [25] P. Marsden and N. Friedkin. Network studies of socialinfluence. Sociological Methods & Research, 1993.

[26] M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, pages 415–444, 2001.

    [27] M. Miller, C. Sathi, D. Wiesenthal, J. Leskovec, andC. Potts. Sentiment flow through hyperlink networks.In Proceedings of ICWSM, 2011.

    [28] A. Mislove, S. Lehmann, Y. Ahn, J. Onnela, andJ. Rosenquist. Pulse of the nation: Us moodthroughout the day inferred from twitter. 2010.

    [29] Y. Nesterov and I. Nesterov. Introductory lectures onconvex optimization: A basic course. 2004.

[30] B. O'Connor, R. Balasubramanyan, B. Routledge, and N. Smith. From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of ICWSM, 2010.

    [31] B. Pang and L. Lee. A sentimental education:Sentiment analysis using subjectivity summarizationbased on minimum cuts. In Proceedings of ACL, 2004.

[32] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, pages 1–135, 2008.

    [33] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?:sentiment classification using machine learningtechniques. In Proceedings of EMNLP, 2002.

[34] D. Shamma, L. Kennedy, and E. Churchill. Tweet the debates: understanding community annotation of uncollected sources. In Proceedings of the First SIGMM Workshop on Social Media, pages 3–10. ACM, 2009.

    [35] M. Speriosu, N. Sudan, S. Upadhyay, andJ. Baldridge. Twitter polarity classification with labelpropagation over lexical links and the follower graph.In Proceedings of the First Workshop on UnsupervisedLearning in NLP, 2011.

    [36] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li.User-level sentiment analysis incorporating socialnetworks. In Proceedings of SIGKDD, 2011.

    [37] J. Tang and H. Liu. Feature selection with linked datain social media. In SDM, 2012.

    [38] J. Tang, X. Wang, H. Gao, X. Hu, and H. Liu.Enriching short texts representation in microblog forclustering. Frontiers of Computer Science.

[39] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39:165–210, 2005.

    [40] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizingcontextual polarity in phrase-level sentiment analysis.In Proceedings of HLT and EMNLP, 2005.

    [41] X. Zhao, G. Li, M. Wang, J. Yuan, Z.-J. Zha, Z. Li,and T.-S. Chua. Integrating rich information for videorecommendation with multi-task rank aggregation. InProceedings of ACM MM, 2011.