sentiment expression conditioned by affective transitions and social...

Sentiment Expression Conditioned by AffectiveTransitions and Social Forces

Moritz Sudhof∗ Andrés Goméz Emilsson† Andrew L. Maas∗ Christopher Potts‡∗Computer Science †Psychology ‡Linguistics

Stanford University{sudhof, nc07agom, amaas, cgpotts}@stanford.edu

ABSTRACTHuman emotional states are not independent but rather pro-ceed along systematic paths governed by both internal, cog-nitive factors and external, social ones. For example, anx-iety often transitions to disappointment, which is likely tosink to depression before rising to happiness and relaxation,and these states are conditioned by the states of others inour communities. Modeling these complex dependencies canyield insights into human emotion and support more power-ful sentiment technologies.

We develop a theory of conditional dependencies betweenemotional states in which emotions are characterized notonly by valence (polarity) and arousal (intensity) but alsoby the role they play in state transitions and social relation-ships. We implement this theory using conditional randomfields (CRFs) that synthesize textual information with infor-mation about previous emotional states and the emotionalstates of others. To assess the power of affective transitions,we evaluate our model in a collection of ‘mood’ updates fromthe Experience Project. To assess the power of social fac-tors, we use a corpus of product reviews from a website inwhich the community dynamics encourage reviewers to beinfluenced by each other. In both settings, our models yieldimprovements of statistical and practical significance overones that classify each text independently of its emotionalor social context.

Categories and Subject DescriptorsI.2.7 [Natural Language Processing]: Discourse; Textanalysis

Keywordsmultidimensional sentiment analysis; sentiment as social;sentiment transitions

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected]’14, August 24–27, 2014, New York, NY, USA.Copyright is held by the owner/author(s). Publication rights licensed to ACM.ACM 978-1-4503-2956-9/14/08 ...$15.00.http://dx.doi.org/10.1145/2623330.2623687.

1. INTRODUCTIONHuman emotional states are not independent, nor are they

experienced in isolation. Rather, they proceed along system-atic paths and cohere into psychologically important groups.Anxiety often transitions to disappointment, which is likelyto sink to depression, often a long-lasting state. Happi-ness amplifies to elation or settles down to contentment.States like anxiety, optimism, and worry are inherently tran-sitional; one’s optimism might be vindicated, leading to pos-itive states, or crushed, leading to disappointment. All thesestates are also conditioned by the states of others in our com-munities; complex social forces can push our emotions closerto those around us or pull them farther away.

Our goal is to show that modeling dependencies betweenemotional states can yield insights into human emotion andsupport more powerful sentiment technologies. In our firstset of experiments (Section 3), we concentrate on the affec-tive transitions that individuals experience. Using a collec-tion of users’ ‘mood’ updates from the Experience Project,1

we develop a simple probability model for capturing the con-ditional dependencies between emotional states. The ideasare founded in dimensional models of emotion [10, 28, 29, 34]based in valence (polarity) and arousal (intensity), but weemphasize the role that emotions play in transitions betweenstates. We implement this theory using conditional randomfields (CRFs; [13, 32]) that synthesize textual informationwith information about temporally organized sequences ofemotions, and we show that this delivers improvements ofstatistical and practical significance over models that clas-sify each text independently of its emotional or social con-text. The improvements are largest for transitional emo-tional states, which are often particularly important for in-dustry applications, since moments of emotional inflectionoften coincide with changing opinions or attitudes.

We then extend our model to social influences (Section 4).Using a corpus of product reviews from a website with richcommunity interactions [8, 18, 19], we present evidence thatthe current reviewer for a product is influenced by the se-quence of reviews already posted about that product. Here,the sequences come, not from a specific individual, but ratherfrom group-level actions and reactions. Once again, CRFsincorporating this sequence information out-perform simpleclassifiers by a large margin. This is an important coun-terpart to our moods study, not only because of its socialdimension, but also because it highlights the value of transi-tion information for common polarity-based sentiment tasks.

1http://www.experienceproject.com

2. RELATED WORKOur goal is to understand and make use of the depen-

dencies between emotional states, which derive from bothpersonal and social factors. Our work thus finds precedentin diverse areas of natural language processing, social net-work analysis, and cognitive psychology, including modelsof social influence and multidimensional sentiment expres-sion. This section provides a high-level summary of theseconnections with the existing literature.

Our view of sentiment is a dimensional one [10, 28, 29,34], in which emotions are characterized by a small numberof more basic dimensions, usually valence and arousal butsometimes much more abstract ones (e.g., [15]). Valenceroughly corresponds to polarity (like/dislike, pleasure/pain,pleasant/unpleasant), and arousal corresponds to intensityof experience, mirroring the amount of energy exhibited,spent, or available during a given emotional state. For ex-ample, ecstatic and serene are both emotions of positive va-lence but opposite arousal levels, just as angry and sad areboth negative valence emotions of opposite arousal levels.Our own model can be seen as having valence and arousalas its foundation, but we focus on the patterns found in thetransition dynamics between emotions through time.

Our moods data take us well beyond the basic polarityclassifications or scales that dominate in computational sen-timent analysis [11, 25, 26, 27, 30] and are thus more remi-niscent of the multidimensional categories of [2, 14, 16, 22,31, 35]. Even our star-ratings-based experiments with prod-uct reviews (Section 4) highlight the importance of havinga ‘neutral’ category along the valence dimension [12], espe-cially in communities that might be polarized in their opin-ions but nonetheless seek cohesion, a social force that canpush evaluations away from the extremes.

We also study the social factors that influence the atti-tudes and emotions of community members. This can bethought of as part of a growing literature that seeks to un-derstand online, user-provided metadata as both reflectingand shaping complex social processes concerning influence,group cohesion, individual assertion of identity, and senti-ment diffusion in networks [1, 5, 8, 20, 21, 38]. Our generalapproach is also influenced by [24], who use social variablestogether with textual ones to predict expressed preferencesin political speeches. A related model is used by [33] to cap-ture the ways in which social relationships pattern with atti-tudes and evaluations. Our CRFs are conceptually differentfrom the graph-based models in these papers, but they aimto capture similar insights about how sentiment predictionsshould be made partly on the basis of high-level emotionaland contextual cues (see also [4]).

Our quantitative evaluations use conditional random fields(CRFs; [13, 32]), which are discriminative models in which itis possible to model complex dependencies not only betweenthe input and output variables, as in typical classificationproblems, but also among the output variables themselves.General CRFs can model essentially any kind of graphi-cal structure, but we concentrate on the more tractablespecial case of a linear-chain CRF, in which the outputvariables are arranged in a linear sequence, creating a dis-criminative counterpart to the generative Hidden MarkovModel. Linear-chain CRFs have been used for a wide va-riety of sequence-labeling phenomena relating to linguisticstructure, and they have been successfully applied to moresocial and contextual tasks of the sort we study here, includ-

ing opinion-source identification [6], sentiment transitionsinside documents [17], and social network extraction [7].

3. AFFECTIVE TRANSITIONSWe now address the personal, internal factors that shape

the temporal flow of emotions people experience.

3.1 Experience Project MoodsOur ‘moods’ dataset is derived from community-member

updates at the Experience Project (EP) social networkingsite. When logged in, EP members can post short textsdescribing their moods and also choose from a variety ofdifferent mood labels, which we use to classify the texts intoemotional categories. Mood status updates are visible onmember profiles. Table 1 gives an example of a sequence ofthree posts by the same user; the corpus contains additionalmetadata, but we use only the information in this table.

We work with a sample of about 2 million anonymizedmood posts with unique author identifiers and 174 differentmood labels for emotional, evaluative, and attitudinal states.Figure 1 summarizes the distribution of the top 20 moods byupdate frequency. This fragment of the full dataset showsthat the corpus can be used to study sentiment analysis inthe broadest terms, since individual pairs of labels or clustersof labels can be used to develop familiar polarity models [27],polarity models with different levels of intensity [3, 35, 36],models of sentiment based in social emotions like sympathyand solidarity [16, 31], and many others. For our purposes,the mix of valence and arousal with transitional emotionslike anxiety and optimism is the most important aspect ofthe data set. In the next section, we study the entire dis-tribution, seeking to motivate a high-level classification ofthe moods into distinct categories based on their transitionrelationships with other moods.

alivesleepy

stressedoptimistic

boredblah

cheerfulconfusedamusedannoyedanxioushopefullonelytiredsad

exciteddepressed

calmhornyhappy

253232631626777285692864329119292352985031609332203499837504

515905209752975

6303565614

7634477209

89344

Figure 1: Top 20 mood labels by frequency, account-ing for about 40% of the updates in our sample.

3.2 Mood TransitionsOne’s current emotional state will be heavily influenced by

one’s previous emotional state. The state before that willalso exert an influence on the present, but less so. More gen-erally, we expect previous states to influence current ones,with the influences weakening (becoming less direct) the far-ther back in time we go.

Time Mood Text

2013-07-28 11:56:56 sad no one wants me . feeling sad cause i dont want me either2013-07-28 22:41:40 lonely Laying in this hospital bed I thought I wanted to be here I don’t , take me home2013-07-29 02:32:01 depressed im sorry i need someone to talk to i need to not be a sub for 5 mins i just need a friend. please

...

Table 1: A partial sequence of mood updates from a single user.

log-

scal

e C

TP(a

,b)

bewilderedartisticcuriousbouncychill

distressed

disappointedexcited

bewilderedcrushed

chill

chipperlazybitchybored

chipperblissfulbouncyenergeticflirty

numb

lonelydevastated

angry

melancholy

loved

amazing

goodexcitedamazed

optimistic

disappointed

peacefulblessed

determined

lonelycrusheddevastatedokay

melancholy

soreenergeticexcitedpeacefulrelaxed

busy

exhausteddistresseddrainednumb

awakesore

lazybusy

exhausted

worried

devastatednumb

okay

scared

amused anxious blah cheerful depressed happy hopeful sad satisfied stressed tired upset

0.03

0.04

0.06

0.1

Figure 2: Mood compressed transition probabilities (CTP values). Each column labeled with emotion a showsthe emotions b with largest CTP(a, b), as defined in equation 2.

To capture this pattern of influences, we first define emo-tional transition probabilities in a way that encompasses thefull set of historical relationship between emotions. Let E bethe full set of emotions, and let C(a, t, b) be the number oftimes a user posted emotion a ∈ E and then posted emotionb ∈ E exactly t days later. Crucially, these transitions arecounted whether or not there is an update between a and b.Then the conditional probability of b given a after t days isgiven by equation 1.

P (b | a, t) =C(a, t, b)∑

b′∈E C(a, t, b′)(1)

Using these values, we then define the compressed transitionprobability between emotions a and b as follows:

CTP (a, b) = (c− 1)

∞∑t=0

P (b | a, t)ct+1

(2)

Here, c is a dampening constant that gives a higher weight torecent transitions. In practice, c can range between 1.5 to 2.5without significantly changing CTP values; in this paper, weuse c = 2. We choose an exponential decay function so thatdistant transitions affect the CTP value without outweighingmore recent transitions. Other dampening functions canalso be used; the exact method seems not to matter as longas recent transitions are preferentially weighted and distanttransitions are allowed to exert some influence.

The CTP measure has a second noteworthy advantage.There is significant diversity in the way EP members inter-act with the moods update feature. Some users post updatesat roughly regular intervals, whereas others do so in lesspredictable patterns. In taking into account the temporaldistance between updates, the CTP captures the way influ-ence varies with time, making it more robust to this kind ofbehavioral variation than interval-blind methods are.

Figure 2 depicts a sample of CTP(a, b) values. The con-ditioning emotions a are given along the top. For each, the

emotions b with highest CTP to them are given in rank or-der, according to the logCTP(a, b) values, normalized bythe frequency of b to correct for differing usage levels. Self-transitions generally have dramatically larger CTP valuesthan do other emotions, so those are not depicted. Fre-quency normalization and the log-scale of the y-axis areintended to make the plot more readable and to facilitatecomparisons within and across columns.

The patterns in Figure 2 are intuitive, highlighting va-lence, arousal, and transition as important ingredients. Forinstance, the likely next states after a simple positive statelike ‘happy’ have the same polarity, differing largely in in-tensity (and other social components). In contrast, thetransition state ‘hopeful’, while positive in its own right,has a much more mixed set of later states associated withit, reflecting the uncertainty of this emotion. The emo-tions ‘sad’ and ‘anxious’ are rough negative duals of these:‘sad’ emerges as a fairly standard negative valence category,whereas ‘anxiety’ is itself negative but transitions to bothpositive and negative outcomes. There are numerous otherpatterns like these in the data, reflecting not only emotionalbut also interactional and real-world factors (e.g., ‘bored’ to‘bitchy’; ‘tired’ to ‘sore’; ‘stressed’ to ‘busy’).

To try to get a comprehensive, high-level view of howemotions relate to each other, we can also represent ourentire data set as a directed graph based on the full matrixof CTP values, as in Figure 3. For any two emotions a andb, the edge from a to b has weight CTP(a, b). The resultinggraph represents the flow between emotional states of themood updates population, thereby describing the affectivespace as a dynamic system. This abstraction is helpful forrevealing the dynamic structure of the space. In particular,it is possible to use this graph to identify clusters of emotionsthat are densely connected with each other.

In order to cluster the emotions into disjoint partitions,we employ a variant of weighted label propagation [37] that

Figure 3: The moods represented as a weighted, directed graph in which edge labels are determined by CTPvalues, and cluster assignments (indicated by color) are also made based on those values. Node sizes aredetermined by the node’s PageRank. White bounding boxes indicate prominent cluster moods.

is tailored specifically for highly dense graphs (by takingthe average edge weight from each cluster instead of itssum). The main idea underpinning the algorithm is to iden-tify groups of emotions with a higher than expected within-group transition probability. Unlike other weighted labelpropagation algorithms, our variant predetermines the num-ber of clusters, and each emotion is iteratively assigned tothe cluster from which the average CTP to it is the high-est (as opposed to obtaining the label of the single heaviestincoming edge). The algorithm works as follows:

1. Initialize n clusters Ci randomly.

2. For each cluster Ci and each emotion ej , compute themean of the values CTP(ei, ej) for ei ∈ Ci and assignej to the cluster with the highest such value.

3. Repeat 2 until the cluster assignments stabilize.

Self-transitions are dampened by multiplying them by asmall constant s (here, 0.005) and normalizing accordingly.In our experience, large values of s (e.g., 0.01) generallylead to faster and more reliable convergence, but the result-ing clusters are less compelling, whereas smaller values of sdeliver substantially more plausible clusters but sometimeslead to non-convergence due to cyclic label assignments. Weran the algorithm 100 times using the parameters from Fig-ure 3 (s = 0.005, 5 clusters) and it converged 67% of thetime, with convergent runs taking an average of 25.40 itera-tions. Given that our primary goal is to explore the structureof the data set, these rates are more than acceptable.

Figure 3 uses color to represent a clustering of the moodspace into five cells. Node sizes are scaled by a node’sPageRank, which correlates with frequency but reduces the

effect of self-transitions. Given the way in which the al-gorithm is defined, clusters have a higher than expectedwithin-cluster transition probability. Thus, they can be in-terpreted as enduring emotional states. If we label thembased on their prominent members (by PageRank), we getclusters ‘tired/bored’ (orange), ‘horny’ (yellow), ‘sad/calm’(purple), ‘happy’ (green), and ‘blah’ (red), which is an in-tuitive set of groups based primarily on valence but furtherstructured by other social and emotional factors. By varyingthe number of clusters, we achieve different levels of resolu-tion. For example, with three clusters, the partitions show anear perfect division between negative, neutral/mixed, andpositive valence emotions. With ten clusters, rarefied clus-ters emerge relating to erotic states, lack of energy (‘lethar-gic’, ‘exhausted’), high-valence positive emotions (‘ecstatic’,‘delighted’, ‘blissful’), and so forth.

The directed nature of the graph also highlights the roleemotions play in affective transitions. At a high level, we seethe densest pathways between ‘sad/calm’ and ‘blah’ (purple,red), between ‘blah’ and ‘tired/bored’ (red, orange), andbetween ‘happy’ and ‘horny’ (green, yellow). The pathwaysare more asymmetrical between ‘horny’ and ‘tired/bored’,typically running from the first to the second rather thanthe other way around. Some pairwise paths are extremelyunlikely in either direction. For instance, direct ‘happy’ to‘sad/calm’ (green to purple) transitions are rare; emotionalpaths that begin with ‘happy’ and end with ‘sad/calm’ arelikely to travel through other emotional states along the way.These patterns match well with our intuitions about humanexperiences, and they further support our contention thatemotions are defined as much by their transitions as theyare by their inherent properties.

3.3 ModelsWe apply CRFs as a modeling technique to capture emo-

tion structure of the sort we just identified in our moodsdata. Specifically, we use linear-chain CRF models, as theyefficiently represent temporal relationships among emotionlabels as well as the mapping from text features of a docu-ment to its emotion label. Formally, we have a collection ofdocument sequences D, where each document sequence d ∈D is a sequence of tuples, d = [(x1, e1), (x2, e2), . . . , (xT , eT )].Each tuple (xt, et) is a vector of document features xt alongwith the associated emotion label et. The sequence lengthT can vary for each sequence.

We construct a linear-chain CRF with potential functionsφj,k(xt, et) between an input feature xt,j and possible emo-tion et,k. Each φj,k is a binary indicator function, withφj,k(xt, et) = 1 when both feature j and label k are presentin a document and 0 otherwise. This type of potential func-tion is equivalent to the binary presence feature functions of-ten used in sentiment classification [27]. However, the CRFgoes beyond relationships between a single document’s textfeatures and its label. A second set of potential functionsτl,k(et−1, et) serve as binary indicators for whether emotionl was present in the previous document at time t − 1 andemotion k is present in the current document at time t.

We use a standard log-linear parameterization, which makesthe learning problem convex. Having defined our potentialfunctions and chosen a log-linear CRF approach, our modelderivation exactly follows those of other linear chain CRFmodels in the literature [13, 32]. We do not restate the likeli-hood and learning problem here; see [32] for the full details.

To evaluate the effect of modeling temporal emotion la-bel relationships, we compare our CRF approach to a time-independent classification approach. This model treats eachtuple (xt, et) as independent from other documents in thesequence and estimates the emotion label probability P (et |xt). To estimate this probability we use a maximum en-tropy (MaxEnt) classifier. The MaxEnt model is a standardchoice for classification tasks in sentiment analysis and otherNLP tasks. Furthermore, the MaxEnt model is a specialcase of our linear chain CRF approach where we remove thepotential functions τl,k(et−1, et). In removing these poten-tial functions from our CRF, we are left with only potentialfunctions relating the current text features xt to the emotionlabel et. Comparing these two models allows us to identifythe value in including sequence information.

We use the CRFsuite software package for both the CRFand MaxEnt models [23]. For both models, we use `2 reg-ularization. We choose the regularization penalty by cross-validating over possible values and evaluating developmentset performance. We then run 20 trials in which we ran-domly split the data 80%/20% into training and testing setsand evaluate performance for both models. We use the non-parametric Wilcoxon rank-sums test to measure the signif-icance of the difference between CRF and baseline perfor-mance. In order to focus on the effects of sequence structure,we restrict our textual features to simple unigrams.

3.4 Experimental Set-upWe conducted experiments on two different subsets of our

moods data: a simple valence/polarity subset and a multi-dimensional subset involving valence/polarity, arousal, andaffective transitions. The polarity experiments are relativelyeasy to interpret and permit general comparisons with more

familiar sentiment tasks, while the multidimensional exper-iments highlight the nuances of our moods data and revealmore of the power of affective transitions.

We built a polarity label set by clustering high-volumemoods with unambiguous valence, drawn from the green andpurple clusters in Figure 3:

• ‘positive’ = {‘happy’, ‘excited’, ‘thankful’}• ‘negative’ = {‘sad’, ‘lonely’, ‘depressed’, ‘angry’}

Figure 4 summarizes the transition structure of the polaritydata set. Each square depicts the probability of seeing asequence consisting of the row polarity and then the columnpolarity. The overall structure of the polarity transitions isclear: polarity states are generally enduring, with a slighttrend towards positivity. One concern one might raise isthat this trend could reflect, not affective transitions, butrather an overall tendency for individuals to be consistentlypositive or consistently negative. With Figure 4(b), we seekto address this by limiting attention just to high-variancesequences, defined as the top 20% of polarity sequences bylabel diversity. Thus, these sequences definitely come fromusers with variable emotional states. The same structureis evident here, though with the additional finding that,for these variable individuals, transitions from ‘negative’ to‘positive’ are more likely than the other way around. Asanother point of comparison, we include, in Figure 4(c), aversion of these heatmaps in which the mood sequences havebeen randomly shuffled. This destroys all transition struc-ture, revealing instead only the percentages of ‘positive’ and‘negative’ items in the data set.

For our multidimensional experiments, we would ideallystudy the entire set of moods, since Figure 3 suggests thatthey have intricate internal structure. However, the result-ing models would be intractable to analyze and impracti-cal to report on. Thus, we hand-selected six moods, withthe goal of capturing valence, arousal, and the transitionalcharacteristics of emotions, thereby preserving many of therich inter-emotion dynamics of the full set. The moods wechose for this are ‘cheerful’, ‘satisfied’, ‘hopeful’, ‘anxious’,‘stressed’, and ‘depressed’.

Although we focus only on a subset of the moods pre-sented in Figure 2, this visualization provides insights intothe transition structure we hypothesize that our CRF canleverage. Overall, since some emotions are more commonthan others, the label distribution for the multidimensionalsentiment classification experiments are not uniform. Theemotions ‘depressed’, ‘hopeful’, and ‘stressed’ are the mostcommon, and ‘satisfied’, ‘anxious’, and ‘cheerful’ are theleast common. These differing relative frequencies proba-bly reflect deeper underlying facts about the frequency orduration of these experiences.

In Section 3.2, we argued that one’s current emotion isaffected by one’s previous emotions, with influences dimin-ishing rapidly over time. The structure of our CRF modelforces us to make a simplifying Markov assumption that onlythe previous state influences the current one. Since this dis-cretizes the time element, we impose an additional restric-tion that any two consecutive mood posts in mood sequencescan be no more than 24 hours apart, to ensure that we modeltrue affective transitions rather than individuals’ general be-havior on the site (e.g., users who log on infrequently andonly when anxious). After enforcing this restriction, ourpolarity data set consists of 30,000 sequences containing ap-proximately 70,000 posts, and our multidimensional mood

negative

positive

positive negative

0.14

0.92

0.86

0.08

(a) All actual sequences.

negative

positive

positive negative

0.52

0.57

0.48

0.43

(b) High-variance sequences.

negative

positive

positive negative

0.59

0.59

0.41

0.41

(c) Randomized sequences (av-eraged over 20 randomizations).

Figure 4: Mood polarity transition probabilities, from the row state to the column state.

micro−average

macro−average

negative

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.81

0.81

0.78

0.83

0.8

0.79

0.77

0.82

*

*

*

*

(a) Precision.

micro−average

macro−average

negative

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.81

0.8

0.75

0.85

0.8

0.79

0.73

0.84

*

*

*

*

(b) Recall.

micro−average

macro−average

negative

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.81

0.8

0.77

0.84

0.8

0.79

0.75

0.83

*

*

*

*

(c) F1.

Figure 5: Moods polarity performance with bootstrapped 95% confidence intervals (often very small). Starsmark statistically significant differences (p < 0.001) according to a Wilcoxon rank-sums test.

micro−average

macro−average

depressed

hopeful

satisfied

cheerful

stressed

anxiousMaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.51

0.46

0.57

0.51

0.44

0.45

0.4

0.41

0.49

0.45

0.53

0.49

0.45

0.43

0.39

0.4

*

*

*

*

*

(a) Precision.

micro−average

macro−average

depressed

hopeful

satisfied

cheerful

stressed

anxiousMaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.51

0.41

0.73

0.55

0.36

0.3

0.29

0.24

0.49

0.38

0.74

0.53

0.31

0.24

0.26

0.22

*

*

*

*

*

*

*

(b) Recall.

micro−average

macro−average

depressed

hopeful

satisfied

cheerful

stressed

anxiousMaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.51

0.43

0.64

0.53

0.39

0.36

0.34

0.31

0.49

0.4

0.62

0.51

0.36

0.31

0.31

0.28

*

*

*

*

*

*

*

*

(c) F1.

Figure 6: Multidimensional moods performance with bootstrapped 95% confidence intervals (often verysmall). Stars mark statistically significant differences (p < 0.001) according to a Wilcoxon rank-sums test.

dataset consists of approximately 20,000 sequences contain-ing 60,000 posts overall.

3.5 ResultsThe polarity and multidimensional CRF models achieve

absolute micro-average F1 gains of 2% over the MaxEntbaselines, and performance on individual labels improvesacross the board. All gains are statistically significant (p <0.001) according to a Wilcoxon rank-sums test. Figure 5summarizes polarity results, and Figure 6 summarizes mul-tidimensional moods results. The results are presented withbootstrapped 95% confidence intervals [9], often very smallbecause of the large amount of data and consistent modelperformance.

Since the multidimensional moods model must navigate adecision space of six labels, its overall performance is lowerthan that of the two-label polarity model, and inter-labelperformance disparities are greater. The advantage of themultidimensional moods model, however, is its ability toidentify granular emotions that are richer indications of aperson’s current and future affective states than the non-specific classes ‘positive’ and ‘negative’. For example, ‘anx-ious’ and ‘depressed’ are both negative valence mood states,but they have different implications for a user’s future state:‘anxious’ is a transitory mood state that tends towards res-olution, either positive or negative, whereas ‘depressed’ isa more enduring affective state with more limited possiblefuture states. In many applications, these distinctions areparticularly relevant. In the customer experience industry,for example, it is often most valuable not to characterize auser’s valence but rather to identify moments of emotionalinflection, which mark the points of volatility when a user isin the process of changing opinions or attitudes.

One measure of a mood’s volatility is the entropy of theprobability distributions over previous and subsequent moodstates, referred to as H(prev) and H(next), respectively.Moods with high H(prev) and H(next) are crossroads, inthat they arise from and transition to diverse sets of othermood states. Because such moods are transitory, they arerelatively infrequent and therefore systematically harder tomodel. For instance, a mood’s H(prev) is inversely relatedto its volume (linear regression; R2 = 0.80, p < 0.05), and amood’s volume is inversely related to its F1 score (linear re-gression; R2 = 0.86, p < 0.01). Nonetheless, the CRF mod-els are notably better at identifying them than the MaxEntmodels are, especially in terms of recall (linear regression;R2 = 0.71, p < 0.05).

For example, ‘anxious’ is the most high-entropy moodstate of the six we model, and it is the second most in-frequent. Confusion matrices show that the MaxEnt base-line frequently misclassifies ‘anxious’ posts: given sparse orambiguous ‘anxious’ documents, the contextually-unawareMaxEnt baseline tends toward the more frequent classes‘stressed’ and ‘depressed’. Our CRF, however, is less suscep-tible to this mistake because it incorporates knowledge of theparticular transition characteristics of ‘anxious’. CRF gainsin recall for ‘anxious’ are above average, and, conversely,precision gains for ‘stressed’ and ‘depressed’ are also higherthan average.

Overall, these experiments validate that a CRF model canincorporate the affective transition structure described inSection 3.2 to improve performance on both the more con-ventional polarity prediction task and the more complicated

multidimensional prediction task. The strong performancegains on infrequent labels, which are often of particular in-terest, is reflected by the macro-average F1 score, whichshows greater CRF gains over the baseline than the micro-average F1 score.

4. SOCIAL FORCESWe now extend the above model of affective transitions to

a social setting in which the evidence suggests that currentsentiment evaluations are influenced by previous ones.

4.1 RateBeer DataWe develop our basic model of social influences on senti-

ment evaluations using the RateBeer corpus, a collection of2.9 million user-supplied beer reviews.2 With respect to itsstructure, this corpus is like most corpora of user-suppliedproduct reviews. The review texts are typically just a fewsentences long and have associated with them a number ofdifferent kinds of rating, including aspects of the beer (e.g.,aroma, palate, taste) and an overall rating. (For discussionof these multi-aspect ratings, see [19].) The overall ratingsare on the scale 1–20, which we rescale into a more familiarspace of 5 star ratings (with fractions of a star possible).

4.2 Social forcesThe RateBeer corpus reflects complex social phenomena

that make it especially useful for our purposes. First, indi-vidual community members often write many reviews (4,798wrote more than 50; [8]). Second, we know from [18] thatreviewers themselves vary greatly in their level of expertiseabout beers. Third, we know from [8] that the commu-nity dynamics of the site are complex, with specific kinds ofusers influencing the evaluative and communicative norms incomplex ways that find many counterparts in off-line com-munities. These factors lead us to expect reviewers to beinfluenced by each other when making rating choices. Forinstance, if the current reviewer is new to the site, she mightshift her judgments towards those of more expert memberswho already posted reviews. Conversely, experts might feelthe need to push back against overly positive or negativereviews by newcomers. We do not at present have a deepunderstanding of the social forces at work, but we requireonly that forces like these be active.

Figure 8 provides additional information about the ratingdistribution and how we construe it. For the purposes ofour classification experiments, ‘positive’ reviews are thosewith ratings 4 or above, ‘negative’ reviews are those withratings 2.5 or below, and ‘neutral’ ones fall in the middle,as indicated in the figure. These boundaries were chosenwith the underlying rating distribution in mind. The dis-tribution is noteworthy for being much more skewed to themiddle of the rating scale than is typical for corpora of user-supplied reviews, which are usually dominated by positivereviews ([26], p. 74). We think that this too reflects the un-derlying social dynamics of the site: extreme evaluations (ateither end of the scale) are socially riskier; for example, onedoesn’t want to appear too glowing about a beer that is per-ceived as mundane, nor too critical of one that is perceivedas requiring expertise.

Figure 7 provides a high-level picture of the extent of theseinfluences, using the label categories indicated in Figure 8.

2http://snap.stanford.edu/data/web-RateBeer.html

negative

neutral

positive

positive neutral negative

0.07

0.21

0.51

0.4

0.67

0.44

0.53

0.12

0.04

(a) All actual sequences.

negative

neutral

positive


0.05

0.11

0.35

0.3

0.6

0.47

0.66

0.28

0.18

(b) High-variance products.

negative

neutral

positive


0.27

0.27

0.27

0.56

0.56

0.56

0.17

0.17

0.17

(c) Randomized sequences (aver-aged over 20 randomizations).

Figure 7: By-product transition probabilities, from the row state to the column state.

Rating

Reviews

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

0.3K

1.3K

2.5K

3.3K

4.8K

7.9K

negative(18% of reviews)

positive(25% of reviews)

neutral(58% of reviews)

Figure 8: RateBeer ratings and label categories.

Each square depicts the probability of seeing a sequence con-sisting of the row evaluation and then the column evalu-ation, using the probability model from our moods study(Section 3.2). The overall picture is easily characterized:self-transitions are most likely for all three categories, po-larity reversing sequences are least-likely, and ‘neutral’-to-‘positive’ sequences are more likely than ‘neutral’-to-‘nega-tive’. This pattern makes sense if we assume that thereare generally social costs to disagreeing with others. Im-portantly, Figure 7(b) shows that the general pattern holdseven when we restrict attention to products with extremelyhighly variable ratings, defined as those with rating stan-dard deviation in the third quartile for those values. Thissuggests that the patterns are not merely a consequence ofoverall community agreement. Indeed, the higher-variancereviews seem to reflect the ways in which disagreements arenegotiated. Figure 7(c) shows that the pattern disappearswhen we randomize the sequences; the only remaining pat-tern is fully explained by the overall label distribution.

4.3 ModelsThe CRF model is the same as it was for the moods data

(see Section 3.3), except label sequence features are nowproduct-level rating sequences. Once again, we compare theCRFs to MaxEnt classifiers with the same textual features.For both, regularization parameters are optimized indepen-dently using cross-validation on development sets.

4.4 Experimental Set-upTo study the gains achieved by incorporating social influ-

ences into our model, we run 20 classification trials. In each

trial, we randomly sample sequences such that the data setconsists of roughly 500,000 reviews. We train on 80% of thesampled sequences and test on the remaining 20%.

4.5 ResultsThe CRF model achieves an absolute micro-average F1

gain of 1% over the MaxEnt baseline, which translates toroughly 1,000 additional correct classifications over the base-line. Figure 9 summarizes by-label performance for both theMaxEnt and CRF models.

Although performance improves for all labels, we observestronger performance gains for ‘positive’ and ‘negative’ doc-uments than ‘neutral’ ones. This trend is reflective of thesequence structures described in Figure 7: the probabilitydistributions of states that transition to and arise from ‘posi-tive’ and ‘negative’ states are more discriminating than thosefor ‘neutral’ states. ‘Negative’ reviews are highly unlikely tofollow ‘positive’ ones and vice versa, and ‘neutral’ is oftenthe intermediate step in sequences that do transition fromone polarity to the other. As a result, given an ambiguous orsparse review that the MaxEnt baseline may, for example,predict to be ‘neutral’ or ‘negative’ with equal likelihood, theCRF model can use the previous review’s predicted class tohelp discriminate. In this example, if the previous reviewis predicted to be ‘positive’, the CRF will be particularlyunlikely to decide that the ambiguous review in question is‘negative’.

5. CONCLUSIONMany sentiment models and technologies make the sim-

plifying assumption that each sentiment expression (emo-tion, evaluation, perspective) was produced independentlyof all others. Our central observation is that this simplify-ing assumption ignores information that is psychologicallyand socially important, particularly for inherently transi-tional emotions like hope and anxiety. Using linear-chainCRFs, we showed furthermore that this information has sig-nificant predictive value, not only for multidimensional sen-timent data, but also in more traditional polarity tasks. Tohighlight the independent value of internal, cognitive fac-tors and external, social ones, we kept these two kinds ofinfluence apart in our models, but the CRF approach is flexi-ble enough to accommodate the complex graphical structureneeded to bring these two kinds of influence together into asingle predictive model. We look forward to future experi-

micro−average

macro−average

negative

neutral

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.72

0.73

0.75

0.72

0.72

0.71

0.71

0.72

0.71

0.7

*

*

*

*

*

(a) Precision.

micro−average

macro−average

negative

neutral

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.72

0.66

0.6

0.84

0.56

0.71

0.65

0.57

0.83

0.54

*

*

*

*

*

(b) Recall.

micro−average

macro−average

negative

neutral

positive

MaxEntCRF

0.0 0.2 0.4 0.6 0.8 1.0

0.72

0.69

0.66

0.77

0.63

0.71

0.67

0.63

0.76

0.61

*

*

*

*

*

(c) F1.

Figure 9: RateBeer results with bootstrapped 95% confidence intervals (often very small). Stars markstatistically significant differences (p < 0.001) according to a Wilcoxon rank-sums test.

ments that combine personal affective transitions and socialforces to further improve sentiment prediction.

AcknowledgementsWe thank Experience Project for providing us with the sam-ple of moods data and for being encouraging about our re-search. This work was supported in part by ONR grantN00014-13-1-0287 and NSF grant IIS-1016909.

References[1] A. Aji and E. Agichtein. The “nays” have it: Exploring

effects of sentiment in collaborative knowledgesharing. In Proceedings of the NAACL HLT 2010Workshop on Computational Linguistics in a World ofSocial Media, pages 1–2, Los Angeles, California, USA,June 2010. Association for Computational Linguistics.

[2] C. O. Alm, D. Roth, and R. Sproat. Emotions fromtext: Machine learning for text-based emotionprediction. In Proceedings of the Human LanguageTechnology Conference and the Conference onEmpirical Methods in Natural Language Processing(HLT/EMNLP), 2005.

[3] S. Blair-Goldensohn, K. Hannan, R. McDonald,T. Neylon, G. A. Reis, and J. Reynar. Building asentiment summarizer for local service reviews. InWWW Workshop on NLP in the InformationExplosion Era (NLPIX), Beijing, China, 2008.

[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies,bollywood, boom-boxes and blenders: Domainadaptation for sentiment classification. In Proceedingsof the 45th Annual Meeting of the Association ofComputational Linguistics, pages 440–447, Prague,June 2007. Association for Computational Linguistics.

[5] A. Chmiel, J. Sienkiewicz, M. Thelwall, G. Paltoglou,K. Buckley, A. Kappas, and J. A. Ho lyst. Collectiveemotions online and their influence on community life.PLoS ONE, 6(7):e22207, July 2011.

[6] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan.Identifying sources of opinions with conditionalrandom fields and extraction patterns. In Proceedingsof Human Language Technology Conference andConference on Empirical Methods in Natural LanguageProcessing, pages 355–362, Vancouver, October 2005.Association for Computational Linguistics.

[7] A. Culotta, R. Bekkerman, and A. McCallum.Extracting social networks and contact informationfrom email and the web. In Conference on Email andSpam, 2004.

[8] C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky,J. Leskovec, and C. Potts. No country for oldmembers: User lifecycle and linguistic change in onlinecommunities. In Proceedings of the 22nd InternationalWorld Wide Web Conference, pages 307–317, NewYork, 2013. ACM.

[9] B. Efron and R. Tibshirani. Bootstrap methods forstandard errors, confidence intervals, and othermeasures of statistical accuracy. Statistical Science,1(1):54–57, February 1986.

[10] L. Feldman Barrett and J. A. Russell. Independenceand bipolarity in the structure of affect. Journal ofPersonality and Social Psychology, 74(4):967–984,1998.

[11] A. B. Goldberg and J. Zhu. Seeing stars when therearen’t many stars: Graph-based semi-supervisedleaarning for sentiment categorization. In TextGraphs:HLT/NAACL Workshop on Graph-based Algorithmsfor Natural Language Processing, 2006.

[12] M. Koppel and J. Schler. The importance of neutralexamples for learning sentiment. ComputationalIntelligence, 22(2):100–109, 2006.

[13] J. Lafferty, A. McCallum, and F. Pereira. Conditionalrandom fields : Probabilistic models for segmentingand labeling sequence data. In Proceedings ofICML-01, pages 282–289, 2001.

[14] H. Liu, H. Lieberman, and T. Selker. A model oftextual affect sensing using real-world knowledge. InProceedings of Intelligent User Interfaces (IUI), pages125–132, 2003.

[15] H. Lovheim. A new three-dimensional model foremotions and monoamine neurotransmitters. MedicalHypotheses, 78(2):341–348, 2012.

[16] A. Maas, A. Ng, and C. Potts. Multi-dimensionalsentiment analysis with learned representations. Ms.,Stanford University, 2011.

[17] Y. Mao and G. Lebanon. Isotonic conditional randomfields and local sentiment flow. In B. Scholkopf,J. Platt, and T. Hoffman, editors, Advances in NeuralInformation Processing Systems 19, pages 961–968,Cambridge, MA, 2006. MIT Press.

[18] J. McAuley and J. Leskovec. From amateurs toconnoisseurs: Modeling the evolution of user expertisethrough online reviews. In Proceedings of the 22ndInternational World Wide Web Conference, pages897–907, New York, 2013. ACM.

[19] J. McAuley, J. Leskovec, and D. Jurafsky. Learningattitudes and attributes from multi-aspect reviews. In12th International Conference on Data Mining, pages1020–1025, Washington, D.C., 2012. IEEE ComputerSociety.

[20] M. Miller, C. Sathi, D. Wiesenthal, J. Leskovec, andC. Potts. Sentiment flow through hyperlink networks.In Proceedings of the 5th International AAAIConference on Weblogs and Social Media, Barcelona,Spain, July 2011. Association for the Advancement ofArtificial Intelligence.

[21] L. Muchnik, S. Aral, and S. J. Taylor. Social influencebias: A randomized experiment. Science,341(6146):647–651, 2013.

[22] A. Neviarouskaya, H. Prendinger, and M. Ishizuka.Recognition of affect, judgment, and appreciation intext. In Proceedings of the 23rd InternationalConference on Computational Linguistics (COLING2010), pages 806–814, Beijing, China, August 2010.COLING 2010 Organizing Committee.

[23] N. Okazaki. Crfsuite: a fast implementation ofconditional random fields (crfs), 2007.

[24] B. Pang and L. Lee. A sentimental education:Sentiment analysis using subjectivity summarizationbased on minimum cuts. In Proceedings of the 42ndAnnual Meeting of the Association for ComputationalLinguistics, pages 271–278, Barcelona, Spain, July2004.

[25] B. Pang and L. Lee. Seeing stars: Exploiting classrelationships for sentiment categorization with respectto rating scales. In Proceedings of the 43rd AnnualMeeting of the Association for ComputationalLinguistics, pages 115–124, Ann Arbor, MI, June2005. Association for Computational Linguistics.

[26] B. Pang and L. Lee. Opinion mining and sentimentanalysis. Foundations and Trends in InformationRetrieval, 2(1):1–135, 2008.

[27] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?sentiment classification using machine learningtechniques. In Proceedings of the Conference onEmpirical Methods in Natural Language Processing(EMNLP), pages 79–86, Philadelphia, July 2002.Association for Computational Linguistics.

[28] D. C. Rubin and J. M. Talerico. A comparison ofdimensional models of emotion. Memory,17(8):802–808, 2009.

[29] J. A. Russell. A circumplex model of affect. Journalof Personality and Social Psychology, 39(6):1161–1178,1980.

[30] B. Snyder and R. Barzilay. Multiple aspect rankingusing the Good Grief algorithm. In Proceedings of theJoint Human Language Technology/North AmericanChapter of the ACL Conference (HLT-NAACL), pages300–307, 2007.

[31] R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, andC. D. Manning. Semi-supervised recursiveautoencoders for predicting sentiment distributions. InProceedings of the 2011 Conference on EmpiricalMethods in Natural Language Processing, pages151–161, Edinburgh, Scotland, UK., July 2011. ACL.

[32] C. Sutton and A. McCallum. An introduction toconditional random fields. Foundations and Trends inMachine Learning, 4(4):267–373, 2012.

[33] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li.User-level sentiment analysis incorporating socialnetworks. In Proceedings of the 17th ACM SIGKDDInternational Conference on Knowledge Discovery andData Mining, pages 1397–1405, San Diego, CA,August 2011. ACM Digital Library.

[34] D. Watson and A. Tellegen. Toward a consensualstructure of mood. Psychological Bulletin,98(2):219–235, 1985.

[35] J. Wiebe, T. Wilson, and C. Cardie. Annotatingexpressions of opinions and emotions in language.Language Resources and Evaluation, 39(2–3):165–210,2005.

[36] T. Wilson, J. Wiebe, and R. Hwa. Just how mad areyou? Finding strong and weak opinion clauses.Computational Intelligence, 2(22):73–99, 2006.

[37] Y. Yang, C. Chen, and S. Pang. WLPA: A novelalgorithm to detect community structures in socialnetworks. Journal of Computational InformationSystems, 7(2):515–522, 2011.

[38] R. Zafarani, W. Cole, and H. Liu. Sentimentpropagation in social networks: A case study inlivejournal. In S.-K. Chai, J. Salerno, and P. Mabry,editors, Advances in Social Computing, volume 6007 ofLecture Notes in Computer Science, pages 413–420.Springer Berlin / Heidelberg, 2010.

sentiment expression conditioned by affective transitions and social...

Documents