analyzing users’ sentiment towards popular consumer ... · pdf fileanalyzing...
TRANSCRIPT
Analyzing users’ sentiment towards popularconsumer industries and brands on Twitter
Guoning Hu, Preeti Bhargava, Saul Fuhrmann, Sarah Ellinger and Nemanja SpasojevicLithium Technologies | Klout, San Francisco, CA
Email: {guoning.hu, preeti.bhargava, saul.fuhrmann, sarah.ellinger, nemanja.spasojevic}@lithium.com
Abstract—Social media serves as a unified platformfor users to express their thoughts on subjects rangingfrom their daily lives to their opinion on consumerbrands and products. These users wield an enormousinfluence in shaping the opinions of other consumersand influence brand perception, brand loyalty andbrand advocacy. In this paper, we analyze the opinionof 19M Twitter users towards 62 popular industries, en-compassing 12,898 enterprise and consumer brands, aswell as associated subject matter topics, via sentimentanalysis of 330M tweets over a period spanning a month.We find that users tend to be most positive towardsmanufacturing and most negative towards service in-dustries. In addition, they tend to be more positive ornegative when interacting with brands than generallyon Twitter. We also find that sentiment towards brandswithin an industry varies greatly and we demonstratethis using two industries as use cases. In addition, wediscover that there is no strong correlation betweentopic sentiments of different industries, demonstratingthat topic sentiments are highly dependent on thecontext of the industry that they are mentioned in.We demonstrate the value of such an analysis in orderto assess the impact of brands on social media. Wehope that this initial study will prove valuable for bothresearchers and companies in understanding users’ per-ception of industries, brands and associated topics andencourage more research in this field.
I. IntroductionSocial media has become one of the major means of
communication and content production. It serves as aunified platform for users to express their thoughts onsubjects ranging from their daily lives to their opinionon companies and products. This, in turn, has made it avaluable resource for mining user opinion for tasks rangingfrom predicting the performance of movies to results ofstock markets and elections.
Although most people are hesitant to answer surveysabout products or services, they express their thoughtsfreely on social media and wield an enormous influence inshaping the opinions of other consumers. These consumervoices can influence brand perception, brand loyalty andbrand advocacy. As a result, it is imperative that largeenterprises pay more attention to mining user opinionrelated to their brands and products from social media.With social media monitoring, they will be able to tapinto consumer insights to improve their product qual-ity, provide better service, drive sales and even identifynew business opportunities. In addition, they can reducecustomer support costs by responding to their customersthrough these social media channels, as 50% of users prefer
reaching service providers on social media rather than acall center [1].
Sentiment analysis has become synonymous with opin-ion mining and involves the computational study of text inorder to identify sentiment polarity (positive, negative orneutral), intensity and subjectivity [2]. It is an excellenttool for enterprises to analyze users’ expressed opinionson social media without explicitly asking any questions asthis approach often reflects their true opinions. Thoughit has drawbacks regarding the population sampled (as itmay not represent the general public), it can be used toapproximate public opinion.
In this paper, we analyze the opinion of Twitter userstowards different industries, encompassing several globalconsumer brands, and associated subject matter topicsvia sentiment analysis of their tweets. Because twittermessages are short, it is quite challenging to identifythe target that an opinion is expressed towards. As apreliminary study, here we take a special approach basedon the observation that when a user interacts with anindustry or brand, such as a reply or a retweet, thesentiment in the related text likely shows this user’sopinion toward that industry or brand. Therefore, byaggregating such interactions from a large number of users,the sentiment distribution can reveal the general attitudeabout the industry or brand among twitter users.
Our dataset consists of 19M unique users, 62 popularindustries (such as Airlines, Automotive, Telecommunica-tions etc.), encompassing 12, 898 enterprise and consumerbrands1 (such as United, Volkswagen etc.), 8030 subjectmatter topics and about 330M user-brand interactionsover a period spanning a month. Specifically, we try toanswer the following questions:
• What is the opinion of users towards different in-dustries? What are the industries that are viewednegatively or positively?
• Within industries, what is the users’ sentiment to-wards different brands?
• What are the frequent topics that users mention whenthey interact with brands over social media? What istheir sentiment towards those topics?
• How does users’ general behavior on social mediadiffer from their behavior when they interact withbrands?
Our main findings are:
1Brand is here defined as an independently recognizableproduct line or business unit.
arX
iv:1
709.
0743
4v1
[cs
.CL
] 2
1 Se
p 20
17
User brand interaction SentimentThings I have learned today: we own way too many dishes, Negative@comcast is literally the WORST, and moving is the main cause of drinking problems
Beautiful morning flying out of the Bay with @southwestair. Positive#baytola #sfbay #family #fun #flying https://t.co/bnogVW2X8H@united A measure of the harm inflicted by disrupting people’s travel plans: Negativeno one here felt $800 came close to making them whole.
SUPER excited and grateful to be at the #smashbox event Positive@menloparkmall with @sephora! Thanks for ... https://t.co/CMexZF3pLr
TABLE I: Examples of users’ interaction with a brand (highlighted) and the interaction’s sentiment
• There is a positive correlation between users’ generalsentiment and their sentiments toward brands. Userswho generally express one sentiment class (positive,negative or neutral) on social media tend to expressthat same sentiment toward brands.
• Among industries, the most negative industries tendto be those providing services (such as Airlines, Mailand Shipping, and Telecommunications) whereas themost positive industries tend to be those manufactur-ing and selling consumer goods such as Household ap-pliances. We also find that sentiment towards brandswithin an industry varies greatly and we demonstratethis using two industries as use cases.
• Finally, we find that there is no strong correlation be-tween topic sentiments of different industries, demon-strating that topic sentiments are highly dependenton the context of the industry that the topics arementioned in.
To the best of our knowledge, no other work has at-tempted to analyze sentiment of users towards industries,consumer brands and topics at such a large scale.
II. Related WorkSeveral recent papers have focused on analyzing social
media sentiment for a variety of purposes such as publicopinion mining, stock market prediction and prediction ofelection results. In this paper, we do not intend to cover allwork on sentiment analysis of social media exhaustively.Instead, we focus on discussing a few relevant works.
Balasubramanyan et al. [3] found high correlation be-tween public opinion (in terms of consumer confidence andpolitical opinion), measured from polls, with sentimentmeasured from tweets. Hu et al. [4] analyzed public eventsby automatically characterizing segments and event topicsin terms of the aggregate sentiments they elicited onTwitter. Bollen et al. [5] analyzed tweets to find correlationbetween overall public mood and social, economic andother major events. Si et al. [6] used the co-occurrencesof stock tickers in tweets to propagate sentiment of stocksin order to predict stock values. Mittal and Goel [7] minedpublic mood from Twitter and used it along with previousdays’ stock market values to predict future stock marketmovements. Ceron et al. [8] used sentiment analysis ontweets to understand the policy preferences of Italian andFrench citizens. Tumasjan et al. [9] analyzed the role ofpolitical sentiment and mentions of political parties inTweets as indicators of election results.
In contrast to this rich landscape, very few studieshave focused on investigating users’ sentiment towards
major worldwide brands. Adeborna and Siau [10] proposea sentiment topic recognition model based on correlatedtopics models with variational expectation-maximizationalgorithm using airline data from Twitter. They validatetheir approach on a dataset of 1046 tweets concerningthree major airlines by computing their airline qualityrating. Mostafa [11] used a random sample of 3516 tweetsto evaluate consumers’ sentiment towards 16 global well-known brands.
Our work differs in the scale of the Twitter datasetsize as well as the number of industries and brands weanalyze. In addition, we look at users’ sentiment towardstopics. Finally, we derive some meaningful insights fromthis analysis and demonstrate the value of such an analysisin order to assess the impact of brands on social media.
III. Data SetA. Industries and brands dataset
For our purposes, a brand is defined as an independentlyrecognizable product line or business unit, often with itsown logo, trademark, slogan, etc. Examples of brands inour dataset include Cadbury, AAA, and 23andme. A brandmay be synonymous with a given company or business,but we make no attempt to represent company ownershipor hierarchies (e.g. Fanta is considered a separate brandfrom Coca-Cola, although both brands are owned by thecorporate entity The Coca-Cola Company). Brands in ourdata set can contain one or more Twitter profiles andbelong to one or more industries.
An industry is defined as a high-level category of brandsthat perform similar functions or provide similar products.In this analysis, we use a subset of 62 industries drawnfrom LinkedIn Industry Codes2, the Fortune 5003, theForbes Global 20004, and Hoovers5. See Figure 2 for alist of included industries.
The brand dataset was initially bootstrapped by collect-ing companies’ names and industries from the LinkedInprofiles of Klout6 users. We then attempted to match thecompany names to Twitter handles via the Bing SearchAPI. However, this approach had some limitations:
• LinkedIn included many companies that had shutdown, changed names, etc.
2http://developer.linkedin.com/docs/reference/industry-codes
3http://fortune.com/fortune500/list/4http://www.forbes.com/global2000/5http://www.hoovers.com/industry-analysis/industry-
directory.html6https://klout.com
-1 0 1General Sentiment
-1
0
1
Brand
Sen
timen
t
A
all
-1 0 1General Sentiment
-1
0
1
B
automative
-1 0 1General Sentiment
-1
0
1
C
airlines
-1 0 1General Sentiment
-1
0
1
D
fashion
-1 0 1General Sentiment
-1
0
1
E
finance
Fig. 1: Heatmap of users’ brand and general sentiments
0
0.2
0.4
0.6
0.8
1
rad
io
mu
seu
ms-
and
-in
stit
uti
on
s
min
ing
-an
d-m
etal
s
pu
bli
shin
g
edu
cati
on
inte
rnet
con
sum
er-g
oo
ds
reta
il
bea
uty
mu
sic
mar
ket
ing
-an
d-a
dv
erti
sin
g
alco
ho
lic-
bev
erag
es
bu
sin
ess-
serv
ices
fitn
ess
aero
spac
e-an
d-d
efen
se
even
ts
con
stru
ctio
n
hea
vy
-eq
uip
men
t
hea
lth
care
-an
d-m
edic
ine
hea
lth
fash
ion
mo
vie
s-an
d-f
ilm
tech
no
log
y
soft
war
e
no
np
rofi
ts
auto
-par
ts-a
nd
-su
pp
lies
ente
rtai
nm
ent
real
-est
ate
auto
mo
tiv
e
spo
rts
con
sum
er-e
lect
ron
ics
oil
-an
d-e
ner
gy
hig
her
-ed
uca
tio
n
con
glo
mer
ates
per
son
al-c
are
chem
ical
s
ho
use
ho
ld-a
pp
lian
ces
vid
eo-g
ames
go
ver
nm
ent
trav
el
civ
ic-a
nd
-so
cial
-org
aniz
atio
ns
ho
spit
alit
y
new
spap
ers-
and
-mag
azin
es
med
ia
ph
arm
aceu
tica
ls
com
pu
ter-
har
dw
are
spo
rts-
med
ia
fin
anci
al-s
erv
ices
insu
ran
ce
spec
ialt
y-s
tore
s
tele
vis
ion
uti
liti
es
rest
aura
nts
foo
d-a
nd
-bev
erag
es
po
liti
cs-a
nd
-po
licy
ban
kin
g
sup
erm
ark
ets-
and
-fo
od
-ret
ail
tran
spo
rtat
ion
con
sum
er-s
erv
ices
tele
com
mu
nic
atio
ns
mai
l-an
d-s
hip
pin
g
airl
ines
Cross Industry Content Sentiment (Deep Learning Based)
PositiveNeutral
Negative
Fig. 2: Sentiment distribution by industry, ordered by negative sentiment (best viewed in color)
0
0.2
0.4
0.6
0.8
1
poli
tics
-and-p
oli
cy
tran
sport
atio
n
radio
tele
vis
ion
mai
l-an
d-s
hip
pin
g
tele
com
munic
atio
ns
gover
nm
ent
airl
ines
new
spap
ers-
and-m
agaz
ines
med
ia
consu
mer
-ser
vic
es
rest
aura
nts
soft
war
e
sport
s-m
edia
conglo
mer
ates
min
ing-a
nd-m
etal
s
ban
kin
g
finan
cial
-ser
vic
es
real
-est
ate
chem
ical
s
busi
nes
s-se
rvic
es
mar
ket
ing-a
nd-a
dver
tisi
ng
civic
-and-s
oci
al-o
rgan
izat
ions
oil
-and-e
ner
gy
phar
mac
euti
cals
com
pute
r-har
dw
are
uti
liti
es
tech
nolo
gy
nonpro
fits
spec
ialt
y-s
tore
s
vid
eo-g
ames
super
mar
ket
s-an
d-f
ood-r
etai
l
insu
rance
fitn
ess
sport
s
hig
her
-educa
tion
reta
il
hea
lth
aero
spac
e-an
d-d
efen
se
inte
rnet
ente
rtai
nm
ent
educa
tion
consu
mer
-ele
ctro
nic
s
hea
lthca
re-a
nd-m
edic
ine
const
ruct
ion
auto
moti
ve
even
ts
hea
vy-e
quip
men
t
musi
c
food-a
nd-b
ever
ages
muse
um
s-an
d-i
nst
ituti
ons
fash
ion
movie
s-an
d-f
ilm
hosp
ital
ity
trav
el
per
sonal
-car
e
auto
-par
ts-a
nd-s
uppli
es
bea
uty
publi
shin
g
alco
holi
c-bev
erag
es
consu
mer
-goods
house
hold
-appli
ance
s
Cross Industry Content Sentiment (Deep Learning Based)
PositiveNeutral
Negative
Fig. 3: Sentiment distribution by industry, ordered by positive sentiment (best viewed in color)
• Even with the industry name appended, the BingSearch API often returned non-brand Twitter ac-counts with similar names to the target brands.
• Bing Search API provided only one Twitter handleper brand, whereas many brands have several spe-cialized handles (for customer service vs. corporatecommunications, in different languages, etc.).
Thus, in order to guarantee the quality of the data, wechose to manually curate the brand dataset. Candidatehandles were collected both from the initial bootstrap dataand by mining the Twitter Firehose stream7, looking atsignals such as reply volume and average time to reply(which can indicate a handle staffed by professional socialmedia agents), follower and mention volume, and the use ofenterprise-level publishing and reply tools. Each candidatehandle was reviewed to determine if it belonged to a brand.If yes, it was assigned a new or existing brand node and
7http://support.gnip.com/apis/firehose/overview.html
given one or more industry labels. At this time, the branddata set contains 12,898 brands and 16,137 handles.
B. Twitter datasetFor each brand, we collected interactions (including
replies to tweets, mentions, retweets, and cc) of Twitterusers with the brand for the month of April 2017, viathe Twitter Firehose stream. Each tweet was processedthrough the Lithium NLP pipeline [12] to extract infor-mation such as the tweet language, named entities etc.For each Twitter user in our dataset, we collected all theirEnglish tweets for April 2017.
The complete dataset contains more than 19M uniqueusers that interacted with the 12, 898 brands. Theseusers generated more than 330M user-brand interactions.Among these interactions, there are more than 56Munique tweets in English. Overall, these users postedabout 282M unique tweets in English. Table I shows someexamples of user brand interactions in our dataset alongwith their sentiment.
0
0.2
0.4
0.6
0.8
1ge
-avi
atio
nai
r-fra
nce-
klm
croa
tia-a
irlin
esol
ympi
c-ai
rch
ina-
sout
hern
-airl
ines
all-n
ippo
n-ai
rway
sci
tilin
kae
rolin
eas-
arge
ntin
asai
rbal
ticko
rean
-air
japa
n-ai
rline
sgo
l-lin
has-
aere
asab
-airl
ines
air-m
aurit
ius
thom
as-c
ook-
airli
nes
flyna
sha
inan
-airl
ines
sun-
coun
try-a
irlin
esae
nam
ango
-airl
ines
fastj
etai
r-new
-zea
land
oman
-air
cond
or-a
irlin
esau
stria
n-ai
rline
sga
ruda
-indo
nesia
keny
a-ai
rway
ssin
gapo
re-a
irlin
esvo
laris
skyw
est-a
irlin
esfin
nair
icel
anda
irsw
iss-in
tern
atio
nal-a
ir-lin
esro
yal-j
orda
nian
mal
aysia
-airl
ines
turk
ish-a
irlin
esqa
tar-a
irway
sha
wai
ian-
airli
nes
air-e
urop
aje
t2sp
icej
et-li
mite
dm
iddl
e-ea
st-ai
rline
svi
vaae
robu
sai
rasia
atla
s-ai
r-wor
ldw
ide
mon
arch
-airl
ines
air-a
rabi
aet
ihad
-airw
ays
virg
in-a
ustra
liabr
itish
-mid
land
-inte
rnat
iona
lre
publ
ic-a
irway
sem
irate
sbr
usse
ls-ai
rline
sai
r-tra
nsat
-a-t-
inc
cath
ay-p
acifi
ctra
nsav
iala
tam
-airl
ines
scan
dina
vian
-airl
ines
cebu
-pac
ific-
air
tap-
portu
gal
saud
i airli
nes
skyt
eam
air-c
hina
virg
in-a
tlant
ic-a
irway
sph
ilipp
ine-
airli
nes
lufth
ansa
inte
rjet
copa
-airl
ines
el-a
l-airl
ines
porte
rairl
ines
alas
ka-a
irlin
esci
tyje
tkl
mqa
ntas
-airw
ays
wes
tjet
norw
egia
n-ai
r-shu
ttle
avia
nca
air-i
ndia
briti
sh-a
irway
svi
rgin
-am
eric
aae
r-lin
gus
air-f
ranc
eib
eria
indi
go-a
irlin
esje
tblu
eex
pres
sjet
sout
h-af
rican
-airw
ays
wow
-air
ryan
air
sout
hwes
t-airl
ines
vuel
ing-
airli
nes
flybe
aero
mex
ico
star-a
llian
ceje
t-airw
ays
air-c
anad
ago
air
us-a
irway
sea
syje
tuk
rain
e-in
tern
atio
nal-a
irlin
esam
eric
an-a
irlin
esai
rber
linal
legi
ant-a
irfro
ntie
r-airl
ines
unite
d-ai
rline
sde
ltasp
irit-a
irlin
es
Cross Airline Industry Content Sentiment (Deep Learning Based)
PositiveNeutral
Negative
Fig. 4: Sentiment distribution for Airlines, ordered by negative sentiment (best viewed in color)
0
0.2
0.4
0.6
0.8
1
bri
tish
-car
-auct
ions
carf
axbyd
onst
arta
trav
elce
nte
rsdai
mle
rgro
upe-
psa
fiat
-chry
sler
-auto
mobil
esnat
ional
-car
-ren
tal
gen
eral
-moto
rsle
xus
tesl
ala
nci
afo
rdds-
auto
mobil
esduca
ti-m
oto
rgm
cau
totr
ader
volk
swag
enfc
a-fi
at-c
hry
sler
-auto
mobil
esci
troen
hold
ench
rysl
erta
ta-m
oto
rsw
inneb
ago
audi
fuji
-hea
vy-i
ndust
ries
renau
ltau
tonat
ion
alfa
-rom
eosk
oda
toyota
seat
chev
role
tnis
san
lotu
s-ca
rshyundai
jeep
mit
subis
hi-
moto
rskaw
asak
ihar
ley-d
avid
son
dac
iaen
terp
rise
-hold
ings
chev
y-e
lect
ric
auto
car
mer
cedes
-ben
zvolv
o-t
ruck
stu
roal
jom
aih-a
uto
moti
ve-
com
pan
yvau
xhal
l-m
oto
rsca
dil
lac
toyota
-moto
r-nort
h-a
mer
ica
mah
indra
-and-m
ahin
dra
acura
honda
fiat
abar
th-u
kbuic
kkia
-moto
rsin
finit
isu
bar
u-u
sapors
che
ford
-must
ang
her
o-m
oto
corp
safe
lite
edm
unds
suzu
ki-
moto
rspeu
geo
tm
ini
dodge
yam
aha
isuzu
-moto
rsvolv
ois
uzu
jaguar
landro
ver
mas
erat
im
azda-
moto
rbm
w-i
ferr
ari
opel
carm
axben
tley
-moto
rstr
ium
ph-m
oto
rcycl
esbm
w-m
oto
rrad
bm
wla
mborg
hin
iro
ber
t-fo
rres
ter
asto
n-m
arti
nm
arsh
all-
moto
r-gro
up
Cross Automotive Industry Content Sentiment (Deep Learning Based)
PositiveNeutral
Negative
Fig. 5: Sentiment distribution for Automotive, ordered by positive sentiment (best viewed in color)
C. Topics datasetEach tweet in our dataset is associated with subject
matter topics that are most relevant to the tweet text. Forinstance, the tweet “Anti-net neutrality bots are swarmingthe FCC‘s comments http://engt.co/2q4Kd7l” gets asso-ciated with topics such as Net Neutrality, Federal Commu-nications Commission and Telecommunications etc. Thesetopics belong to the Klout Topic Ontology8 (KTO) [13]- a manually curated ontology built to capture socialmedia users’ interests and expertise scores, in differenttopics, across multiple social networks. As of April 2017, itconsists of roughly 8,030 topic nodes and 13,441 edges en-coding hierarchical relationships among them. On Klout,we use these topics to model users’ interests [14] andexpertise [15] by building their topical profiles. These usertopics are also available via the GNIP PowerTrack API9.The topics are inferred by mapping the recognized
entities in a tweet to KTO topics via the Lithium NLPpipeline [12] which uses a weighted ensemble of severalsemi-supervised learning models that employ entity co-
8http://bit.ly/2fcg3wi9http://support.gnip.com/enrichments/klout.html
occurrences, word embeddings, Freebase hierarchical re-lationships and Wikipedia.
IV. Method
To analyze the sentiment of a tweet and label it asone of the 3 sentiment classes (‘Positive’, ‘Negative’ or‘Neutral’), we used a Deep Learning based approach asthis field has recently enabled strong advances in NLP [16].In particular, we use Recurrent Neural Networks (RNNs)which are a class of neural networks that are able to modelrelationships in time. Unlike traditional neural networks,RNNs use units with internal states that can persist infor-mation about previous events. This makes them suitablefor problems that require sequential information, suchas text processing tasks. We also experimented with alexicon based approach but found deep learning to be moreaccurate given empirical evidence ([17], [18]).
In our implementation, we use a conventional LongShort Term Memory (LSTM) network [17]. We first pre-process the text by tokenizing and normalizing it. We thenfeed this text to the LSTM, and use the labels to performsupervised learning on the messages. We use a vocabularyof 40k words, 128 embeddings units, 32 units, and a
100
1000
10000
100000
1x106
heav
y-eq
uipm
ent
min
ing-
and-
met
als
cons
truct
ion
utili
ties
chem
ical
sph
arm
aceu
tical
sin
sura
nce
mar
ketin
g-an
d-ad
verti
sing
real
-est
ate
hosp
italit
yau
to-p
arts
-and
-sup
plie
she
alth
care
-and
-med
icin
epe
rson
al-c
are
tech
nolo
gyho
useh
old-
appl
ianc
esoi
l-and
-ene
rgy
busi
ness
-ser
vice
sm
ail-a
nd-s
hipp
ing
finan
cial
-ser
vice
she
alth
bank
ing
cong
lom
erat
estra
vel
cons
umer
-ser
vice
sed
ucat
ion
auto
mot
ive
com
pute
r-ha
rdw
are
beau
tysu
perm
arke
ts-a
nd-f
ood-
reta
ilso
ftwar
efa
shio
nco
nsum
er-g
oods
tele
com
mun
icat
ions
publ
ishi
nghi
gher
-edu
catio
nsp
ecia
lty-s
tore
sfit
ness
food
-and
-bev
erag
esco
nsum
er-e
lect
roni
cstra
nspo
rtatio
nre
tail
mus
eum
s-an
d-in
stitu
tions
civi
c-an
d-so
cial
-org
aniz
atio
nsm
ovie
s-an
d-fil
mre
stau
rant
sno
npro
fits
even
tsal
coho
lic-b
ever
ages
vide
o-ga
mes
mus
icai
rline
sae
rosp
ace-
and-
defe
nse
gove
rnm
ent
polit
ics-
and-
polic
yen
terta
inm
ent
tele
visi
onm
edia
new
spap
ers-
and-
mag
azin
essp
orts
radi
osp
orts
-med
iain
tern
et
Fig. 6: Average User-Brand Interactions in Each Industry
Fig. 7: Word cloud of topic sentiments across all industrieswith color representing the sentiment (red is negative,green is positive) and the shade representing its intensity
dropout of 0.5. Our training dataset consists of about 14Mmessage sentiment labels that come from a brand agentmanual assignment dataset [18]. The trained model out-puts a probability distribution over the 3 sentiment classesfor a test sample. To compute a final sentiment score, weuse a simple approximation of taking the difference of thepositive and negative sentiment probabilities.
V. ResultsWe categorized the results obtained from this analysis
into three main categories:
A. Brand sentiment vs general sentimentWe first examined whether users showed different sen-
timents when interacting with brands and otherwise onTwitter. In particular, we calculated two sentiment scores(on a scale of [-1,+1]) for each user in our dataset:
• Brand sentiment: This is the average sentiment ofunique tweets posted by this user during his/her
interactions with brands. Here, we calculated thesentiment score for all brands, as well as scores forindividual industries. Average sentiment is computedby taking the average of the difference of positive andnegative sentiment.
• General sentiment: Average sentiment of all uniquetweets posted by this user (including interactions witha brand).
Figure 1A shows the two-dimensional heatmap of thesetwo scores for all the 19M users of our dataset in the62 industries, where the brighter color indicates higherdensity of users. We also break down this heatmap fordifferent industries. Figures 1B, 1C, 1D, and 1E show thehistograms for Automotive, Airlines, Fashion and Financeindustries respectively. Some important observations illus-trated in these figures are:
• Both the brand and general sentiment scores arecentered around 0.2.
• General sentiment scores are more closely centeredaround 0.2, whereas brand sentiment scores spreadsmore towards 1 (positive) and -1 (negative). Thisshows that users tend to be more positive or negativethan usual when interacting with brands.
• There is positive correlation (as computed by Pear-son’s correlation coefficient), 0.7, between these twoscores for all brands. This shows that users who gen-erally express one sentiment class (positive, negativeor neutral) in their tweets tend to express that samesentiment toward brands.
• The diagonal in each figure corresponds to userswhose brand sentiment score and general sentimentscore are almost the same. This represents a signifi-cant proportion of all users (about 3.5%) who do notpost much other than when interacting with brands.
B. Industry and brand sentimentWe then examined sentiment distributions across dif-
ferent industries and brands. To do so, we first classifiedeach tweet as positive, negative, or neutral based on thehighest sentiment probability computed by our LSTMmodel. We then calculated the percentage of positive,negative, and neutral tweets among all unique tweets foreach industry and brand. Figure 2 shows the percentsentiment distribution of individual industries, sorted by
airlines
transpo
rtatio
nmail-a
nd-shipp
ing
consum
er-services
telecommun
ications
politics-and-po
licy
chem
icals
financial-services
pharmaceuticals
utilities
real-estate
civic-and-social-organizations
superm
arkets-and
-foo
d-retail
insurance
government
oil-a
nd-energy
marketin
g-and-advertising
bank
ing
cong
lomerates
person
al-care
newspapers-and-magazines
techno
logy
television
high
er-edu
catio
nradio
internet
business-services
food
-and
-beverages
video-games
media
nonp
rofits
heavy-equipm
ent
softw
are
restaurants
alcoho
lic-beverages
compu
ter-hardware
retail
health
household-appliances
healthcare-and
-medicine
fitness
mining-and-metals
auto-parts-and
-sup
plies
hospitality
specialty
-stores
automotive
education
constru
ction
consum
er-electronics
aerospace-and-defense
fashion
publishing
beauty
travel
sports
enterta
inment
consum
er-goo
dssports-m
edia
events
mov
ies-and-film
museums-and-institu
tions
music
Fox NewsDonald Trump
CNNVenezuela
SyriaFifth Harmony
RussiaThe New York Times
PortugalUnited States of America
TelevisionYouTube
SoccerCountry Music
TwitterWWE
Sports and RecreationBasketball
MusicVideo Games
−0.50
−0.25
0.00
0.25
0.50
0.75
1.00
Fig. 8: Topic Sentiment across different industries (best viewed in color)
negative sentiment. Figure 3 shows the same data exceptthat industries are sorted by positive sentiment.
Clearly, sentiment distributions vary significantly acrossdifferent industries. Interestingly, the most negative indus-tries tend to be those providing services to customers, suchas Airlines, Mail and Shipping, and Telecommunications,whereas the most positive industries tend to be those man-ufacturing and selling consumer goods, such as Householdappliances. Additionally, the most polarized industries aresurprisingly not Politics and Sports (as one might expect).We also analyzed the sentiment within specific indus-
tries10, taking Airlines (which has the highest negativesentiment) and Automotive (which has a high positivesentiment) as use cases. Figure 4 shows the sentimentdistribution of individual brands in the airline industry,sorted by negative sentiment. Clearly, sentiment distribu-tion varies significantly across different brands. As shown,users are quite negative towards Spirit, Delta, and United,while quite positive towards Air-Mauritius, Sun-Country,and Thomas Cook.
Figure 5 show the sentiment distribution of individualbrands in the Automotive industry. Again, we observesignificant variation across different brands. The mostpositively viewed brands include luxury brands such asAston Martin, Lamborghini and BMW, while the most neg-atively viewed brands include rental car companies (suchas Enterprise and National), Hyundai and Volkswagen.
In addition, we noticed that brands with a higher vol-ume of interactions tend to have more neutral sentiment.As an illustration, Figure 6 shows the average user-brandinteractions across industries. Among the top 10 industrieswith the most interactions, 4 of them are among the top 10industries with the most neutral sentiment, and 8 of themare among either the bottom 10 industries with the mostpositive or negative sentiment. We suspect this is because
10Due to lack of space, it is not possible to show sentimenttowards brands in all the 62 industries
these industries get a lot of interactions and coverage frommedia, whose messages generally tend to be neutral.
C. Topic sentimentWe also analyzed the sentiment towards topics associ-
ated with users’ brand interaction tweets. Here we assumethat when users interact with brands belonging to anindustry, they will mention certain topics in the tweet.By assigning the tweet’s sentiment score to these topics,we can roughly estimate the average sentiment of thosetopic within those industries. Figure 7 shows the wordcloud representing the average sentiment of the 100 mostfrequent topics in users’ interactions across all industries.As shown, highly negative topics include Fox News whilehighly positive topics include Video Games and Music.Figure 8 shows the average sentiment of the 20 most
frequent topics (in users’ interactions across all industries)within each industry, sorted by the average sentiment.It depicts a weak correlation between industry and topicsentiment. But there are many exceptions to this, such astopic CNN in industry Heavy Equipment and topic Russiain industry Entertainment. Moreover, average sentimentstowards these topics vary significantly across differentindustries and are highly dependent on the context of theindustry that the topics are mentioned in. For instance,from Figure 7 we can infer that Fox News is a highlynegative topic and we might expect it to be negative acrossall industries. However, we can see in Figure 8 that thistopic actually has positive sentiment in industries such asEvents, Music and Sports.Figure 9 shows the correlation between industries rep-
resented by their topic sentiment vectors. Each industryis represented as a 100 dimensional vector of the mostfrequent topics (those shown in Figure 7) and the value ofeach dimension represents the average sentiment towardsthat topic within that industry. We centered these sen-timent values around zero in order to mitigate any biasin the data. We then computed cosine similarity between
aerosp
ace-an
d-de
fens
eairline
salco
holic-bev
erag
esau
to-parts-and
-sup
plies
automotive
bank
ing
beau
tybu
sine
ss-service
sch
emicals
civic-an
d-so
cial-organ
izations
compu
ter-ha
rdware
cong
lomerates
cons
truction
cons
umer-electronics
cons
umer-goo
dsco
nsum
er-service
sed
ucation
enterta
inmen
tev
ents
fash
ion
finan
cial-service
sfitne
ssfood
-and
-bev
erag
esgo
vernmen
the
alth
healthca
re-and
-med
icine
heav
y-eq
uipm
ent
high
er-edu
catio
nho
spita
lity
hous
ehold-ap
plianc
esinsu
ranc
einternet
mail-a
nd-shipp
ing
marke
ting-an
d-ad
vertising
med
iamining-an
d-metals
mov
ies-an
d-film
mus
eums-an
d-institu
tions
mus
icne
wsp
apers-an
d-mag
azines
nonp
rofits
oil-a
nd-ene
rgy
person
al-care
pharmac
eutic
als
politics-an
d-po
licy
publishing
radio
real-estate
restau
rants
retail
softw
are
spec
ialty
-stores
sports
sports-m
edia
supe
rmarke
ts-and
-food
-retail
tech
nology
teleco
mmun
ications
television
trans
porta
tion
trave
lutilitie
svide
o-ga
mes
aerospace-and-defenseairlines
alcoholic-beveragesauto-parts-and-supplies
automotivebankingbeauty
business-serviceschemicals
civic-and-social-organizationscomputer-hardware
conglomeratesconstruction
consumer-electronicsconsumer-goods
consumer-serviceseducation
entertainmenteventsfashion
financial-servicesfitness
food-and-beveragesgovernment
healthhealthcare-and-medicine
heavy-equipmenthigher-education
hospitalityhousehold-appliances
insuranceinternet
mail-and-shippingmarketing-and-advertising
mediamining-and-metals
movies-and-filmmuseums-and-institutions
musicnewspapers-and-magazines
nonprofitsoil-and-energypersonal-care
pharmaceuticalspolitics-and-policy
publishingradio
real-estaterestaurants
retailsoftware
specialty-storessports
sports-mediasupermarkets-and-food-retail
technologytelecommunications
televisiontransportation
travelutilities
video-games
−0.4
−0.2
0.0
0.2
0.4
0.6
0.8
1.0
Fig. 9: Relationship between Industries represented by topic sentiment vectors (best viewed in color)
the vectors. Overall, there is no strong correlation betweentopic sentiments of different industries, demonstratingthat topic sentiments are highly industry-dependent.
VI. LimitationsWhile we have attempted to cover as much ground as
possible with respect to industries, brands and topics aswell as users’ interactions and sentiment towards them, wenote that this study has some limitations:
• We limit our dataset to Twitter alone since it is oneof the most widely used platforms for self expression.
• Our work mainly focuses on the most popular indus-try and brands.
• Due to resource constraints, we analyzed Twitter datafor only a month.
• The popular topics within industries are often basedon current context and can change with time.
• This study analyzed sentiment only at the messagelevel. Accurate sentiment analysis at the aspect levelwill lead to more insightful results.
VII. ApplicationsAs mentioned earlier, these insights about users’ senti-
ment towards consumer brands can help brands provide
better online customer service, assess the impact of socialmedia marketing campaigns or public relations incidents,identify perceived flaws in their products, and even iden-tify new sales or business opportunities. They also helpbrands gauge the consumers’ perception of them.
Another direct application of analyzing consumers’ sen-timent is to rank a brand relative to its competitors,allowing that brand to contextualize its performance onsocial media. This is easily demonstrated by the SocialImpact Ranking tool11 that Lithium has developed andwhich employs the sentiment of users’ tweets, in additionto several other metrics, to assess the impact of differentcompeting brands in various industries on social media. Asan illustration, Figures 10 and 11 show screenshots of theimpact ranking of airlines on different dates separated by3 weeks. Figure 10 coincides with an unfortunate incidentinvolving a passenger on United Airlines12As evident,United has the highest mentions as well as strong negativesentiment towards it on Twitter. However, the number ofmentions have decreased 3 weeks later (Figure 11).
11https://klout.com/impact-ranking/12http://bit.ly/2wDUR6u
Fig. 10: Airlines ranking retrieved on April 24, 2017 Fig. 11: Airlines ranking retrieved on May 15, 2017
Brands are very interested in understanding customers’attitudes towards them. There have been many demandson quantitative measures of overall attitude of users to-wards a brand as well as their underlying business com-ponents, such as product, website, support and customerservice. Sentiment analysis of brands, topics and theirrelationship plays a key role in building such measures.
VIII. Conclusion and Future WorkIn this paper, we analyzed the opinion of 19M Twitter
users towards 62 popular industries, encompassing 12,898enterprise and consumer brands, as well as associatedsubject matter topics via sentiment analysis of 330Mtweets over a period spanning a month. To the best ofour knowledge, no other work has attempted to analyzesentiment of users towards industries, consumer brandsand topics at such a large scale. We found that userstend to be most positive towards manufacturing and mostnegative towards service industries. In addition, they tendto be more positive or negative when interacting withbrands than generally on Twitter. We also found that sen-timent towards brands within an industry varies greatly.In addition, we discovered that there is no strong cor-relation between topic sentiments of different industries,demonstrating that topic sentiments are highly dependenton the context of the industry that they are mentionedin. We also demonstrated the value of such an analysis inorder to assess the impact of brands on social media usingseveral metrics.
This preliminary study acts as a first step towardsunderstanding the users’ perception of industries, brandsand associated topics. We hope that our analysis andthe insights that we derive from it will prove valuablefor both researchers and companies in understanding userbehavior and encourage more research in this field. Infuture, we will work on employing these datasets resultsfor sentiment forecasting. We also plan to conduct surveysamong Twitter users in order to validate the insights wehave derived from our analysis.
References[1] A. Okeleke and S. Bali, “Integrating social media into
CRM for next generation customer experience,” Ovum,Tech. Rep., 05 2014.
[2] B. Pang, L. Lee et al., “Opinion mining and sentimentanalysis,” Foundations and Trends® in Information Re-trieval, vol. 2, no. 1–2, pp. 1–135, 2008.
[3] R. Balasubramanyan, B. R. Routledge, and N. A. Smith,“From tweets to polls: Linking text sentiment to publicopinion time series.” ICWSM, pp. 122 – 129, 2010.
[4] Y. Hu, F. Wang, and S. Kambhampati, “Listening tothe crowd: Automated analysis of events via aggregatedtwitter sentiment.” in IJCAI, 2013.
[5] J. Bollen, H. Mao, and A. Pepe, “Modeling public moodand emotion: Twitter sentiment and socio-economic phe-nomena.” ICWSM, vol. 11, pp. 450–453, 2011.
[6] J. Si, A. Mukherjee, B. Liu, S. J. Pan, Q. Li, and H. Li,“Exploiting social relations and sentiment for stock pre-diction.” in EMNLP, vol. 14, 2014, pp. 1139–1145.
[7] A. Mittal and A. Goel, “Stock prediction usingtwitter sentiment analysis,” Standford University,CS229 (http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf).
[8] A. Ceron, L. Curini, S. M. Iacus, and G. Porro, “Everytweet counts? how sentiment analysis of social media canimprove our knowledge of citizens’ political preferenceswith an application to italy and france,” New Media &Society, vol. 16, no. 2, pp. 340–358, 2014.
[9] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M.Welpe, “Predicting elections with twitter: What 140 char-acters reveal about political sentiment.” ICWSM, vol. 10,no. 1, pp. 178–185, 2010.
[10] E. Adeborna and K. Siau, “An approach to sentimentanalysis-the case of airline quality rating.” in PACIS, 2014,p. 363.
[11] M. M. Mostafa, “More than words: Social networks’ textmining for consumer brand sentiments,” Expert Systemswith Applications, vol. 40, no. 10, pp. 4241–4251, 2013.
[12] P. Bhargava, N. Spasojevic, and G. Hu, “Lithium nlp: Asystem for rich information extraction from noisy user gen-erated text on social media,” in Proc. of the 3rd Workshopon Noisy User-generated Text, 2017, pp. 131–139.
[13] S. Ellinger, P. Bhattacharyya, P. Bhargava, and N. Spa-sojevic, “Klout topics for modeling interests and expertiseof users across social networks,” in under review.
[14] N. Spasojevic, J. Yan, A. Rao, and P. Bhattacharyya,“Lasta: Large scale topic assignment on multiple socialnetworks,” in Proc. of ACM KDD, 2014, pp. 1809 –1818.
[15] N. Spasojevic, P. Bhattacharyya, and A. Rao, “Mining halfa billion topical experts across multiple social networks,”Social Network Analysis and Mining, vol. 6, no. 1, pp. 1–14, 2016.
[16] T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recenttrends in deep learning based natural language process-ing,” in arXiv preprint arXiv:1708.02709, 2017.
[17] A. Rao and N. Spasojevic, “Actionable and political textclassification using word embeddings and lstm,” arXivpreprint arXiv:1607.02501, 2016.
[18] N. Spasojevic and A. Rao, “Identifying actionable mes-sages on social media,” in IEEE International Conferenceon Big Data, ser. IEEE BigData ’15, 2015.