
Analyzing users’ sentiment towards popular consumer industries and brands on Twitter

Guoning Hu, Preeti Bhargava, Saul Fuhrmann, Sarah Ellinger and Nemanja Spasojevic
Lithium Technologies | Klout, San Francisco, CA

Email: {guoning.hu, preeti.bhargava, saul.fuhrmann, sarah.ellinger, nemanja.spasojevic}@lithium.com

Abstract—Social media serves as a unified platform for users to express their thoughts on subjects ranging from their daily lives to their opinion on consumer brands and products. These users wield an enormous influence in shaping the opinions of other consumers and influence brand perception, brand loyalty and brand advocacy. In this paper, we analyze the opinion of 19M Twitter users towards 62 popular industries, encompassing 12,898 enterprise and consumer brands, as well as associated subject matter topics, via sentiment analysis of 330M tweets over a period spanning a month. We find that users tend to be most positive towards manufacturing and most negative towards service industries. In addition, they tend to be more positive or negative when interacting with brands than generally on Twitter. We also find that sentiment towards brands within an industry varies greatly and we demonstrate this using two industries as use cases. In addition, we discover that there is no strong correlation between topic sentiments of different industries, demonstrating that topic sentiments are highly dependent on the context of the industry that they are mentioned in. We demonstrate the value of such an analysis in order to assess the impact of brands on social media. We hope that this initial study will prove valuable for both researchers and companies in understanding users’ perception of industries, brands and associated topics and encourage more research in this field.

I. Introduction

Social media has become one of the major means of communication and content production. It serves as a unified platform for users to express their thoughts on subjects ranging from their daily lives to their opinion on companies and products. This, in turn, has made it a valuable resource for mining user opinion for tasks ranging from predicting the performance of movies to results of stock markets and elections.

Although most people are hesitant to answer surveys about products or services, they express their thoughts freely on social media and wield an enormous influence in shaping the opinions of other consumers. These consumer voices can influence brand perception, brand loyalty and brand advocacy. As a result, it is imperative that large enterprises pay more attention to mining user opinion related to their brands and products from social media. With social media monitoring, they will be able to tap into consumer insights to improve their product quality, provide better service, drive sales and even identify new business opportunities. In addition, they can reduce customer support costs by responding to their customers through these social media channels, as 50% of users prefer reaching service providers on social media rather than a call center [1].

Sentiment analysis has become synonymous with opinion mining and involves the computational study of text in order to identify sentiment polarity (positive, negative or neutral), intensity and subjectivity [2]. It is an excellent tool for enterprises to analyze users’ expressed opinions on social media without explicitly asking any questions as this approach often reflects their true opinions. Though it has drawbacks regarding the population sampled (as it may not represent the general public), it can be used to approximate public opinion.

In this paper, we analyze the opinion of Twitter users towards different industries, encompassing several global consumer brands, and associated subject matter topics via sentiment analysis of their tweets. Because Twitter messages are short, it is quite challenging to identify the target that an opinion is expressed towards. As a preliminary study, here we take a special approach based on the observation that when a user interacts with an industry or brand, such as through a reply or a retweet, the sentiment in the related text likely shows this user’s opinion toward that industry or brand. Therefore, by aggregating such interactions from a large number of users, the sentiment distribution can reveal the general attitude about the industry or brand among Twitter users.

Our dataset consists of 19M unique users, 62 popular industries (such as Airlines, Automotive, Telecommunications etc.), encompassing 12,898 enterprise and consumer brands¹ (such as United, Volkswagen etc.), 8,030 subject matter topics and about 330M user-brand interactions over a period spanning a month. Specifically, we try to answer the following questions:

• What is the opinion of users towards different industries? What are the industries that are viewed negatively or positively?

• Within industries, what is the users’ sentiment towards different brands?

• What are the frequent topics that users mention when they interact with brands over social media? What is their sentiment towards those topics?

• How does users’ general behavior on social media differ from their behavior when they interact with brands?

Our main findings are:

¹ Brand is here defined as an independently recognizable product line or business unit.

arXiv:1709.07434v1 [cs.CL] 21 Sep 2017

User brand interaction | Sentiment
Things I have learned today: we own way too many dishes, @comcast is literally the WORST, and moving is the main cause of drinking problems | Negative
Beautiful morning flying out of the Bay with @southwestair. #baytola #sfbay #family #fun #flying https://t.co/bnogVW2X8H | Positive
@united A measure of the harm inflicted by disrupting people’s travel plans: no one here felt $800 came close to making them whole. | Negative
SUPER excited and grateful to be at the #smashbox event @menloparkmall with @sephora! Thanks for ... https://t.co/CMexZF3pLr | Positive

TABLE I: Examples of users’ interaction with a brand (highlighted) and the interaction’s sentiment

• There is a positive correlation between users’ general sentiment and their sentiments toward brands. Users who generally express one sentiment class (positive, negative or neutral) on social media tend to express that same sentiment toward brands.

• Among industries, the most negative industries tend to be those providing services (such as Airlines, Mail and Shipping, and Telecommunications) whereas the most positive industries tend to be those manufacturing and selling consumer goods such as Household appliances. We also find that sentiment towards brands within an industry varies greatly and we demonstrate this using two industries as use cases.

• Finally, we find that there is no strong correlation between topic sentiments of different industries, demonstrating that topic sentiments are highly dependent on the context of the industry that the topics are mentioned in.

To the best of our knowledge, no other work has attempted to analyze sentiment of users towards industries, consumer brands and topics at such a large scale.

II. Related Work

Several recent papers have focused on analyzing social media sentiment for a variety of purposes such as public opinion mining, stock market prediction and prediction of election results. In this paper, we do not intend to cover all work on sentiment analysis of social media exhaustively. Instead, we focus on discussing a few relevant works.

Balasubramanyan et al. [3] found high correlation between public opinion (in terms of consumer confidence and political opinion), measured from polls, with sentiment measured from tweets. Hu et al. [4] analyzed public events by automatically characterizing segments and event topics in terms of the aggregate sentiments they elicited on Twitter. Bollen et al. [5] analyzed tweets to find correlation between overall public mood and social, economic and other major events. Si et al. [6] used the co-occurrences of stock tickers in tweets to propagate sentiment of stocks in order to predict stock values. Mittal and Goel [7] mined public mood from Twitter and used it along with previous days’ stock market values to predict future stock market movements. Ceron et al. [8] used sentiment analysis on tweets to understand the policy preferences of Italian and French citizens. Tumasjan et al. [9] analyzed the role of political sentiment and mentions of political parties in tweets as indicators of election results.

In contrast to this rich landscape, very few studies have focused on investigating users’ sentiment towards major worldwide brands. Adeborna and Siau [10] propose a sentiment topic recognition model based on correlated topics models with variational expectation-maximization algorithm using airline data from Twitter. They validate their approach on a dataset of 1046 tweets concerning three major airlines by computing their airline quality rating. Mostafa [11] used a random sample of 3516 tweets to evaluate consumers’ sentiment towards 16 global well-known brands.

Our work differs in the scale of the Twitter dataset size as well as the number of industries and brands we analyze. In addition, we look at users’ sentiment towards topics. Finally, we derive some meaningful insights from this analysis and demonstrate the value of such an analysis in order to assess the impact of brands on social media.

III. Data Set

A. Industries and brands dataset

For our purposes, a brand is defined as an independently recognizable product line or business unit, often with its own logo, trademark, slogan, etc. Examples of brands in our dataset include Cadbury, AAA, and 23andme. A brand may be synonymous with a given company or business, but we make no attempt to represent company ownership or hierarchies (e.g. Fanta is considered a separate brand from Coca-Cola, although both brands are owned by the corporate entity The Coca-Cola Company). Brands in our data set can contain one or more Twitter profiles and belong to one or more industries.

An industry is defined as a high-level category of brands that perform similar functions or provide similar products. In this analysis, we use a subset of 62 industries drawn from LinkedIn Industry Codes², the Fortune 500³, the Forbes Global 2000⁴, and Hoovers⁵. See Figure 2 for a list of included industries.

The brand dataset was initially bootstrapped by collecting companies’ names and industries from the LinkedIn profiles of Klout⁶ users. We then attempted to match the company names to Twitter handles via the Bing Search API. However, this approach had some limitations:

• LinkedIn included many companies that had shut down, changed names, etc.

² http://developer.linkedin.com/docs/reference/industry-codes
³ http://fortune.com/fortune500/list/
⁴ http://www.forbes.com/global2000/
⁵ http://www.hoovers.com/industry-analysis/industry-directory.html
⁶ https://klout.com

[Figure 1 panels: A (all), B (automotive), C (airlines), D (fashion), E (finance); x-axis: General Sentiment, -1 to 1; y-axis: Brand Sentiment, -1 to 1]

Fig. 1: Heatmap of users’ brand and general sentiments

[Figure 2 plot title: "Cross Industry Content Sentiment (Deep Learning Based)"; x-axis: the 62 industries; y-axis: 0 to 1; legend: Positive, Neutral, Negative]

Fig. 2: Sentiment distribution by industry, ordered by negative sentiment (best viewed in color)

[Figure 3 plot title: "Cross Industry Content Sentiment (Deep Learning Based)"; x-axis: the 62 industries; y-axis: 0 to 1; legend: Positive, Neutral, Negative]

Fig. 3: Sentiment distribution by industry, ordered by positive sentiment (best viewed in color)

• Even with the industry name appended, the Bing Search API often returned non-brand Twitter accounts with similar names to the target brands.

• Bing Search API provided only one Twitter handle per brand, whereas many brands have several specialized handles (for customer service vs. corporate communications, in different languages, etc.).

Thus, in order to guarantee the quality of the data, we chose to manually curate the brand dataset. Candidate handles were collected both from the initial bootstrap data and by mining the Twitter Firehose stream⁷, looking at signals such as reply volume and average time to reply (which can indicate a handle staffed by professional social media agents), follower and mention volume, and the use of enterprise-level publishing and reply tools. Each candidate handle was reviewed to determine if it belonged to a brand. If yes, it was assigned a new or existing brand node and given one or more industry labels. At this time, the brand data set contains 12,898 brands and 16,137 handles.

⁷ http://support.gnip.com/apis/firehose/overview.html

B. Twitter dataset

For each brand, we collected interactions (including replies to tweets, mentions, retweets, and cc) of Twitter users with the brand for the month of April 2017, via the Twitter Firehose stream. Each tweet was processed through the Lithium NLP pipeline [12] to extract information such as the tweet language, named entities etc. For each Twitter user in our dataset, we collected all their English tweets for April 2017.

The complete dataset contains more than 19M unique users that interacted with the 12,898 brands. These users generated more than 330M user-brand interactions. Among these interactions, there are more than 56M unique tweets in English. Overall, these users posted about 282M unique tweets in English. Table I shows some examples of user brand interactions in our dataset along with their sentiment.

[Figure 4 plot title: "Cross Airline Industry Content Sentiment (Deep Learning Based)"; x-axis: airline brands; y-axis: 0 to 1; legend: Positive, Neutral, Negative]

Fig. 4: Sentiment distribution for Airlines, ordered by negative sentiment (best viewed in color)

[Figure 5 plot title: "Cross Automotive Industry Content Sentiment (Deep Learning Based)"; x-axis: automotive brands; y-axis: 0 to 1; legend: Positive, Neutral, Negative]

Fig. 5: Sentiment distribution for Automotive, ordered by positive sentiment (best viewed in color)

C. Topics dataset

Each tweet in our dataset is associated with subject matter topics that are most relevant to the tweet text. For instance, the tweet “Anti-net neutrality bots are swarming the FCC’s comments http://engt.co/2q4Kd7l” gets associated with topics such as Net Neutrality, Federal Communications Commission and Telecommunications. These topics belong to the Klout Topic Ontology⁸ (KTO) [13], a manually curated ontology built to capture social media users’ interests and expertise scores, in different topics, across multiple social networks. As of April 2017, it consists of roughly 8,030 topic nodes and 13,441 edges encoding hierarchical relationships among them. On Klout, we use these topics to model users’ interests [14] and expertise [15] by building their topical profiles. These user topics are also available via the GNIP PowerTrack API⁹.

The topics are inferred by mapping the recognized entities in a tweet to KTO topics via the Lithium NLP pipeline [12], which uses a weighted ensemble of several semi-supervised learning models that employ entity co-occurrences, word embeddings, Freebase hierarchical relationships and Wikipedia.

⁸ http://bit.ly/2fcg3wi
⁹ http://support.gnip.com/enrichments/klout.html

IV. Method

To analyze the sentiment of a tweet and label it as one of the 3 sentiment classes (‘Positive’, ‘Negative’ or ‘Neutral’), we used a Deep Learning based approach as this field has recently enabled strong advances in NLP [16]. In particular, we use Recurrent Neural Networks (RNNs) which are a class of neural networks that are able to model relationships in time. Unlike traditional neural networks, RNNs use units with internal states that can persist information about previous events. This makes them suitable for problems that require sequential information, such as text processing tasks. We also experimented with a lexicon based approach but found deep learning to be more accurate given empirical evidence ([17], [18]).

[Figure 6 plot: x-axis: industries; y-axis: logarithmic scale, 100 to 1x10^6]

Fig. 6: Average User-Brand Interactions in Each Industry

Fig. 7: Word cloud of topic sentiments across all industries with color representing the sentiment (red is negative, green is positive) and the shade representing its intensity

In our implementation, we use a conventional Long Short Term Memory (LSTM) network [17]. We first preprocess the text by tokenizing and normalizing it. We then feed this text to the LSTM, and use the labels to perform supervised learning on the messages. We use a vocabulary of 40k words, 128 embedding units, 32 units, and a dropout of 0.5. Our training dataset consists of about 14M message sentiment labels that come from a brand agent manual assignment dataset [18]. The trained model outputs a probability distribution over the 3 sentiment classes for a test sample. To compute a final sentiment score, we use a simple approximation of taking the difference of the positive and negative sentiment probabilities.
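The score computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sentiment_score`, the dictionary layout and the example probabilities are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict

# Class names are an assumption; the paper only states that there are
# three classes: positive, negative and neutral.
CLASSES = ("positive", "negative", "neutral")


def sentiment_score(probs: Dict[str, float]) -> float:
    """Return P(positive) - P(negative), a value in [-1, +1]."""
    total = sum(probs[c] for c in CLASSES)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    return probs["positive"] - probs["negative"]


# Hypothetical model output for one tweet.
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
score = sentiment_score(example)
```

Under this definition a tweet that the model considers fully neutral scores 0, and the neutral probability only affects the score by reducing the mass left for the other two classes.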

V. Results

We categorized the results obtained from this analysis into three main categories:

A. Brand sentiment vs general sentiment

We first examined whether users showed different sentiments when interacting with brands and otherwise on Twitter. In particular, we calculated two sentiment scores (on a scale of [-1,+1]) for each user in our dataset:

• Brand sentiment: This is the average sentiment of unique tweets posted by this user during his/her interactions with brands. Here, we calculated the sentiment score for all brands, as well as scores for individual industries. Average sentiment is computed by taking the average of the difference of positive and negative sentiment.

• General sentiment: Average sentiment of all unique tweets posted by this user (including interactions with a brand).

Figure 1A shows the two-dimensional heatmap of these two scores for all the 19M users of our dataset in the 62 industries, where the brighter color indicates higher density of users. We also break down this heatmap for different industries. Figures 1B, 1C, 1D, and 1E show the histograms for Automotive, Airlines, Fashion and Finance industries respectively. Some important observations illustrated in these figures are:

• Both the brand and general sentiment scores are centered around 0.2.

• General sentiment scores are more closely centered around 0.2, whereas brand sentiment scores spread more towards 1 (positive) and -1 (negative). This shows that users tend to be more positive or negative than usual when interacting with brands.

• There is a positive correlation of 0.7 (as computed by Pearson’s correlation coefficient) between these two scores for all brands. This shows that users who generally express one sentiment class (positive, negative or neutral) in their tweets tend to express that same sentiment toward brands.

• The diagonal in each figure corresponds to users whose brand sentiment score and general sentiment score are almost the same. This represents a significant proportion of all users (about 3.5%) who do not post much other than when interacting with brands.
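The two per-user scores and their correlation can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (user id, tweet score, brand-interaction flag), the helper names and the toy data are all assumptions, and the tweets are assumed to be already deduplicated.

```python
from collections import defaultdict
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def user_scores(tweets):
    """Map each user to (brand sentiment, general sentiment).

    `tweets` is an iterable of (user_id, score, is_brand_interaction),
    where score is P(positive) - P(negative) for one unique tweet.
    """
    general = defaultdict(list)
    brand = defaultdict(list)
    for user, score, is_brand in tweets:
        general[user].append(score)  # general sentiment includes brand tweets
        if is_brand:
            brand[user].append(score)
    return {u: (mean(brand[u]), mean(general[u])) for u in brand}


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: three users, five tweets.
toy = [
    ("u1", 0.8, True), ("u1", 0.4, False),
    ("u2", -0.6, True), ("u2", 0.0, False),
    ("u3", 0.2, True),
]
scores = user_scores(toy)
r = pearson([b for b, _ in scores.values()], [g for _, g in scores.values()])
```

User `u3` posts only brand interactions, so both scores coincide; such users fall on the diagonal described in the last observation above.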

[Figure 8 plot: topics shown are Fox News, Donald Trump, CNN, Venezuela, Syria, Fifth Harmony, Russia, The New York Times, Portugal, United States of America, Television, YouTube, Soccer, Country Music, Twitter, WWE, Sports and Recreation, Basketball, Music, Video Games; the other axis lists the industries; color scale from −0.50 to 1.00]

Fig. 8: Topic Sentiment across different industries (best viewed in color)

B. Industry and brand sentiment

We then examined sentiment distributions across different industries and brands. To do so, we first classified each tweet as positive, negative, or neutral based on the highest sentiment probability computed by our LSTM model. We then calculated the percentage of positive, negative, and neutral tweets among all unique tweets for each industry and brand. Figure 2 shows the percent sentiment distribution of individual industries, sorted by negative sentiment. Figure 3 shows the same data except that industries are sorted by positive sentiment.
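The classification and aggregation steps above can be sketched as follows. The function names, the input layout and the toy probabilities are assumptions made for illustration; only the argmax rule and the per-industry shares come from the text.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "negative", "neutral")


def classify(probs):
    """Pick the class with the highest probability.

    Ties are resolved in the order of LABELS, which is an arbitrary
    choice made for this sketch.
    """
    return max(LABELS, key=lambda label: probs[label])


def distribution_by_industry(tweets):
    """Return {industry: {label: share of tweets}}.

    `tweets` is an iterable of (industry, class probabilities) pairs,
    one pair per unique tweet.
    """
    counts = defaultdict(Counter)
    for industry, probs in tweets:
        counts[industry][classify(probs)] += 1
    result = {}
    for industry, counter in counts.items():
        total = sum(counter.values())
        result[industry] = {label: counter[label] / total for label in LABELS}
    return result


def sort_by(dist, label):
    """Industries ordered by increasing share of `label`."""
    return sorted(dist, key=lambda industry: dist[industry][label])


# Hypothetical toy input: five tweets in two industries.
toy = [
    ("airlines", {"positive": 0.1, "negative": 0.8, "neutral": 0.1}),
    ("airlines", {"positive": 0.2, "negative": 0.5, "neutral": 0.3}),
    ("airlines", {"positive": 0.6, "negative": 0.1, "neutral": 0.3}),
    ("music", {"positive": 0.7, "negative": 0.1, "neutral": 0.2}),
    ("music", {"positive": 0.2, "negative": 0.1, "neutral": 0.7}),
]
distribution = distribution_by_industry(toy)
by_negative = sort_by(distribution, "negative")
```

The same `sort_by` helper with `"positive"` gives the alternative ordering; the underlying shares are unchanged, only the order of the industries differs.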

Clearly, sentiment distributions vary significantly across different industries. Interestingly, the most negative industries tend to be those providing services to customers, such as Airlines, Mail and Shipping, and Telecommunications, whereas the most positive industries tend to be those manufacturing and selling consumer goods, such as Household appliances. Additionally, the most polarized industries are surprisingly not Politics and Sports (as one might expect).

We also analyzed the sentiment within specific industries¹⁰, taking Airlines (which has the highest negative sentiment) and Automotive (which has a high positive sentiment) as use cases. Figure 4 shows the sentiment distribution of individual brands in the airline industry, sorted by negative sentiment. Clearly, sentiment distribution varies significantly across different brands. As shown, users are quite negative towards Spirit, Delta, and United, while quite positive towards Air-Mauritius, Sun-Country, and Thomas Cook.

Figure 5 shows the sentiment distribution of individual brands in the Automotive industry. Again, we observe significant variation across different brands. The most positively viewed brands include luxury brands such as Aston Martin, Lamborghini and BMW, while the most negatively viewed brands include rental car companies (such as Enterprise and National), Hyundai and Volkswagen.

¹⁰ Due to lack of space, it is not possible to show sentiment towards brands in all the 62 industries.

In addition, we noticed that brands with a higher volume of interactions tend to have more neutral sentiment. As an illustration, Figure 6 shows the average user-brand interactions across industries. Among the top 10 industries with the most interactions, 4 are among the top 10 industries with the most neutral sentiment, and 8 are among the bottom 10 industries by either positive or negative sentiment. We suspect this is because these industries get a lot of interactions and coverage from media, whose messages generally tend to be neutral.
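The overlap count used in this comparison can be sketched as a set intersection between two rankings. All numbers below are made up for illustration and are not the paper's data; `top_k` is a hypothetical helper, and the sketch uses k = 2 instead of 10 to stay short.

```python
def top_k(values, k, highest=True):
    """Return the k keys of `values` with the highest (or lowest) value."""
    ranked = sorted(values, key=values.get, reverse=highest)
    return set(ranked[:k])


# Hypothetical per-industry statistics.
avg_interactions = {"media": 900, "sports": 800, "airlines": 300, "chemicals": 50}
neutral_share = {"media": 0.7, "sports": 0.5, "airlines": 0.3, "chemicals": 0.6}

most_interactions = top_k(avg_interactions, 2)
most_neutral = top_k(neutral_share, 2)
overlap = len(most_interactions & most_neutral)
```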

C. Topic sentiment

We also analyzed the sentiment towards topics associated with users’ brand interaction tweets. Here we assume that when users interact with brands belonging to an industry, they will mention certain topics in the tweet. By assigning the tweet’s sentiment score to these topics, we can roughly estimate the average sentiment of those topics within those industries. Figure 7 shows the word cloud representing the average sentiment of the 100 most frequent topics in users’ interactions across all industries. As shown, highly negative topics include Fox News while highly positive topics include Video Games and Music.

Figure 8 shows the average sentiment of the 20 most frequent topics (in users’ interactions across all industries) within each industry, sorted by the average sentiment. It depicts a weak correlation between industry and topic sentiment. But there are many exceptions to this, such as topic CNN in industry Heavy Equipment and topic Russia in industry Entertainment. Moreover, average sentiments towards these topics vary significantly across different industries and are highly dependent on the context of the industry that the topics are mentioned in. For instance, from Figure 7 we can infer that Fox News is a highly negative topic and we might expect it to be negative across all industries. However, we can see in Figure 8 that this topic actually has positive sentiment in industries such as Events, Music and Sports.

Figure 9 shows the correlation between industries represented by their topic sentiment vectors. Each industry is represented as a 100 dimensional vector of the most frequent topics (those shown in Figure 7) and the value of each dimension represents the average sentiment towards that topic within that industry. We centered these sentiment values around zero in order to mitigate any bias in the data. We then computed cosine similarity between

[Figure 9: heatmap with the 62 industries on both axes, listed alphabetically from aerospace-and-defense to video-games; color scale from −0.4 to 1.0]

Fig. 9: Relationship between Industries represented by topic sentiment vectors (best viewed in color)

the vectors. Overall, there is no strong correlation between topic sentiments of different industries, demonstrating that topic sentiments are highly industry-dependent.
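The aggregation and similarity steps described in this subsection can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation used in the study. The tuple layout of `tweets`, the function names, and the toy data are hypothetical. The text does not specify how the sentiment values were centered or how topics absent from an industry were handled; here we assume that each topic is centered by its mean over the industries that mention it, and that an absent topic contributes zero.

```python
from collections import defaultdict
from math import sqrt


def average_topic_sentiment(tweets):
    """Average tweet-level sentiment per (industry, topic) pair.

    Each tweet is a hypothetical (industry, topics, sentiment) tuple. The
    tweet's sentiment score is assigned to every topic it mentions.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for industry, topics, sentiment in tweets:
        for topic in topics:
            totals[(industry, topic)] += sentiment
            counts[(industry, topic)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def industry_vectors(avg, topics):
    """Build one centered sentiment vector per industry over a fixed topic list."""
    industries = sorted({industry for industry, _ in avg})
    # Assumption: center each topic by its mean over the industries mentioning it.
    topic_mean = {}
    for topic in topics:
        values = [avg[(i, topic)] for i in industries if (i, topic) in avg]
        topic_mean[topic] = sum(values) / len(values) if values else 0.0
    # Assumption: a topic never seen in an industry contributes 0 after centering.
    return {
        i: [avg[(i, t)] - topic_mean[t] if (i, t) in avg else 0.0 for t in topics]
        for i in industries
    }


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data for illustration only; these are not values from the study.
tweets = [
    ("airlines", ["Travel", "Customer Service"], -0.6),
    ("airlines", ["Travel"], 0.2),
    ("music", ["Travel"], 0.8),
    ("music", ["Customer Service"], 0.4),
    ("sports", ["Travel", "Customer Service"], 0.5),
]
topics = ["Travel", "Customer Service"]

avg = average_topic_sentiment(tweets)
vectors = industry_vectors(avg, topics)
similarity = {
    (a, b): cosine_similarity(vectors[a], vectors[b])
    for a in vectors
    for b in vectors
}
```

In this sketch, `similarity` plays the role of the industry-by-industry matrix visualized in Figure 9. With 62 industries and 100 topics, the same functions would yield a 62 × 62 matrix.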

VI. Limitations

While we have attempted to cover as much ground as possible with respect to industries, brands and topics as well as users’ interactions and sentiment towards them, we note that this study has some limitations:

• We limit our dataset to Twitter alone since it is one of the most widely used platforms for self-expression.

• Our work mainly focuses on the most popular industries and brands.

• Due to resource constraints, we analyzed Twitter data for only a month.

• The popular topics within industries are often based on current context and can change with time.

• This study analyzed sentiment only at the message level. Accurate sentiment analysis at the aspect level would lead to more insightful results.

VII. Applications

As mentioned earlier, these insights about users’ sentiment towards consumer brands can help brands provide better online customer service, assess the impact of social media marketing campaigns or public relations incidents, identify perceived flaws in their products, and even identify new sales or business opportunities. They also help brands gauge consumers’ perception of them.

Another direct application of analyzing consumers’ sentiment is to rank a brand relative to its competitors, allowing that brand to contextualize its performance on social media. This is demonstrated by the Social Impact Ranking tool¹¹ that Lithium has developed, which employs the sentiment of users’ tweets, in addition to several other metrics, to assess the impact of competing brands in various industries on social media. As an illustration, Figures 10 and 11 show screenshots of the impact ranking of airlines on two dates 3 weeks apart. Figure 10 coincides with an unfortunate incident involving a passenger on United Airlines¹². As evident, United has the highest number of mentions as well as strong negative sentiment towards it on Twitter. However, the number of mentions has decreased 3 weeks later (Figure 11).

¹¹ https://klout.com/impact-ranking/
¹² http://bit.ly/2wDUR6u

Fig. 10: Airlines ranking retrieved on April 24, 2017

Fig. 11: Airlines ranking retrieved on May 15, 2017

Brands are very interested in understanding customers’ attitudes towards them. There has been much demand for quantitative measures of users’ overall attitude towards a brand as well as towards its underlying business components, such as product, website, support and customer service. Sentiment analysis of brands, topics and their relationship plays a key role in building such measures.

VIII. Conclusion and Future Work

In this paper, we analyzed the opinion of 19M Twitter users towards 62 popular industries, encompassing 12,898 enterprise and consumer brands, as well as associated subject matter topics via sentiment analysis of 330M tweets over a period spanning a month. To the best of our knowledge, no other work has attempted to analyze sentiment of users towards industries, consumer brands and topics at such a large scale. We found that users tend to be most positive towards manufacturing and most negative towards service industries. In addition, they tend to be more positive or negative when interacting with brands than generally on Twitter. We also found that sentiment towards brands within an industry varies greatly. In addition, we discovered that there is no strong correlation between topic sentiments of different industries, demonstrating that topic sentiments are highly dependent on the context of the industry that they are mentioned in. We also demonstrated the value of such an analysis in order to assess the impact of brands on social media using several metrics.

This preliminary study acts as a first step towards understanding users’ perception of industries, brands and associated topics. We hope that our analysis and the insights that we derive from it will prove valuable for both researchers and companies in understanding user behavior and encourage more research in this field. In the future, we will work on employing these datasets and results for sentiment forecasting. We also plan to conduct surveys among Twitter users in order to validate the insights we have derived from our analysis.

References[1] A. Okeleke and S. Bali, “Integrating social media into

CRM for next generation customer experience,” Ovum,Tech. Rep., 05 2014.

[2] B. Pang, L. Lee et al., “Opinion mining and sentimentanalysis,” Foundations and Trends® in Information Re-trieval, vol. 2, no. 1–2, pp. 1–135, 2008.

[3] R. Balasubramanyan, B. R. Routledge, and N. A. Smith,“From tweets to polls: Linking text sentiment to publicopinion time series.” ICWSM, pp. 122 – 129, 2010.

[4] Y. Hu, F. Wang, and S. Kambhampati, “Listening tothe crowd: Automated analysis of events via aggregatedtwitter sentiment.” in IJCAI, 2013.

[5] J. Bollen, H. Mao, and A. Pepe, “Modeling public moodand emotion: Twitter sentiment and socio-economic phe-nomena.” ICWSM, vol. 11, pp. 450–453, 2011.

[6] J. Si, A. Mukherjee, B. Liu, S. J. Pan, Q. Li, and H. Li,“Exploiting social relations and sentiment for stock pre-diction.” in EMNLP, vol. 14, 2014, pp. 1139–1145.

[7] A. Mittal and A. Goel, “Stock prediction usingtwitter sentiment analysis,” Standford University,CS229 (http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf).

[8] A. Ceron, L. Curini, S. M. Iacus, and G. Porro, “Everytweet counts? how sentiment analysis of social media canimprove our knowledge of citizens’ political preferenceswith an application to italy and france,” New Media &Society, vol. 16, no. 2, pp. 340–358, 2014.

[9] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M.Welpe, “Predicting elections with twitter: What 140 char-acters reveal about political sentiment.” ICWSM, vol. 10,no. 1, pp. 178–185, 2010.

[10] E. Adeborna and K. Siau, “An approach to sentimentanalysis-the case of airline quality rating.” in PACIS, 2014,p. 363.

[11] M. M. Mostafa, “More than words: Social networks’ textmining for consumer brand sentiments,” Expert Systemswith Applications, vol. 40, no. 10, pp. 4241–4251, 2013.

[12] P. Bhargava, N. Spasojevic, and G. Hu, “Lithium nlp: Asystem for rich information extraction from noisy user gen-erated text on social media,” in Proc. of the 3rd Workshopon Noisy User-generated Text, 2017, pp. 131–139.

[13] S. Ellinger, P. Bhattacharyya, P. Bhargava, and N. Spa-sojevic, “Klout topics for modeling interests and expertiseof users across social networks,” in under review.

[14] N. Spasojevic, J. Yan, A. Rao, and P. Bhattacharyya,“Lasta: Large scale topic assignment on multiple socialnetworks,” in Proc. of ACM KDD, 2014, pp. 1809 –1818.

[15] N. Spasojevic, P. Bhattacharyya, and A. Rao, “Mining halfa billion topical experts across multiple social networks,”Social Network Analysis and Mining, vol. 6, no. 1, pp. 1–14, 2016.

[16] T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recenttrends in deep learning based natural language process-ing,” in arXiv preprint arXiv:1708.02709, 2017.

[17] A. Rao and N. Spasojevic, “Actionable and political textclassification using word embeddings and lstm,” arXivpreprint arXiv:1607.02501, 2016.

[18] N. Spasojevic and A. Rao, “Identifying actionable mes-sages on social media,” in IEEE International Conferenceon Big Data, ser. IEEE BigData ’15, 2015.
