
Department of Computer and Information Science

Bachelor thesis

Twitter as the Second Channel
by
Matteus Hemström and Anton Niklasson

LIU-IDA/LITH-EX-G--14/063--SE

2014-06-04

Linköpings universitet
SE-581 83 Linköping, Sweden


Supervisor: Niklas Carlsson

Examiner: Nahid Shahmehri

Students in the five-year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students create small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialization work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.

Abstract

People share a big part of their lives and opinions on platforms such as Facebook and Twitter. The companies behind these sites do their absolute best to collect as much data as possible. This data could be used to extract opinions in many different ways.

Every company, organization or public person is probably curious about what is being said about them right now. There are also areas where opinions are related to the outcome of an event. Examples of such events are presidential elections or the Eurovision Song Contest. In these events, people's votes directly determine the outcome of the elections or contests.

We have developed a simplistic prototype that is able to predict the result of the Eurovision Song Contest using sentiment analysis on tweets. The prototype collects tweets about the event, performs sentiment analysis, and uses different filters to predict the ranks of the contestants. We evaluated our results against the actual voting results of the event and found a Pearson correlation of approximately 0.65.

With more time and resources we believe that it is possible to create a highly accurate prediction model. It could be used in many different contexts. Politicians and their parties could use it to evaluate their campaigns. The press could use it to create more interesting news reports. Companies would be able to investigate their brand appreciation. A system like this could be used in many different fields.

Contents

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions

2 Theory and Related Work
2.1 Sentiment Analysis
2.2 Collaborative Filtering

3 Expectations

4 Methodology
4.1 Overview
4.2 Data Collection
4.3 Sentiment Analysis
4.4 Visualization
4.5 Filters

5 Results
5.1 Dataset Characteristics
5.2 Sentiment Analysis
5.3 Prediction Results

6 Discussion
6.1 Methodology
6.1.1 Filters
6.2 Results
6.3 Ethics

7 Conclusion
7.1 Future Work

A Entity Mentions

B Language Distribution

C Correlation Plots

D Results From Visualization

Chapter 1

Introduction

1.1 Motivation

The Eurovision Song Contest is a very popular event in most countries across Europe. It engages hundreds of millions of people over the course of a few weeks each spring. The whole show is broadcast live by multiple TV channels and people gather at home to support their favourite act. Although most people follow the event mainly by watching TV, there is a lot of activity in social media as well.

The fact that the result of this contest is based on people's votes, and that people continuously share their opinions for anyone to read, creates a great opportunity for analysis.

Our goal is to collect tweets via the Twitter API, analyse them in terms of sentiment and create a prediction of the final results. We would like to find out if our result can predict the outcome of the event with sufficient accuracy.

1.2 Problem Statement

Using simple entity extraction and sentiment analysis, this thesis explores how information in tweets can be used to predict outcomes of competitions, such as the Eurovision Song Contest. The core question of this paper is:

• Is it possible to predict the result of the Eurovision Song Contest by running sentiment analysis on tweets related to the topic?


1.3 Contributions

We have created a system that we call Eagle. It includes three modules:

• Data Collection: This module is responsible for talking to the Twitter REST API. It downloads tweets and users, and saves the data in a MySQL database.

• Data Analysis: This module is responsible for extracting entities and analysing sentiment using our heuristics. The entity extraction connects the tweet with one or more of our pre-defined entities. We do this by simple comparisons between the tweet's body text and a list of identifiers that we have manually decided upon.

• Visualization: This module is responsible for presenting the analysed tweets. It provides a simple web interface for building database queries. The website then presents a bar graph showing the result. It comes with 3 heuristics, 2 filters and an option for languages.

We have also collected 737,793 tweets tagged with #eurovision. Since Twitter does not allow API access to tweets older than 8-10 days, this data would probably be interesting for other projects.


Chapter 2

Theory and Related Work

Only a couple of years ago we had nowhere near as much data available to us as we have today. Since then, a lot of studies have been done in this field. There is also a growing interest in language analysis in commercial markets.

2.1 Sentiment Analysis

The concept of sentiment analysis is to determine opinions in text. Often the opinions are directed towards an entity. The entities are often political parties or competitors in some form. We are focusing on the Eurovision Song Contest, so let's use that as an example. We think of each individual act as an entity. This means that each tweet can have an opinion on each entity.

There are many tools to perform sentiment analysis. A popular technique to characterize sentiment in short texts is Linguistic Inquiry and Word Count (LIWC). LIWC has been used to determine the sentimental value in tweets [4]. LIWC is a commercial product; the paper The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods [9] describes how it was created.

Another algorithm that has been used for sentiment analysis in tweets is SentiStrength [10]. SentiStrength is designed to extract sentimental values from MySpace comments [10]. Thelwall et al. [11] claim that “[..] the accurate detection of sentiment is domain-dependant” and that the SentiStrength algorithm is suitable for Twitter comments since they are similar to MySpace comments. They also state that informal language and abbreviations may be common, which is somewhat contradictory to the work of Hu et al. [3], who claim that Twitter is an “evolving medium whose language is a projection of the language of more formal media like news and blogs into a space restricted by size”.

2.2 Collaborative Filtering

As explained by Kim et al. [4], the entities and users could be organized in a matrix. Each column represents an entity, and each row is a user. Each cell is then given a rating of the user's sentiment towards the entity. Users that share sentiment towards a few entities are likely to have similar opinions on other entities. It is therefore possible to extrapolate opinions by identifying similar users.

This is called collaborative filtering. It is most commonly used for recommending content to users. Even though our work is not centred around recommendations, this is an interesting technique, as recommendations are basically predictions of opinions.

While we did not implement any collaborative filtering in our analysis, we expect that it could be added as a complement to the summarization model that we developed.
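To make the matrix idea concrete, the following is a minimal Python sketch of user-based collaborative filtering. It is not part of Eagle. The users, the ratings, the use of cosine similarity and the function names are all assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical user-by-entity sentiment matrix (rows: users, columns: entities).
# None marks a cell where the user's opinion is unknown.
ratings = {
    "alice": {"Sweden": 3.0, "Austria": 2.0, "Poland": -1.0},
    "bob":   {"Sweden": 3.0, "Austria": 2.0, "Poland": None},
    "carol": {"Sweden": -2.0, "Austria": -1.0, "Poland": 3.0},
}


def cosine_similarity(a, b):
    """Cosine similarity over the entities that both users have rated."""
    shared = [e for e in a if a[e] is not None and b.get(e) is not None]
    if not shared:
        return 0.0
    dot = sum(a[e] * b[e] for e in shared)
    norm_a = sqrt(sum(a[e] ** 2 for e in shared))
    norm_b = sqrt(sum(b[e] ** 2 for e in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, entity):
    """Similarity-weighted average of the other users' ratings for `entity`."""
    num = 0.0
    den = 0.0
    for other, row in ratings.items():
        if other == user or row.get(entity) is None:
            continue
        sim = cosine_similarity(ratings[user], row)
        num += sim * row[entity]
        den += abs(sim)
    return num / den if den else 0.0
```

In this toy data, bob agrees with alice and disagrees with carol on the entities they share, so `predict("bob", "Poland")` comes out negative, following alice's opinion rather than carol's.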


Chapter 3

Expectations

Our expectation for this project is that the system will be able to accurately predict three entities out of the actual top five, without any internal order or ranking. Initially we had an idea of predicting the complete results. A few weeks in we felt that it was a bit too ambitious. Predicting 60 % of the top contestants is still useful while also achievable from our point of view. Predicting 60 % is relevant in the context of the Eurovision Song Contest. That number would not mean anything if we were to predict a presidential election or something with a lower number of entities. Having many entities makes it harder in some ways. A big hurdle is trying to decide which entity a tweet is mentioning.

Sentiment analysis is difficult, even for humans. An interesting fact is that humans are about 80 % accurate when it comes to deciding sentiment [8]. This means that a computer which is correct 10 out of 10 times would still not be considered correct by a human in every case, making it difficult to get a good end result.

Since we are working with a tight deadline we present a preliminary prediction analysis based on a simple prediction design. With more time on our hands we would probably try to make our prediction more accurate in terms of internal rankings in the top 5. We expect it to be fairly easy to recognize the most popular entities. The real challenge is to decide exactly how popular different entities are.

Developing a complete system like this is a time-consuming task. Our goal is to create a prototype and present some of the potential in this area of research.


Chapter 4

Methodology

4.1 Overview

We have tried to define sub-systems, or modules, in our larger complex system. They operate completely independently of each other. Our goal here was to make each module small and lucid. We ended up with three different sub-systems: data collection, sentiment analysis and visualization. This is shown in Figure 4.1.

Figure 4.1: Low-tech overview

4.2 Data Collection

The task for this module is to talk to the Twitter API. It collects relevant tweets from a given hashtag and saves them. Our goal here was to make the module smart enough to expand its topic and figure out relevant hashtags on its own. Unfortunately we did not have enough time to develop it further. We are currently feeding it with a hashtag manually.

The data collection module is written in the Go language. This language performs really well for this sort of task. It allows for powerful asynchronous code.

The module keeps track of what tweets it has already downloaded by storing two state variables called max and since. The usage of these variables is well documented in the Twitter API documentation [13]. By tracking these variables and using them for the Twitter API search pagination, the module can make sure that we do not miss any tweet and that we do not download any tweet we already have.
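The pagination idea can be sketched as follows. This is an illustrative Python sketch, not the Go code of the module. The `search` function is a hypothetical in-memory stand-in for the remote search endpoint, and it assumes the usual convention that the lower bound (since) is exclusive and the upper bound (max) is inclusive.

```python
# Hypothetical stand-in for the remote search endpoint.
# Tweet IDs grow over time; results are returned newest first.
FAKE_TWEETS = list(range(1, 26))  # IDs 1..25


def search(since_id=None, max_id=None, count=10):
    """Return up to `count` IDs with since_id < id <= max_id, newest first."""
    hits = [
        i for i in FAKE_TWEETS
        if (since_id is None or i > since_id)
        and (max_id is None or i <= max_id)
    ]
    return sorted(hits, reverse=True)[:count]


def collect(since=None):
    """Fetch every tweet newer than `since` exactly once.

    Returns the collected IDs and the new `since` value to persist
    for the next run.
    """
    collected = []
    max_ = None  # upper bound for the next page
    while True:
        page = search(since_id=since, max_id=max_)
        if not page:
            break
        collected.extend(page)
        # The next page must be strictly older than everything seen so far.
        max_ = min(page) - 1
    new_since = max(collected) if collected else since
    return collected, new_since
```

Within one run, lowering `max_` after every page walks backwards through the results without overlap. Persisting `new_since` between runs means a later run only asks for tweets that arrived after the previous one.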

4.3 Sentiment Analysis

The module for analysing the raw data is the most sophisticated module. Its task is to look at each individual tweet and analyse it in terms of sentimental value. This means that the module needs to understand what is considered positive and what is considered negative. The analysis module is also written in Go.

To accomplish the sentiment analysis, the module takes tweets that it has not yet analysed from a database shared with the data collection module. The analysis module separates its work into two different tasks, entity extraction and sentiment analysis. These tasks are modular and can therefore be executed asynchronously and in parallel. This module runs completely independently of the data collection module.

The entity extraction is done with a very simple approach. Each entity is associated with multiple identifiers. For example, the entity Sanna Nielsen (Sweden's contestant in the contest) has both Sanna and Nielsen as identifiers. We also add the countries they represent and the names of their songs. Whenever an identifier is found in a tweet, the analysis module creates an association between the entity and the tweet.
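A minimal Python sketch of identifier-based entity extraction is shown below. It is an illustration only, not the Go implementation. The identifier lists are assumptions (the text names only Sanna and Nielsen explicitly), and the whole-word matching is one possible reading of "simple comparisons".

```python
import re

# Hypothetical identifier lists: artist names, country and song title.
ENTITIES = {
    "Sanna Nielsen": ["sanna", "nielsen", "sweden", "undo"],
    "Conchita Wurst": ["conchita", "wurst", "austria", "rise like a phoenix"],
}


def extract_entities(text):
    """Return the set of entities whose identifiers occur in the tweet text."""
    lowered = text.lower()
    found = set()
    for entity, identifiers in ENTITIES.items():
        for ident in identifiers:
            # Word boundaries keep "undo" from matching inside "undone".
            if re.search(r"\b" + re.escape(ident) + r"\b", lowered):
                found.add(entity)
                break
    return found
```

A tweet such as "Austria and Sweden were great" is associated with both entities, which is exactly the case where a single tweet-level sentiment score gets assigned to more than one entity.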

The sentiment analysis is more complex. We created a system capable of running multiple heuristics of sentiment analysis. This makes it possible to compare different algorithms and approaches of analysis. To optimize for performance we made sure that the module kept track of which heuristic had been run on which tweet. This feature gives us the ability to analyse tweets in real time with just a few seconds of delay. We did this on the night of the final.

A great challenge with this module is that not all tweets are written in the same language. This means we either have to translate every tweet, or have customized algorithms for each individual language. Initially we wanted to translate every tweet to English and we did some research on the subject. We came to the conclusion that translation services are expensive. Google Translate's API would cost us hundreds of dollars. That is not something we are ready to invest at this point.

One of our heuristics uses the SentiStrength algorithm. It is able to process 16,000 “social web texts” per second, according to its website [7]. Our heuristic takes the text of a tweet, passes it to the standard input of a SentiStrength process, and reads the response from standard output. SentiStrength is primarily focused on analysing English text, but it is also possible to extend its support to other languages by feeding it with data files for each language.

SentiStrength is widely used and supports multiple languages. There are a couple of language data packages available on its website. These are files generated with machine learning. The following languages are the ones provided, all of which we incorporated in our heuristic: Arabic, Dutch, English, French, German, Greek, Italian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish and Welsh. It is worth mentioning that the quality of these data packages varies quite a bit. Most of them are untested and not fully reliable. We chose to make use of them anyway since running some analysis on a per-language basis is still better than running an English analysis on foreign tweets.

Another heuristic we created uses a very simple algorithm of our own that we named SimpleWords. It searches the tweet text for some predefined words that are classified as positive or negative. The score is then determined by how many positive versus negative words are found in the text. We created this heuristic as a simple proof of concept and to allow us to compare a simple and naive approach to more complex algorithms.
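A word-counting heuristic of this kind can be sketched in a few lines of Python. The word lists below are placeholders, since the actual SimpleWords lists are not given in the text, and scoring as "positive count minus negative count" is an assumption about how the two counts are combined.

```python
import re

# Placeholder word lists; the real SimpleWords lists are not published here.
POSITIVE = {"love", "great", "good", "best", "amazing"}
NEGATIVE = {"hate", "bad", "worst", "awful", "boring"}


def simple_words_score(text):
    """Number of positive words minus number of negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos - neg
```

With lists this short, most tweets contain none of the words and score zero, which illustrates why such a defensive heuristic classifies the bulk of the data as neutral.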

4.4 Visualization

Data collection and sentiment analysis are almost useless if there is no way of presenting the result in an understandable way. We wanted to present our analysed data in a simple and accessible way. To accomplish this we chose to create a website with the data visualized by diagrams.

To create the website we used a PHP web framework called Laravel. This framework provides lots of features: simple routing, database migrations, database seeding, an ORM and much more. We used Laravel to create and execute database queries. Since many database queries are heavy and slow, the results of these queries are cached.

To visualize the result of the queries, we use the JavaScript library d3.js. The library is provided with the results in JSON format and is then used to produce bar graphs.


Figure 4.2: Frontpage of the website

4.5 Filters

We decided to add filters to the model. Our intention with these filters was to create a more realistic approach to summarizing the scores. The filters we created are not that advanced. The intention with filters is merely to suggest an improvement to the model. It is a proof-of-concept design.

We realized that the scale is in fact not linear, in the sense that 80 points is not worth twice as much as 40 points. A tweet with 80 points is probably not worth the same as two tweets with 40 points when it comes down to the actual voting process.

We also realized that we wanted to be able to modify the impact of negative tweets. As we show in Results, most tweets are positive, meaning that fewer people decide to share their negative opinions. We wanted to create a better representation of a sort of general opinion, and that is why we decided to amplify negative values. It is a compensation for the fact that fewer people tweet negative opinions.

Every heuristic is used as a function $h_n(e, t)$ where $e$ is an entity and $t$ is a tweet. The set of entities is $E$, the set of tweets is $T$ and the set of tweets associated with an entity $e \in E$ is $T_e$. Note that $T_e \subseteq T$.

Without any filter applied the equation is simple:

$$\sum_{t \in T_e} h_n(e, t), \quad \forall\, e \in E. \tag{4.1}$$

The function $h_n(e, t)$ is executed on every tweet attached to an entity to calculate that entity's total score. Each filter is defined as a function $f_n(x)$.

Figure 4.3: Plot of the low-pass filter (x-axis: score, from −100 to 100; y-axis: filter scaling factor, from 0.5 to 1)

Figure 4.4: Plot of the negative amplifier filter (x-axis: score, from −100 to 100; y-axis: filter scaling factor, from 1 to 2.8)

Furthermore, each filter takes one input parameter and returns a scaling factor for that given value.

Since filters are intended for scaling, we multiply the value returned from each filter with the score returned from $h_n(e, t)$. Equation 4.1 defines the summarization without filters. With a single filter $f_1(x)$ the equation becomes:

$$\sum_{t \in T_e} h_n(e, t) \cdot f_1(h_n(e, t)), \quad \forall\, e \in E. \tag{4.2}$$

Finally, the equation is slightly different with two filters:

$$\sum_{t \in T_e} h_n(e, t) \cdot f_1(h_n(e, t)) \cdot f_2(h_n(e, t)), \quad \forall\, e \in E. \tag{4.3}$$

This model is flexible and scalable since more filters could easily be added to tweak the equation even further.

We have defined two filters. The first is a low-pass filter as defined in Equation 4.4, and graphically illustrated in Figure 4.3.

$$f(x) = \frac{1}{0.0001 \cdot x^2 + 1} \tag{4.4}$$

This filter is designed specifically for our domain. We know $-100 < x < 100$ and chose 0.0001 based on that.

We also defined a negative amplifier function

$$f(x) = \left(2 - \frac{|x|}{x}\right)^{0.92}, \tag{4.5}$$

also illustrated in Figure 4.4. It evaluates to 1 for positive scores and to $3^{0.92} \approx 2.75$ for negative scores.

This filter is also specifically designed for our needs. We chose the parameter 0.92 because it seemed like a good fit. Further investigation could probably fine-tune it even more.
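The two filters and the filtered sum of Equations 4.1 to 4.3 can be sketched in Python as follows. The function names are hypothetical, and the handling of a score of exactly zero in the negative amplifier is an assumption, since Equation 4.5 is undefined there.

```python
def low_pass(x):
    """Equation 4.4: scaling factor that shrinks large scores."""
    return 1.0 / (0.0001 * x * x + 1.0)


def amplify_negative(x):
    """Equation 4.5: factor 1 for positive scores, 3**0.92 for negative ones."""
    if x == 0:
        # |x|/x is undefined at 0 (assumed neutral here); x * f(x) is 0 anyway.
        return 1.0
    return (2.0 - abs(x) / x) ** 0.92


def total_score(scores, filters=()):
    """Equations 4.1-4.3: sum of h * f1(h) * f2(h) * ... over an entity's tweets.

    `scores` holds the heuristic score h of every tweet attached to one entity.
    """
    total = 0.0
    for h in scores:
        weighted = h
        for f in filters:
            weighted *= f(h)
        total += weighted
    return total
```

As a worked example, the low-pass filter maps a score of 100 to 100 · 0.5 = 50 and a score of 20 to 20 / 1.04 ≈ 19.2, so the larger score counts about 2.6 times as much as the smaller one instead of 5 times.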

The reasoning behind these filters is further discussed in Chapter 6.


Chapter 5

Results

5.1 Dataset Characteristics

We ended up with 737,793 tweets, following only the hashtag #eurovision. About 65 % of those tweets are written in English. Appendix A shows how many times each entity was mentioned in a tweet. The language distribution of the tweets can be seen in Figures 5.1 and 5.2.

The majority of all tweets are represented by only a handful of languages. This is a fairly common phenomenon and indicates that the distribution is heavily skewed, as with Pareto's law and power-law distributions [6]. Power functions are described as $f(x) = \alpha x^{-\eta}$. A power function turns into a straight line with slope $-\eta$ and y-intercept $\log(\alpha)$ when taking the logarithm on both axes. If the data followed a power law, or a Zipf distribution, it should show straight-line behaviour when plotted on a log-log scale.
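The straight-line property follows from taking logarithms: $\log f(x) = \log(\alpha) - \eta \log(x)$. The Python sketch below checks this numerically on synthetic power-law data. The values of alpha and eta are arbitrary and are not taken from the dataset.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic power-law data (not from the thesis): alpha = 1000, eta = 1.5.
alpha, eta = 1000.0, 1.5
ranks = list(range(1, 40))
counts = [alpha * r ** -eta for r in ranks]
slope, intercept = loglog_fit(ranks, counts)
```

For exact power-law data the fitted slope equals $-\eta$ and the intercept equals $\log(\alpha)$. Running the same fit on real counts and inspecting how far the points deviate from the line is one way to judge whether a Zipf-like model is plausible.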

We can see that 10 % of the detected languages account for about 80 % of all tweets. This follows Pareto's principle, but the tail of the languages is bounded and does not appear to follow a Zipf distribution.

Figure 5.2 displays the language distribution with a logarithmic y-axis but a linear x-axis. It somewhat resembles a straight line, which on such a semi-logarithmic plot indicates roughly exponential decay rather than a power law. The distribution is therefore not a Zipf distribution; the long tail is simply too short. We believe that a larger dataset and better language classification would get us closer to a Zipf distribution.

An interesting aspect of the language distribution is that we realized that Twitter is running some sort of language recognition on every tweet. We have no insight into how they do it. We believe their algorithm is fairly accurate, but it is still something we cannot control.


One single tweet is recognized as being written in the Kannada language, which is spoken in some parts of India. The reason for this is that the user tweeted only a smiley consisting of characters from the Kannada alphabet.

Figure 5.1: Language distribution of tweets (segments: en, es, it, ru, fr, de, others)

Figure 5.2: Language distribution of tweets (y-axis: log of number of tweets, from 0 to 14; x-axis: language code, in the order en es it ru fr de nl tr sv pl el bg da in fi sl pt sk hu uk et ja lt is lv no ht tl vi ko hy fa zh ar iw th ur ka kn)

5.2 Sentiment Analysis

As stated in Methodology, we used three different heuristics to calculate a sentimental value for each tweet. These algorithms are nowhere near perfect and should be considered proof of concept. As mentioned previously we also applied filters in combination with each heuristic. Appendix D shows the total score per entity for every combination of heuristic and filter.

Not every tweet is considered either positive or negative. The majority of tweets we ran through our heuristics were considered neutral. Their sentimental value is equal to zero. Figure 5.3 shows the distribution of sentiment calculated by the SentiStrength algorithm.

We believe this is an area that is not performing particularly well. We know that each of these tweets is about some entity and that most of them probably state a positive or negative opinion. The problem is that most of them are too vague or hard to interpret and end up being read as neutral. With a more complex and tailor-made algorithm we are certain that a higher rate of opinions could be extracted.

Figure 5.3: Distribution of sentiment score (x-axis: sentiment value, from −4 to 4; y-axis: tweet count, from 0 to 3.5 × 10^5)

This is not a surprising result. Our algorithms are defensive and most words do not influence a tweet's score. The heuristic SimpleWords is a great example. It contains only 10 positive and 10 negative words from the English language.

5.3 Prediction Results

We knew from the start that our result would not come close to the result presented on TV. That final result consists of votes from the 30 countries competing. Our goal was to find similarities with the votes cast by countries with an English-speaking population. England, Ireland and a few more are the countries with English as their native language, but there are still a few more countries where communicating in English is not uncommon.

Our results are not compared on a per-country basis due to time constraints. We believe that would make for a better correlation in any future work.

The result of running the SentiStrength algorithm without filters on all tweets can be seen in Figure 5.4, which illustrates each entity's total score. It places the top three as: Donatan & Cleo (Poland), Conchita Wurst (Austria) and Ruth Lorenzo (Spain).


Figure 5.4: SentiStrength on all tweets

Juries cast about 50 % of the votes in the Eurovision Song Contest. The rest of the votes are cast by the people watching. Therefore we chose to mainly compare our results against the telephone ranking only. That rank was extracted and calculated from data provided by www.eurovision.tv. To clarify, the comparison is made with the telephone outcome from all of Europe. We discussed whether or not to compare it to the United Kingdom exclusively, but we cannot know for sure if an English tweet is written by a UK resident or not.

To be able to compare our result with reality we calculated the Pearson correlation. It is a measure of linear association and ranges from -1 to 1 inclusive: -1 means total negative correlation and 1 means total positive correlation.
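For reference, the Pearson correlation between two equally long sequences can be computed as in the Python sketch below. The function name and the example ranks are illustrative and are not taken from the results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical example: predicted ranks versus actual ranks of five entities.
predicted = [1, 2, 3, 4, 5]
actual = [2, 1, 3, 5, 4]
r = pearson(predicted, actual)
```

In this made-up example two neighbouring pairs are swapped and the coefficient is 0.8. Identical rankings give 1 and fully reversed rankings give -1.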

We created scatter plots for all combinations of heuristics and filters. Those plots can be found in Appendix C. To our surprise, the highest correlation was achieved by our own heuristic SimpleWords with no filters (see Figure 5.5).

Figure 5.5: Telephone rank correlation: SimpleWords without filters (x-axis: participants, from 0 to 30; y-axis: rank, from 0 to 30; series: SimpleWords (no filters), real telephone outcome)

The x-axis is the entity index and the y-axis represents rank. The red line is what we were aiming for, the actual result. Note that the Pearson correlation accounts for all entities when calculating the correlation. That is not exactly what we want, but it is still a good measurement of the overall performance.

Table 5.1 displays the Pearson correlation values we found for each heuristic and filter combination. The Mentions heuristic is not combined with any filters since it only accounts for mentions, not actual sentimental value.

Our results vary quite a bit between different combinations of heuristic and filters. The average correlation with the telephone votes only is 0.5079, while the average for telephone votes combined with the jury is 0.4468. This is in line with our initial expectations since we cannot really predict the votes of the jury. We can also see a pattern in the way our filters impact the result. Both our filters lower the correlation, meaning they need to be evaluated further.

Heuristic and Filter              Jury + Telephone Correlation   Telephone Correlation
SentiStrength (no filters)        0.4489                         0.5521
SentiStrength (low-pass)          0.4386                         0.5515
SentiStrength (amp. negatives)    0.2205                         0.3415
SimpleWords (no filters)          0.6561                         0.6533
SimpleWords (low-pass)            0.4386                         0.5515
SimpleWords (amp. negatives)      0.4892                         0.3415
Mentions                          0.4359                         0.5638

Table 5.1: Pearson correlation


Chapter 6

Discussion

6.1 Methodology

We knew beforehand that not every tweet would be written in English, but it made things harder than we expected. We thought that a simple solution would be to translate all tweets to English before processing them. However, it turned out we had far too much text to translate and it would be too expensive. Google charges $20 per million characters for both translation and language detection. We did feed SentiStrength with some language data, but not nearly enough for it to perform well. Also, we did not spend any time configuring SentiStrength for our specific needs. We feel that SentiStrength has the potential to outperform SimpleWords with a bit more effort.

Every component in Eagle affects the result, which also means that every little tweak we make affects the result. A diverse and multi-language approach is needed to make sure we collect tweets from the whole spectrum. For example, we are only looking at one specific hashtag when fetching tweets. There is room for improvement. A more advanced collection methodology should definitely strive to be highly customized on a per-country basis and collect tweets from related hashtags.

Entity extraction is the process of connecting a tweet with an entity. Doing this is hard and we tried to refine the process. It is still nowhere near perfect. With better entity recognition we would get a better result. Another thing about this process is the fact that a tweet can have different sentiment towards different entities. We simplified this process and gave each mentioned entity in a tweet the same score. This is not ideal either.

We realized quite early on that simply adding tweet scores together would not give us a perfect result. There are some differences between our model and reality that make everything more complex.

• A tweet can have a negative sentiment but you cannot vote “negative”.

• Our algorithm is far from perfect at extracting entities from tweets.

• Sentiment analysis is hard. We cannot know for sure that we interpret and rate the sentiment correctly every time.

• “The vote of the people” is only 50 % of the total score in the Eurovision finals. There is an elected jury from each country deciding on the second half of the score.

• In the Eurovision Song Contest each country's votes are weighted equally. In our model a country with more tweeters gets higher influence.

• Related to the previous point, our model allows people to “vote” for their own country. This is not allowed in the Eurovision Song Contest.

Initially we took each tweet about each entity and added all the points together. This gives a total score that points in the right direction though it is far from correct. We believe that a more accurate model requires a more advanced summarization.

6.1.1 Filters

Each person is allowed to vote how many times as they feel like. This meansthat a positive tweet could correspond to more than one vote. This conceptadds a lot of complexity to the way tweets could be interpreted. Withoutany kind of weighted sum model the scale is linear. It means that a tweetwith 20 points is worth twice as much as a tweet with 10 points. In termsof voting this is probably not the case. A person that writes a tweet valued20 points will probably not vote twice as many times as some other personwriting a tweet worth 10 points. We needed some way of translating pointsto votes in a smart way.

We decided that we wanted to run every tweet through some kind of filterto create a more “normalized” summarization. We decided on a low-passfilter. Our intention with this was to bring lower points “closer” to higherpoints. Basically preventing 100 points to be 5 times more worth than 20points.

We also created a filter that amplifies negative points. We did this mostly as an experiment. We believe that Twitter could represent “humanity”, but for that to work we need to give negative opinions more weight than positive ones. Expressing a negative opinion often takes a bit more courage and effort, and should therefore be considered worth more. We cannot really tell whether this filter adds anything, but after some thought we decided to include it in the system.
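A minimal sketch of such a filter is shown below. The amplification factor is a hypothetical parameter; no value is stated in this section.

```python
def amplify_negatives(points, factor=2.0):
    """Scale negative points by `factor`; leave other points unchanged.

    The default factor of 2.0 is a hypothetical value for illustration.
    """
    if factor < 1.0:
        raise ValueError("factor must be at least 1.0 to amplify")
    return points * factor if points < 0 else points
```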


Furthermore, we have discussed some other approaches to simulating the voting process. Everything we have done is built on what each tweet says about some entity. Another approach is to stop looking at the tweets and instead care about who wrote them. There is a person behind each tweet, and that person is the one who may cast a vote. By looking at a tweet we believe we know what that person is thinking, but in fact the tweet only shows a fraction of their thoughts. One way to deal with this would be to look at the average sentiment points across all tweets from that person. This could give us a more accurate score that reflects what that person votes for, and how much they vote. By only looking at individual tweets we fool ourselves a bit. This approach would probably be our next step if we had more time.
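One way this person-centred idea could look is sketched below. We did not implement it, so everything here is an assumption: the `(user, entity, points)` layout, the function name, and the choice to average per user and entity before summing over users.

```python
from collections import defaultdict
from statistics import mean

def per_user_entity_scores(scored_tweets):
    """Average each user's points per entity, then sum over users.

    `scored_tweets` is an iterable of (user, entity, points) triples, a
    layout assumed here for illustration. Each user contributes one
    averaged opinion per entity, however many tweets they wrote.
    """
    by_user_entity = defaultdict(list)
    for user, entity, points in scored_tweets:
        by_user_entity[(user, entity)].append(points)

    totals = defaultdict(float)
    for (_user, entity), points_list in by_user_entity.items():
        totals[entity] += mean(points_list)
    return dict(totals)
```

A user who writes three tweets about the same entity then counts once, with their average opinion, instead of three times.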

6.2 Results

We are fairly satisfied with our results. In general everything went according to our initial plan and we were able to generate some interesting results.

We had no way to test our model on an event with a known outcome. The Twitter API provides tweets no older than 8 to 10 days, meaning we were not able to collect tweets from previous years. We therefore had to build our prediction model without really knowing what would work. This had a huge impact on our results, and we are sure we would have been able to create a more accurate system with more feedback. That being said, we did get results that correlated with the real outcome. We speculated quite a bit and tried different methods, and we hope that we can shed some light on how a future prediction model might be constructed. For someone else wanting to create a similar product, our mistakes could save them time. This paper of course focuses on the patterns that we found and the actual results that we got, but it is worth mentioning that mistakes are also valuable. Nothing is more time consuming than making the same mistakes over and over again.

Correlation is an interesting aspect of our result, but it could also be misleading. When we set out to predict the result of the Eurovision Song Contest our main goal was not to find a high correlation. Our goal was to find the winner, or at least a highly accurate top 5. Correlation gives an understanding of how our whole model performed across all the entities. Although that is interesting, the unique selling point of a system like this would be to accurately predict the top performers. That is what predictions are all about, and what people are interested in. One could even argue that correlation is somewhat irrelevant in this context, since knowing who will finish 24th is not really that interesting. Our result was actually pretty good when it came to the top 5. That does not show in the correlation.
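The difference between the two measures can be illustrated with a small sketch. The orderings below are made up for illustration and are not our results, and the helper names are hypothetical. The sketch computes the Pearson correlation between rank vectors and, separately, how many entities the predicted and actual top 5 have in common.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def ranks(order, entities):
    """Rank (1 = best) of each entity in `entities` within `order`."""
    position = {entity: i + 1 for i, entity in enumerate(order)}
    return [position[e] for e in entities]

def top_k_overlap(predicted, actual, k=5):
    """Number of entities that appear in both top-k lists."""
    return len(set(predicted[:k]) & set(actual[:k]))

# Hypothetical orderings of ten entities, best first (not real data).
actual = list("ABCDEFGHIJ")
pred_a = list("EDCBAJIHGF")  # right top-5 set, order scrambled
pred_b = list("FABCDEGHIJ")  # one outsider placed first

for name, pred in (("a", pred_a), ("b", pred_b)):
    r = pearson(ranks(pred, actual), ranks(actual, actual))
    print(name, round(r, 2), top_k_overlap(pred, actual))
```

In this made-up example prediction a gets all five top entities right but reaches a correlation of only about 0.52, while prediction b misses one of the top five yet reaches about 0.82.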


To make for a better result, and a better interpretation of the result, some rules should be set up. These rules limit the impact of entities for which the data is inadequate.

• An entity with too little “buzz” around it should not be considered in the interpretation.

• Entities whose fellow citizens write too much about them should probably not be considered.

• If an entity has too much negative hype around it, it is probably not correct to use a plain summation to calculate its total score.

These rules, or rather guidelines, are part of our contribution to any future work in this field.
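The guidelines above could be applied as a simple check before an entity is included in the interpretation. The sketch below is one possible reading; all thresholds and parameter names are hypothetical placeholders, since we have not determined concrete values.

```python
def is_reliable(mentions, home_mentions, negative_mentions,
                min_mentions=2000, max_home_share=0.5,
                max_negative_share=0.5):
    """Return True if an entity's data passes all three guidelines.

    All thresholds are hypothetical placeholders; the text gives no values.
    """
    if mentions <= 0 or mentions < min_mentions:
        return False  # too little "buzz"
    if home_mentions / mentions > max_home_share:
        return False  # dominated by fellow citizens
    if negative_mentions / mentions > max_negative_share:
        return False  # too much negative hype for a plain sum
    return True
```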

6.3 Ethics

With the way the Internet works today, we are able to find detailed information about practically anything. For that to be viable, everyone involved has the responsibility to continuously evaluate how he or she handles information. The information we deal with in this paper is not “private”, and we are not doing anything illegal or unethical to obtain it. We are using information created by humans and put on the Internet by themselves. Our concerns revolve around personal integrity. An unwritten rule on the Internet is that personal integrity must be respected. Not everyone does that, but no one wants their own integrity compromised.

We are reading what people are sharing on the Internet. They do not know it, and we have not asked for their permission. The data is publicly available via the Twitter API, but it is still worth considering how our work could affect someone’s personal integrity.

We believe that we are doing our best to respect every Twitter user’s personal integrity. We talk about the dataset, and not about an individual tweet or user. We do our best to ignore the people behind the tweets, because they are not relevant to our work.

The way our system is used today, there is no risk of it harming anyone’s personal integrity. It all depends on how it is used. The system could potentially be used in more harmful ways. Say person A wants to find out what person B thinks about person C, and uses our system to collect opinions from B. This compromises C’s integrity, but also B’s.

This system is meant to be used for gathering opinions from a larger crowd. As soon as the crowd becomes just a handful of people, the extraction of opinions could compromise someone’s personal integrity.


Chapter 7

Conclusion

Our initial goal was to investigate whether Twitter users tweeting on a very specific topic could represent a much larger crowd. If we were able to determine what people tweeting on our topic were thinking, we thought that we would be able to predict the opinions of the larger crowd.

The topic we focused on was the Eurovision Song Contest. It is a large event that involves a lot of people. The result is based on people’s votes, which means a prediction would be highly relevant.

We built heuristics to rank the acts and find the winner. Our goal was to predict a top 5 close to the actual results. We had the real winner in 2nd place and a couple of others close to their actual spots. But we also found huge differences between our result and the real outcome.

We also studied the correlation of the results. It shows some interesting data, but it is important to reflect on the way correlation works. Every position in the field contributes to the calculation on the same terms as every other. Our initial plan was to focus on the top 5, but the correlation calculates a sort of average for the whole field.
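For reference, the standard Pearson correlation between predicted values $x_i$ and actual values $y_i$ over $n$ entities is given below. The symbols are introduced here for illustration. Each entity contributes one term to each sum, and no extra weight is given to the top positions.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```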

The most naive heuristic performed very well in terms of Pearson correlation. All of our heuristics gave us an average of about 0.5 in Pearson correlation.

We have come to the realization that a prediction using social media in this way is complex. Our model is highly simplified and is not as sophisticated as reality. We believe future work could benefit from our implementations.


7.1 Future Work

It is very difficult to analyse a big dataset based on social values. Our findings may not hold for another dataset, and there is no method that is guaranteed to work. Finding such a method will take time. We would like to end this paper with some questions that we think are highly relevant for any future work in this area.

• How many active tweeters, or tweets, from a country are needed to predict that country’s votes in the Eurovision Song Contest?

• Which part of our model and implementation is the weakest link? Is it data collection, data analysis or the visualization?

– Which part performs well?

• Would a prediction model like this perform better in a context with a lower number of entities?


Bibliography

[1] P.C. Guerra, W. Meira Jr., C. Cardie. Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help, In proc. WSDM Conference 2014, New York City.

[2] Go Programming Language, http://golang.org/

[3] Y. Hu, K. Talamadupula, S. Kambhampati. Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language, In proc. International AAAI Conference on Weblogs and Social Media, 2013.

[4] J. Kim, J. Yoo, H. Lim, H. Qui, Z. Kozareva, A. Galstyan. Sentiment Prediction Using Collaborative Filtering, In proc. International AAAI Conference on Weblogs and Social Media, 2013.

[5] J. Lee, M. Sun, G. Lebanon. A Comparative Study of Collaborative Filtering, Technical Report (arXiv), May 2012.

[6] Anik. Mahanti, N. Carlsson, Anir. Mahanti, M. Arlitt, C. Williamson. A Tale of the Tails: Power-laws in Internet Measurements, IEEE Network, Vol. 27, No. 1, Jan/Feb. 2013.

[7] SentiStrength, http://sentistrength.wlv.ac.uk/

[8] G. Shirolkar, R. Shukla, H. Shah, R. Shah. Mental State Classification for Hypnotherapy Using Sentiment Analysis, International Journal of Advanced research in Computer Science and Software Engineering, vol. 3, iss. 10, October 2013.

[9] Y.R. Tausczik, J.W. Pennebaker. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods, Journal of Language and Social Psychology, March 2010; vol. 29, 1: p. 24-54.

[10] M. Thelwall, K. Buckley, G. Paltoglou. Sentiment In Twitter Events, University of Wolverhampton, 2011.

[11] M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, A. Kappas. Sentiment Strength Detection in Short Informal Text, Journal of the American Society for Information Science and Technology, 2010.

[12] Topsy, http://topsy.com/

[13] Twitter API, https://dev.twitter.com/docs/api


Appendix A

Entity Mentions

Country          Code  Entity                           Mentions
Poland           pl    Donatan & Cleo                   29880
Austria          at    Conchita Wurst                   26697
Spain            es    Ruth Lorenzo                     26613
Belarus          by    TEO                              18918
United Kingdom   uk    Molly                            18672
Ukraine          ua    Maria Yaremchuk                  17004
Finland          fi    Softengine                       14138
Russia           ru    Tolmachevy Sisters               13874
Sweden           se    Sanna Nielsen                    11691
Italy            it    Emma Marrone                     11645
Switzerland      ch    Sebalter                         11522
Ireland          ie    Can-Linn feat. Kasey Smith       10188
Greece           gr    Freaky Fortune feat. Risky Kidd  10125
Hungary          hu    Andras Kallay-Saunders           8927
Netherlands      nl    The Common Linnets               8797
Armenia          am    Aram                             8550
Iceland          is    Pollaponk                        7759
San Marino       sm    Valentina Monetta                7432
Malta            mt    Firelight                        6551
Israel           il    Mei Finegold                     6503
Latvia           lv    Aarzemnieki                      6462
Romania          ro    Paula and Ovi                    6429
Denmark          dk    Basim                            6309
France           fr    Twin Twin                        6143
Montenegro       me    Sergej Cetkovic                  5778
Norway           no    Carl Espen                       5748
Belgium          be    Axel Hirsoux                     5497
Azerbaijan       az    Dilara Kazimova                  5173
Estonia          ee    Tanja                            4142
Slovenia         si    Tinkara Kovac                    4092
Macedonia        mk    Tijana Dapcevic                  3754
Georgia          ge    The Shin & Mariko                3586
Lithuania        lt    Vilija Mataciunaite              3401
Moldova          md    Cristina Scarlat                 3374
Albania          al    Hersi                            3083
Portugal         pt    Susy                             1960
Germany          de    Elaiza                           1876


Appendix B

Language Distribution

Language  Tweet count
en        482262
es        90405
it        31179
ru        30578
fr        24972
de        12493
nl        11462
tr        8969
sv        7490
pl        5695
el        5047
bg        3398
da        3111
in        3051
fi        2820
sl        2793
pt        2477
sk        2419
hu        1405
uk        825
et        717
ja        715
lt        711
is        693
lv        575
no        513
ht        301
tl        279
vi        177
ko        48
hy        48
fa        43
zh        42
ar        32
iw        28
th        9
ur        8
ka        2
kn        1


Appendix C

Correlation Plots

[Figure: four plots, each with Participants (0 to 30) on the horizontal axis and Rank (0 to 30) on the vertical axis.]

Outcome without filters. Series: SentiStrength (no filters), SimpleWords (no filters), Mentions, Real outcome.

Telephone outcome without filters. Series: SentiStrength (no filters), SimpleWords (no filters), Mentions, Real outcome.

Telephone outcome with low-pass filters. Series: SentiStrength (low-pass), SimpleWords (low-pass), Mentions, Real outcome.

Telephone outcome with amplified negatives. Series: SentiStrength (amp. negatives), SimpleWords (amp. negatives), Mentions, Real outcome.

Figure C.1: Correlation plots


Appendix D

Results From Visualization

Figure D.1: Graph for heuristic Mention with no filter

Figure D.2: Graph for heuristic SentiStrength with no filter

Figure D.3: Graph for heuristic SimpleWords with no filter


Figure D.4: Graph for heuristic SentiStrength with low pass filter

Figure D.5: Graph for heuristic SimpleWords with low pass filter


Figure D.6: Graph for heuristic SentiStrength with amplify negatives filter

Figure D.7: Graph for heuristic SimpleWords with amplify negatives filter



In English

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Matteus Hemström and Anton Niklasson
