sentichenews - sentiment analysis on newspapers and tweets

SentiCheNews A tool for analyzing possible relationships between news and tweet sentiments Data Mining Class Sapienza, University of Rome A. Y. 2016 - 2017

Upload: manuel-coppotelli

Post on 21-Jan-2018


Page 1: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

SentiCheNews

A tool for analyzing possible relationships between news and tweet sentiments

Data Mining Class

Sapienza, University of Rome

A. Y. 2016 - 2017

Page 2: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

To begin Data Collection & Preprocessing Results Analysis

Hi!

Simone

https://it.linkedin.com/in/simone-santacroce-272739134

Manuel

https://it.linkedin.com/in/manuelcoppotelli

George Adrian

https://it.linkedin.com/in/george-adrian-munteanu-707744134

SentiCheNews

Page 3: SentiCheNews - Sentiment Analysis on Newspapers and Tweets


Agenda

1 To begin

• Sentiment Analysis: what is it?

• Our goals

• A good lexicon

• Dictionary Structure

2 Data Collection & Preprocessing

• Collecting Data

• Preprocessing

• Design Choices

3 Results Analysis

• Dashboard

• Analysis

Page 5: SentiCheNews - Sentiment Analysis on Newspapers and Tweets


Sentiment Analysis...

...refers to the use of:

• natural language processing

• text analysis

• computational linguistics

to identify and capture subjective information from source materials (news, social media, reviews...)

Page 6: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Our goals

Given a collection of Italian news and Italian tweets from the same time period, is there any connection between them? In particular:

• do newspapers and tweets report the same sentiment for a given day (a sort of influence of the news on the tweets)?

• which newspaper has an average sentiment closest to the average sentiment of the tweets?

• are there any differences among the newspapers' sentiments?

• how does the sentiment of each newspaper vary over time?

Page 7: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

A good lexicon

Sentiment Analysis for the English language is:

• a well-studied problem, therefore

• there are a lot of excellent lexicons ready to use

This is not true for the Italian language: WE HAD TO BUILD OUR OWN DICTIONARY, starting from an English dictionary available at: http://sentiwordnet.isti.cnr.it

Page 8: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Dictionary Structure 1/2

Each row of the dictionary is represented by a tuple: < s, (p, n) > where

• s: string

• p: positive score

• n: negative score

Therefore, each string is represented by a positive and a negative score.

Page 9: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Dictionary Structure 2/2

Given a tuple t: < s, (p, n) >

The string s can consist of a single word or of up to four words, separated by underscores.

E.g.

• tuple x: < a, (p, n) >

• tuple y: < a_b, (p', n') >

• ...

• tuple z: < a_b_c_d, (p'', n'') >
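
A minimal sketch of how such a dictionary might be held in memory, assuming a plain Python dict keyed by the underscore-joined string. The names `make_key`, `lookup` and `lexicon` are hypothetical, and the entries and scores are invented placeholders, not values from the actual SentiCheNews dictionary.

```python
from typing import Dict, List, Tuple

# (positive score, negative score)
Score = Tuple[float, float]

MAX_WORDS = 4  # a key holds between one and four words


def make_key(words: List[str]) -> str:
    """Join one to four words into a dictionary key using underscores."""
    if not 1 <= len(words) <= MAX_WORDS:
        raise ValueError("a key must contain between 1 and 4 words")
    return "_".join(words)


# Placeholder entries: the scores are illustrative only.
lexicon: Dict[str, Score] = {
    make_key(["buono"]): (0.75, 0.0),
    make_key(["a", "b"]): (0.25, 0.125),
    make_key(["a", "b", "c", "d"]): (0.0, 0.5),
}


def lookup(words: List[str]) -> Score:
    """Return (p, n) for the words, or (0.0, 0.0) when the key is unknown."""
    return lexicon.get(make_key(words), (0.0, 0.0))
```

Keeping the key as a single string means a multi-word expression is found with one dict lookup, at the cost of having to join tokens before every query.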

Page 11: SentiCheNews - Sentiment Analysis on Newspapers and Tweets


Collecting Data

Tweets

• Step 1: getting Twitter API keys

• Step 2: connecting to the Twitter Streaming API

• Step 3: for each tweet, save text and date

News

• We use the newspapers' RSS feeds and, for each news item, we save:

  • date

  • title

  • newspaper source
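
The news side can be sketched with the Python standard library alone. This is an illustration under assumptions, not the project's actual collector: `parse_feed` is a hypothetical helper, `SAMPLE_FEED` is an invented RSS 2.0 document used in place of a real newspaper feed, and the feed is assumed to carry standard `title` and `pubDate` elements.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented sample feed standing in for a real newspaper RSS feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Newspaper</title>
    <item>
      <title>Example headline</title>
      <pubDate>Mon, 02 Jan 2017 10:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text, source):
    """Extract date, title and newspaper source for every item in the feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        records.append({
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            "date": parsedate_to_datetime(pub_date) if pub_date else None,
            "title": item.findtext("title", default="").strip(),
            "source": source,
        })
    return records


records = parse_feed(SAMPLE_FEED, source="Example Newspaper")
```

In a real run the XML text would come from an HTTP request to each feed URL; that step is left out here so the sketch stays runnable offline.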

Page 12: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Preprocessing

Tweets and news are preprocessed with the following techniques:

• stop-word removal

• normalization (lower case, accents, etc)

• stemming (but we realized that...)
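
The first two techniques can be sketched as follows. This is an assumption-laden illustration: the stop-word set is a tiny hand-picked sample rather than the list used in the project, "normalization of accents" is assumed to mean stripping diacritics, and `normalize` and `preprocess` are hypothetical names.

```python
import re
import unicodedata
from typing import List

# Tiny illustrative sample of Italian stop-words, already normalized.
STOP_WORDS = {
    "il", "lo", "la", "i", "gli", "le", "un", "una", "di", "a", "da",
    "in", "con", "su", "per", "e", "che", "del", "della",
}


def normalize(text: str) -> str:
    """Lower-case the text and strip accents (e.g. 'Più' becomes 'piu')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def preprocess(text: str) -> List[str]:
    """Normalize, split into alphanumeric tokens, and drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", normalize(text))
    return [t for t in tokens if t not in STOP_WORDS]
```

Note that stripping accents also maps the verb "è" onto the conjunction "e", so both are removed by this stop-word set.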

Page 13: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Design Choices: stemming operation

Stemming different words may produce the same result.

E.g.

• 'amaro' (bitter)

• 'amare' (to love)

both have the same root 'amar', although they have entirely different meanings and different (positive, negative) values.

Therefore, we do not apply the stemming preprocessing operation.
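
The collision can be reproduced with a deliberately naive stemmer. `toy_stem` is a hypothetical stand-in that only drops one trailing vowel; it is not a real Italian stemmer and not the one evaluated in the project.

```python
def toy_stem(word: str) -> str:
    """Toy stemmer: drop a single trailing vowel, if there is one."""
    return word[:-1] if word and word[-1] in "aeiou" else word


words = ["amaro", "amare"]
stems = {word: toy_stem(word) for word in words}

# Two distinct words collapse to one stem, so a dictionary keyed by stem
# could only store a single (positive, negative) pair for both of them.
collides = len(set(stems.values())) < len(words)
```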

Page 14: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Design Choices: string scoring

Given a string s (either a news item or a tweet), we exploit the dictionary's structure as efficiently as possible to assign a score to s.

In particular, we reason over four tokens at a time.
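
One possible reading of this scoring step is a greedy longest-match scan over a window of up to four tokens. The slides do not spell out the matching or aggregation rules, so both the greedy strategy and the summing of scores are assumptions here, and `score_tokens` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (positive score, negative score)
Score = Tuple[float, float]

MAX_NGRAM = 4  # dictionary keys contain at most four words


def score_tokens(tokens: List[str], lexicon: Dict[str, Score]) -> Score:
    """Sum the scores of the longest dictionary entries found left to right."""
    pos = neg = 0.0
    i = 0
    while i < len(tokens):
        # Try the longest window first, then progressively shorter ones.
        for size in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            key = "_".join(tokens[i:i + size])
            if key in lexicon:
                p, n = lexicon[key]
                pos += p
                neg += n
                i += size  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no entry starts at this token
    return pos, neg
```

Trying the four-token window first lets a multi-word expression such as "non_buono" override the score of the single word "buono" contained in it.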

Page 16: SentiCheNews - Sentiment Analysis on Newspapers and Tweets


Dashboard: spotting the results

Page 17: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Mean & Variance

Each colored bubble represents a data source, news or tweets, where:

• center represents the sentiments' mean

• radius represents the variance of sentiments
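
A sketch of how a bubble could be derived from the (positive, negative) scores of one data source. The slides do not say how the two-dimensional variance is reduced to a single radius, so summing the per-axis population variances is an assumption, and `bubble` is a hypothetical name.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def bubble(points: List[Tuple[float, float]]):
    """Return (center, radius) for a list of (positive, negative) scores."""
    ps = [p for p, _ in points]
    ns = [n for _, n in points]
    center = (fmean(ps), fmean(ns))
    # Assumption: radius is the sum of the variances along both axes.
    radius = pvariance(ps) + pvariance(ns)
    return center, radius
```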

Page 18: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

What’s inside each bubble?

A point for each news/tweet. Each point is represented by a tuple < p, n, t >

• p: positive score

• n: negative score

• t: time

Page 19: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Sentiments’ trend per time interval (mean)

Given a time interval [t1, t2], there is one mean sentiment bubble for every 6 hours.
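
The grouping into 6-hour windows can be sketched as below. It assumes that windows are aligned to t1 and that each point is a (p, n, t) tuple with t a `datetime`; `mean_per_window` is a hypothetical name, not a function from the project.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean

WINDOW = timedelta(hours=6)


def mean_per_window(points, t1, t2):
    """Map each 6-hour window start to the mean (p, n) of its points."""
    buckets = defaultdict(list)
    for p, n, t in points:
        if t1 <= t <= t2:
            index = (t - t1) // WINDOW  # integer index of the window
            buckets[index].append((p, n))
    return {
        t1 + index * WINDOW: (
            fmean([p for p, _ in values]),
            fmean([n for _, n in values]),
        )
        for index, values in sorted(buckets.items())
    }
```

Windows that contain no points are simply absent from the result, so no bubble would be drawn for them.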

Page 20: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Sentiments’ trend per time interval (variance)

Page 21: SentiCheNews - Sentiment Analysis on Newspapers and Tweets

Thank you for your attention

All the material can be found at:

GitHub Repository

https://github.com/manuelcoppotelli/SentiCheNews
