Source: kt.ijs.si/dlsa/2018-09-14-ECML-DLSA-tutorial.pdf

Deep Learning for Natural Language Sentiment and Affect

Muhammad Abdul-Mageed

The University of British Columbia

[email protected]

(Abdul-Mageed & Kralj Novak, 2018)

Petra Kralj Novak

Jožef Stefan Institute

[email protected]


Outline

• Introduction

• Classical Methods

• Deep Learning Methods – on separate slides

• Multilingual Approaches

• Resources

• Ethics


Introduction


How far can we go with machines?


Information Overload

https://beta.techcrunch.com/2017/06/27/facebook-2-billion-users/


https://marketinsight.gkfx.com/

Financial Markets


Sentiment analysis (broad definition)

• Sentiment analysis and opinion mining is the field of study that analyzes people’s opinions, sentiments, evaluations, attitudes, and emotions from written language.


Sentiment analysis (narrow definition)

• Sentiment Analysis is the process of computationally determining whether a piece of text is positive, neutral or negative.

• Sentiment polarity & subjectivity


Examples

• Opposite orientations in different application domains
  • “This camera sucks.”
  • “This vacuum cleaner really sucks.”
• Sarcasm
  • “What a great car! It stopped working in two days.”
• Opinions without sentiment words
  • “This washer uses a lot of water.”
• Ambiguous
  • “It is my birthday today.”
• Language specific
  • “Na ECML konferenčni večerji smo se zabavali ob čudoviti glasbi in plesu.” (Slovenian: “At the ECML conference dinner we enjoyed wonderful music and dancing.”)


Granularity level

• Word
• Sentence, paragraph
• Document


Classical methods


Sentiment lexicons

• Good, wonderful, amazing

• Bad, poor, terrible
• Cost someone an arm and a leg
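A lexicon of this kind can be applied by counting hits in a text. The following is a minimal sketch, assuming a tiny lexicon built only from the example words above; the names `POSITIVE`, `NEGATIVE`, `NEGATIVE_PHRASES` and `lexicon_polarity` are illustrative and do not come from any published lexicon or library.

```python
import re

# Tiny illustrative lexicon built from the slide's examples (an assumption);
# a real lexicon would hold thousands of entries.
POSITIVE = {"good", "wonderful", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIVE_PHRASES = ["an arm and a leg"]  # multiword expression


def lexicon_polarity(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    lowered = text.lower()
    score = 0
    # Multiword expressions are matched on the lowercased string.
    for phrase in NEGATIVE_PHRASES:
        score -= lowered.count(phrase)
    # Single words are matched token by token.
    for token in re.findall(r"[a-z']+", lowered):
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Counting ignores negation, sarcasm and domain effects, which is why the earlier examples ("This vacuum cleaner really sucks.") are hard for purely lexical methods.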


Lexical sentiment analysis: Loughran and McDonald Sentiment Word Lists


Lexical sentiment analysis of mainstream news: Bitcoin

http://newstream.ijs.si/


Lexical vs. machine learning methods

[Table: lexical methods compared with machine learning methods; cell contents not recovered]

Maite Taboada, Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347


Stance analysis

• Stance detection is the task of automatically determining whether the author of a text is in favor of, neutral toward, or against a target

• Example:
  • Target: legalization of abortion
  • Tweet: “A fetus has rights too! Make your voice heard.”


Slovenian presidential elections 2012

• Stance analysis on manually annotated Twitter data:
  • Each tweet is annotated as in favor of, neutral toward, or against each of the candidates
• Linear kernel SVM model


Affect: Emotion is Pervasive

[Credit: www.uwtsd.ac.uk]


[Credit: https://www.youtube.com/watch?v=Ixkp0T3-1YE]

Emotion in Public Discourse


Source: https://www.theatlantic.com/health/archive/2015/02/hard-feelings-sciences-struggle-to-define-emotions/385711

Hard Feelings: Science’s Struggle to Define Emotions


What is emotion?

• “[E]veryone knows what an emotion is, until asked to give a definition. Then, it seems, no one knows” (Fehr & Russell, 1984)

• Definitions vary as a function of:
  • discipline or approach
  • time or culture

• ~100 definitions of emotion (Kleinginna & Kleinginna, 1981)


Models of emotion

• Categorical models of basic emotion

(e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)

• Bidimensional models

(e.g., Russell, 2009)

• Appraisal models

(e.g., Arnold, 1950; 1960; Lazarus, 1991; Scherer et al., 2001)

• Other…


Basic emotion models

• Categorical models (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)

anger, disgust, fear, joy, sadness, surprise


Bidimensional Models

[Figure: two-dimensional emotion space with axes valence and arousal; labeled points: aroused, sleepy, pleased, frustrated]


Plutchik Wheel of Emotions


3 Circles of Arousal

Core, Primary, and Secondary (p1, p2, p3)

8 dimensions


Arousal


2 Dimensions of Valence


Learning emotion

• Multiclass classification task

• Similar to learning sentiment (text classification)


The sentiment analysis pipeline


[Figure: pipeline diagram with five numbered steps; recoverable labels: millions of documents, thousands of documents, classifier, millions of documents]


Data acquisition and labeling

• Acquisition: Relevant data
• Annotation:
  • Representative sample
  • Sample size: 20 – 100K
  • Duplicates
• Annotators
  • Clear instructions with examples
  • Annotator self-agreement
  • Inter-annotator agreement

Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.


Size of training dataset: saturation point

Monitor classifier performance while feeding increasingly larger training sets

[Figure: two panels plotting inter-annotator agreement and classifier performance against training set size; panel captions: "Saturation point not reached at 90,000 tweets" and "Saturation point at 70,000 tweets"]

Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.


The role of human sentiment annotators

Comparison of annotator self-agreement, inter-annotator agreement, and an automated sentiment classifier in terms of Krippendorff’s Alpha.

Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
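To make the agreement measure concrete, here is a minimal sketch of Krippendorff's Alpha for nominal labels, computed from a coincidence matrix. It is an illustration under stated assumptions, not the cited paper's code: the function name `krippendorff_alpha_nominal` and the input layout (one list of labels per item) are hypothetical, and the nominal distance ignores the ordering of sentiment classes, which an ordinal variant would take into account.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal labels.

    `units` is a list of label lists, one per annotated item; each inner
    list holds the labels that item received from its annotators.
    """
    coincidences = Counter()  # (c, k) -> weighted count of ordered label pairs
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()  # marginal total per label
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        # Degenerate case (only one label in use); returning 1.0 is an assumption.
        return 1.0
    return 1.0 - (n - 1) * observed / expected
```

The same function covers self-agreement (two passes by one annotator over the same items) and inter-annotator agreement (labels from different annotators); only the input changes.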


Distant supervision

To build a dataset

• Emoticon/emoji

• #tags

• Seed words (good, bad)

Remove the hints while training
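The idea can be sketched as follows: a hint (emoticon or hashtag) decides the label, and the hint is then stripped so a classifier cannot simply memorize it. This is a minimal sketch; the hint lists and the function name `distant_label` are hypothetical and would need to be extended for real data.

```python
import re

# Hypothetical hint inventories; extend with emoji, hashtags, or seed words.
POSITIVE_HINTS = [":)", ":-)", ":D", "#happy"]
NEGATIVE_HINTS = [":(", ":-(", "#sad"]


def distant_label(text):
    """Return (cleaned_text, label), or None when hints are absent or mixed."""
    has_pos = any(h in text for h in POSITIVE_HINTS)
    has_neg = any(h in text for h in NEGATIVE_HINTS)
    if has_pos == has_neg:  # no hint at all, or conflicting hints
        return None
    label = "positive" if has_pos else "negative"
    # Remove every hint so the label cannot leak into the training features.
    cleaned = text
    for hint in POSITIVE_HINTS + NEGATIVE_HINTS:
        cleaned = cleaned.replace(hint, " ")
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, label
```

Discarding texts with mixed or missing hints trades coverage for cleaner labels; that threshold is a design choice, not something the slides prescribe.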


2, 3 or more class problem?

• 2-class problem
  • Whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed
• 3-class problem
  • Whether the sentiment of the text is positive, neutral or negative
• More-class problem
  • Emotion detection


Exercise: confusion matrix of a classifier


• Accuracy = 80% in both cases

• The errors in the first matrix are more severe than in the second


Problem formulation: Ordinal regression

• Three class problem: negative, neutral, positive
• Error from positive to negative is bigger than the error from positive to neutral
• Measures of quality:
  • Accuracy, Accuracy@1
  • F1
  • MAE, MSE
  • Cohen’s Kappa
  • Krippendorff’s Alpha
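The difference between accuracy and the distance-aware measures can be shown with a short sketch. It assumes the three classes are mapped to -1, 0 and 1 (an illustrative encoding, not one fixed by the slides); `ORDER` and `ordinal_errors` are hypothetical names.

```python
# Assumed numeric encoding of the ordered classes.
ORDER = {"negative": -1, "neutral": 0, "positive": 1}


def ordinal_errors(gold, predicted):
    """Return (accuracy, MAE, MSE) for ordered sentiment labels."""
    diffs = [ORDER[p] - ORDER[g] for g, p in zip(gold, predicted)]
    n = len(diffs)
    accuracy = sum(d == 0 for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    return accuracy, mae, mse


gold = ["positive", "positive", "negative", "neutral"]
near_miss = ["neutral", "positive", "negative", "neutral"]   # positive -> neutral
far_miss = ["negative", "positive", "negative", "neutral"]   # positive -> negative

print(ordinal_errors(gold, near_miss))
print(ordinal_errors(gold, far_miss))
```

Both predictions contain one mistake, so accuracy is identical, while MAE and MSE are larger for the positive-to-negative confusion.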


Exercise: confusion matrix of a classifier

• Accuracy = 80%

• F1 = 0.71

• Accuracy = 80%

• F1 = 0.83
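The confusion matrices themselves did not survive in this transcript. As a sketch of how equal accuracy can hide different F1 scores, the binary counts below are hypothetical: they were chosen only because they reproduce the figures above, and the original exercise may have used different (possibly three-class) matrices.

```python
def accuracy_and_f1(tp, fn, fp, tn):
    """Accuracy and positive-class F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Hypothetical counts (not taken from the slides).
matrix_a = dict(tp=25, fn=15, fp=5, tn=55)
matrix_b = dict(tp=50, fn=10, fp=10, tn=30)

for name, counts in (("A", matrix_a), ("B", matrix_b)):
    acc, f1 = accuracy_and_f1(**counts)
    print(f"matrix {name}: accuracy={acc:.2f}, F1={f1:.2f}")
```

In matrix A the positive class is rarer and a larger share of it is missed, which F1 penalizes even though the overall share of correct decisions is unchanged.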


Classifier

Traditional approaches: SVM, Naïve Bayes

Neural networks


Data representation 1: BOW

• Each word is one dimension

• Each document is one point on a hypersphere
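A minimal sketch of this representation: each vocabulary word gets one dimension, and each document's count vector is scaled to unit length, which is what places it on the hypersphere. The function name `bow_vectors` and the simple regex tokenizer are assumptions for illustration.

```python
import math
import re
from collections import Counter


def bow_vectors(documents):
    """Map each document to an L2-normalized bag-of-words vector."""
    tokenized = [re.findall(r"\w+", doc.lower()) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocabulary)  # one dimension per vocabulary word
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            vec = [x / norm for x in vec]  # unit length: a point on the hypersphere
        vectors.append(vec)
    return vocabulary, vectors
```

With unit-length vectors, the dot product of two documents equals their cosine similarity.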


Social media specific sentiment features


Data representation: Additional features

• BOW (bag of words) + additional features
  • Word N-grams: (Justin Bieber, video games, not happy)
  • Punctuation
  • Emoticons and emoji
  • Preprocessing: baaaaaaad → baaad
  • Capitalization: SCREAMING
• Language specific
  • Lists of positive and negative words: SentiWordNet
  • Spellings of swearing: f**k
  • Language (keyboard) specific emoticons: ಠ_ಠ , ƸӜƷ
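Some of these features can be extracted with a few lines of string handling. The sketch below is illustrative only: the function names and the exact rules (runs of four or more identical characters shortened to three, all-caps words of at least two letters) are assumptions chosen to match the examples in the list.

```python
import re


def normalize_elongation(text):
    """Shorten any run of 4 or more identical characters to exactly 3."""
    return re.sub(r"(.)\1{3,}", r"\1\1\1", text)


def surface_features(text):
    """Extract a few social-media-specific features from raw text."""
    tokens = text.split()
    words = [t.strip(".,!?") for t in tokens]
    lowered = [w.lower() for w in words if w]
    return {
        "bigrams": list(zip(lowered, lowered[1:])),  # word 2-grams
        "exclamations": text.count("!"),             # punctuation cue
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "elongated": normalize_elongation(text) != text,
    }
```

Keeping three repeated characters preserves the signal that a word was elongated while collapsing its many spelling variants into one token.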


Precision-recall tuning

• Precision & Recall should be similar for both the positive and the negative class


Deep learning methods


Multilingual sentiment analysis

• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.

• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.


NLP != English LP

[Figure: two panels, "Languages in the world" and "Languages on Twitter". Image from https://fledu.uz]


Multilingual sentiment analysis approaches

A. Translation-based sentiment analysis
B. Corpus based
C. Lexicon-based sentiment analysis
D. Machine learning approaches
E. Language independent approaches


Translation based sentiment analysis (2)

Original document → (machine translation) → English document → (apply English sentiment analysis) → Sentiment classification


Translation based sentiment analysis (2)

Sentiment labeled corpus (English) → Machine translate to target language → Corpus in target language → Build an ML model → Sentiment model for target language

Original document → Sentiment model for target language → Sentiment classification


Corpus based

Parallel corpora → Apply “English” sentiment model → Transfer labels → Build sentiment model for target language → Sentiment model for target language


Lexicon-based sentiment analysis

• Build a sentiment lexicon for target language
  • Translation of lexica (+ check 10,000 most frequent words)
  • Word net (words and semantic relations) + seed words


Machine learning approaches

1. Labeled dataset
  • Manual annotation
  • Distant supervision
    • Emoji/emoticon
    • Positive and negative #tags
    • Seed words
2. Build a machine learning model


Languages of rich morphology


(Abdul-Mageed, 2018)

Arabic


Segmentation


POS Tagging


ASMA: Segmentation & Morphosyntactic Disambiguation

ASMA: A Real-World Example


Modeling in lexical space

Modeling in morphosyntactic space

(Abdul-Mageed, 2015. Dissertation)


Resources & Venues


Sentiment Resources

• Lexicons

• Models & libraries

• Annotated sentiment data


AFINN

• A list of English words rated for valence

• Scale [-5,5]

• 2477 words and phrases

• Licence: Open Database License (ODbL) v1.0

• An evaluation of the word list is available in: Finn Årup Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs", Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, CEUR Workshop Proceedings 718, pp. 93-98, May 2011. http://arxiv.org/abs/1103.2903
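A valence-rated word list can be used by summing the ratings of the words found in a text. The sketch below assumes a word-to-integer mapping on the [-5, 5] scale; the four entries in `VALENCE` are placeholders for illustration and are not copied from the published list, and `valence_score` is a hypothetical helper, not part of any AFINN distribution.

```python
import re

# Hypothetical entries in the AFINN style: word -> integer valence in [-5, 5].
# Placeholder values, not copied from the published list.
VALENCE = {"good": 3, "amazing": 4, "bad": -3, "terrible": -3}


def valence_score(text, lexicon=VALENCE):
    """Sum the valence of every lexicon word found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)
```

A text without any listed word scores 0, which is how an opinion without sentiment words ("This washer uses a lot of water.") ends up looking neutral to a lexicon.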


Emoji sentiment ranking

• Sentiment of 751 (most common) emojis

• Constructed from 75,000 manually sentiment-labeled tweets with emoji in 13 European languages

• Similar format to SentiWordNet

• Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. https://doi.org/10.1371/journal.pone.0144296

• http://kt.ijs.si/data/Emoji_sentiment_ranking/


Models

• TextBlob
  • PatternAnalyzer: based on a lexicon of adjectives
  • NaiveBayesAnalyzer: an NLTK classifier trained on a movie reviews corpus
  • (Python) https://textblob.readthedocs.io/en/dev/
• ipublia sentiment analysis
  • English, German, French and Italian.
  • (Python, REST) https://github.com/ipublia/sentiment-analysis


Annotated sentiment data

• Twitter sentiment for 15 European languages (1,643,735 manually annotated tweets)

• SemEval competition data
• Bing Liu’s customer reviews and other datasets
• Product reviews: this dataset consists of a few million Amazon customer reviews with star ratings, useful for training a sentiment analysis model.
• Restaurant reviews: this dataset consists of 5.2 million Yelp reviews with star ratings.
• Movie reviews: this dataset consists of 1,000 positive and 1,000 negative processed reviews. It also provides 5,331 positive and 5,331 negative processed sentences / snippets.

• Fine food reviews: this dataset consists of ~500,000 food reviews from Amazon. It includes product and user information, ratings, and a plain text version of every review.

• Twitter airline sentiment on Kaggle: this dataset consists of ~15,000 labeled tweets (positive, neutral, and negative) about airlines.

• First GOP Debate Twitter Sentiment: this dataset consists of ~14,000 labeled tweets (positive, neutral, and negative) about the first GOP debate in 2016.


Emotion Resources

• Lexicons
  • NRC emotion lexicon
  • UBC emotion lexicon (ongoing work)

• Data

• SemEval 2007; 2018; 2019

• Aman and Szpakowicz (2007)

• Abdul-Mageed and Ungar (2017)

• Alhuzali, Abdul-Mageed, and Ungar (2018) (Arabic)


Biases & Ethics


Biases: Social media data is not representative

• Demographic differences between social media users and “target population”

• Behaviour biases

• Linking biases

• Temporal variations


Ethics

• Types of social media research

• Users publishing content might not have anticipated a particular use

                  Aware          Not aware
Manipulated       Lab studies    A/B testing
Not manipulated   Opt-in study   Observational studies (sentiment analysis)


Ethics

• Private or public?
  • PRIVATE: a password protected ‘private’ Facebook group
  • PUBLIC: an open discussion on Twitter in which people broadcast their opinions using a #tag (in order to associate their thoughts on a subject with others’ thoughts on the same subject)

• Public != Non-sensitive

Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.


Ethics: Case Study - Marihuana

• Twitter: #cannabis, #legalize, #ismokeit

• Concerns:
  • Sensitive: illegal activity
  • Users may be under the age of 18
• Solution:
  • Present results from aggregate data
  • Avoid compromising anonymity: use paraphrased quotes (removing ID handles)
  • Direct quotes may be used with informed consent from the (over 18) platform user
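Removing ID handles is the mechanical part of this advice and can be sketched in a few lines. The pattern and the `strip_handles` helper below are illustrative assumptions; note that a verbatim tweet often remains searchable after its handles are removed, which is why the slide pairs this step with paraphrasing.

```python
import re

# Matches Twitter-style handles such as @alice or @bob_92.
HANDLE = re.compile(r"@\w+")


def strip_handles(text, placeholder="@user"):
    """Replace every @handle with a neutral placeholder."""
    return HANDLE.sub(placeholder, text)
```

A fixed placeholder keeps the sentence readable and preserves the fact that someone was addressed, without revealing who.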


Take-home messages

• On real data, human annotators disagree → hard problem

• The best classifier cannot outperform the inter-annotator agreement
• Data representation
  • BOW + social media specific features: punctuation, emojis, …
  • Embedding + deep learning: need lots of data (unlabeled, distant supervision)

• NLP != English LP


References (incomplete)

• Liu B. Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press, 2015.

• Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. arXiv preprint arXiv:1801.07883.

• Mohammad, S. M. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis (pp. 61-83). Springer, Cham, 2017.

• Taboada, M. Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347

• Abdul-Mageed, M. and Ungar, L., 2017. Emonet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 718-728).

• Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.

• Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.

• Zollo, F., Sluban, B., Mozetič, I. and Quattrociocchi, W., 2017, November. Toward a Better Understanding of Emotional Dynamics on Facebook. In International Workshop on Complex Networks and their Applications (pp. 365-377). Springer, Cham.

• Kralj Novak, P., Smailović, J., Sluban, B., & Mozetič, I. (2015). Sentiment of emojis. PloS one, 10(12), e0144296.

Multilingual:

• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.

• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.

• Abdul-Mageed, M., Diab, M. and Kübler, S., 2014. SAMAR: Subjectivity and sentiment analysis for Arabic social media. Computer Speech & Language, 28(1), pp.20-37.

Ethics:

• Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.


Muhammad Abdul-Mageed

Natural Language Processing Lab

School of Information

The University of British Columbia

Vancouver, Canada

[email protected]

(Abdul-Mageed & Kralj Novak, 2018)

Petra Kralj Novak

Department of Knowledge Technologies

Jožef Stefan Institute

Ljubljana, Slovenia

[email protected]

@PetraKraljNovak