SentiRuEval: Testing Object-Oriented Sentiment Analysis Systems in Russian

N. Loukachevitch (Moscow), P. Blinov (Kirov), E. Kotelnikov (Kirov), Y. Rubtsova (Novosibirsk), V. Ivanov (Kazan), E. Tutubalina (Kazan)




Entity-oriented sentiment analysis

• Sentiment analysis
  – In general: sentiment of the whole document, fragment or sentence
  – Entity-oriented
    • Sentiment about a specific entity
      – Politician, political party
      – Company, etc.
    • Sentiment about specific parts or properties of an entity (aspects)
    • Example: Переходи в Билайн. «Все за 300» — отличный тариф! (Switch to Beeline. "Vse za 300" ("Everything for 300") is a great tariff!)
• Previous sentiment analysis evaluations for Russian (2011-2013) addressed the general sentiment of a review or a news quotation


SentiRuEval 2014-2015

• Testing of sentiment analysis systems for Russian texts
• Aspect-oriented analysis of reviews
  – Restaurants
  – Cars
• Entity-oriented analysis of tweets: reputation monitoring
  – Banks
  – Telecom companies


SentiRuEval: Analysis of reviews

• Tasks
  – Aspect term extraction
  – Sentiment towards aspect terms
  – Determining categories of aspect terms
    • Restaurants: food, interior, service, price, restaurant as a whole
    • Cars: comfort, reliability, appearance, price, driveability, car as a whole
  – Determining sentiment of categories for the whole review
• Data sets in each domain
  – Training collection: 200 reviews
  – Test collection: 200 reviews
• Participants
  – 12 participants with 21 runs


Reviews: different types of lexical units

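Друзья, давайте перестанем покорно принимать курицу с сухарями залитую майонезом за салат Цезарь. Я заказала Цезарь во "Временах Года" и опять как и во многих других ресторанах, мне принесли залитую майонезом кислятину под названием "Салат Цезарь с куриным филе" за 280 руб.

Уважаемые повара, вам не стыдно??? Цезарь - это салат с особым соусом с анчоусами, очень вкусный. В вашем заведении - столовская кухня по ценам хорошего ресторана.

(English translation: Friends, let's stop meekly accepting chicken with croutons drowned in mayonnaise as Caesar salad. I ordered a Caesar at "Vremena Goda" and once again, as in many other restaurants, I was served a sour, mayonnaise-drenched mess called "Caesar salad with chicken fillet" for 280 rubles.

Dear cooks, aren't you ashamed??? Caesar is a salad with a special anchovy dressing, and it is very tasty. Your place serves canteen food at the prices of a good restaurant.)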


Aspects labeling

• Types of aspects
  – Explicit aspects denote a part or characteristic of the described object
    • staff, pasta, music in restaurant reviews
    • usually nouns or noun groups
  – Implicit aspects are single words, or single words with sentiment operators, that carry both a specific sentiment and a clear indication of the aspect category
    • tasty (positive+food), comfortable (positive+interior), not comfortable (negative+interior)
  – Sentiment facts do not state the user's sentiment directly: formally they only report a fact, but this fact conveys both the user's sentiment and the aspect category it relates to
    • отвечала на все вопросы (answered all questions)
    • долго ждали (long waiting)
    • знала меню (knew the menu)
    • человеческий волос (human hair)
    • May or may not contain explicit aspects within themselves


Relevance of the term to the review

• Relevance of the term to the review:
  – Rel: relevant (to the current review)
  – Cmpr: comparison, that is, the term concerns another entity
    • We decided not to have dessert and coffee there, but instead went to another restaurant where we enjoyed a wonderful end to our evening.
  – Prev: previous, that is, the term relates to previous opinions
    • Приехали в новый ресторан Тао с мужем, в предвкушении чего-то необыкновенного, ожидания были таковы из-за прочитанных раннее отзывов, место описывалось, как магическое, а еда феерично-космическая (My husband and I came to the new restaurant Tao anticipating something extraordinary; our expectations came from reviews we had read earlier, which described the place as magical and the food as fantastically cosmic)
  – Irr: irrealis, that is, the term is part of a recommendation or a description of a desirable situation
  – Irn: irony


Annotation tool: Brat


Aspect-oriented tasks

• Tasks:
  – A: automatic extraction of explicit aspects
  – B: automatic extraction of all aspects, including sentiment facts
  – C: extraction of sentiments towards explicit aspects
  – D: automatic categorization of explicit aspects into aspect categories
  – E: sentiment analysis of the whole review by aspect category
• Test data in XML
  – Several thousand automatically labeled reviews conceal the reviews with correct aspects (Aspects block)
  – Participants should write the extracted aspects to the Aspect1 block
  – Participants should categorize the aspects given in the Aspects block


Format of test collection


Problems with annotation

• Aspect labeling
  – Mentions in neutral contexts, especially mentions of entities (restaurant, car)
  – Usually maximal noun groups should be labeled, but: внешний вид автомобиля (appearance of the car)
• Aspect categorization
  – по ходовой слабенькая машина (the car is rather weak in its running gear): driveability
  – красивая машина (a beautiful car): appearance
  – машина просто классная (the car is simply great): as a whole


Extraction of explicit aspects: F1-measure

Domain        Baseline   Best result
Restaurants   0.608      0.632
Cars          0.594      0.676

Problems of automatic approaches

• long noun groups with low frequencies
  – “сытая хавронья" из свинины (a pork dish named "the well-fed sow")
  – баклажаны, запеченные с сыром (eggplant baked with cheese)
  – бекон на хрустящем тосте с помидором черри (bacon on crispy toast with cherry tomato)
• ambiguous verbs
  – ели, поели ("ate", "had a meal")

Best approaches

• Sequence labeling (SVM), distributional approaches, recurrent neural nets
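
The F1-measure reported for explicit aspect extraction can be illustrated with a minimal sketch. The slides do not say how a predicted term is matched to a gold term, so the code below assumes exact matching of character offsets; the `Span` layout, the function name `exact_match_f1` and the toy data are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: an aspect term is identified by (review_id, start_offset, end_offset).
Span = Tuple[str, int, int]


def exact_match_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); a prediction is correct only if its
    review id and both offsets coincide with a gold span."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the second prediction has a wrong right boundary, the fourth is spurious.
gold = [("r1", 0, 5), ("r1", 10, 25), ("r2", 3, 9)]
pred = [("r1", 0, 5), ("r1", 10, 18), ("r2", 3, 9), ("r2", 20, 24)]
precision, recall, f1 = exact_match_f1(gold, pred)
```

Under exact matching, a long noun group with one wrong boundary counts as both a false positive and a false negative, which is one way the low-frequency long noun groups listed above can hurt the score.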


Sentiments towards aspects: macroF1

Domain        Baseline (most frequent class)   Best result
Restaurants   0.267                            0.554
Cars          0.264                            0.568

Leader: Gradient Boosting Classifier

Features: a skip-gram model that exploits word contexts to learn better vector representations, and pointwise mutual information
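
Pointwise mutual information between a word and a sentiment class can be sketched as follows. This is only an illustration of the standard formula PMI(w, c) = log2(P(w, c) / (P(w) P(c))) with document-level counts; how the leading system actually computed or used PMI is not stated in the slides, and the function name `pmi_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pmi_scores(docs: Iterable[Tuple[List[str], str]]) -> Dict[Tuple[str, str], float]:
    """Compute PMI(word, label) = log2(P(word, label) / (P(word) * P(label))).

    Probabilities are estimated over documents: a word counts once per document.
    Only (word, label) pairs that co-occur at least once get a score."""
    word_counts: Counter = Counter()
    label_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    n_docs = 0
    for tokens, label in docs:
        n_docs += 1
        label_counts[label] += 1
        for word in set(tokens):
            word_counts[word] += 1
            joint_counts[(word, label)] += 1
    scores: Dict[Tuple[str, str], float] = {}
    for (word, label), joint in joint_counts.items():
        p_joint = joint / n_docs
        p_word = word_counts[word] / n_docs
        p_label = label_counts[label] / n_docs
        scores[(word, label)] = math.log2(p_joint / (p_word * p_label))
    return scores


# Toy labeled fragments (hypothetical, in English for readability).
toy_docs = [
    (["tasty", "soup"], "positive"),
    (["tasty", "dessert"], "positive"),
    (["cold", "soup"], "negative"),
    (["rude", "waiter"], "negative"),
]
scores = pmi_scores(toy_docs)
```

In this toy data "tasty" occurs only in positive fragments and gets a positive score, while "soup" is spread evenly over both classes and scores zero, so the score can serve as a simple class-association feature.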


Aspects categorization

Domain        Baseline (most frequent class)   Best result
Restaurants   0.800                            0.865
Cars          0.564                            0.652

The best result:

• SVM with features based on pointwise mutual information

The second-place result:

• a method relying on term similarity in the space of distributed word representations


SentiRuEval: Reputation monitoring

• A reputation-oriented tweet may express
  – a positive or negative opinion about a company
  – a positive or negative fact concerning a company
• Training collection
  – 5000 banking tweets and 5000 telecom tweets
• Participation
  – 10 participants
  – 33 runs


Example of tweet and format of data for labeling

<table name="bank">
  <column name="id">71</column>
  <column name="twitid">492547326574360000</column>
  <column name="text">Сбербанк России не будет работать в Крыму и Севастополе </column>
  <column name="sberbank">0</column>
  <column name="vtb">NULL</column>
  <column name="gazprom">NULL</column>
  <column name="alfabank">NULL</column>
  <column name="bankmoskvy">NULL</column>
  <column name="raiffeisen">NULL</column>
  <column name="uralsib">NULL</column>
  <column name="rshb">NULL</column>

(Tweet text in English: "Sberbank of Russia will not operate in Crimea and Sevastopol".)
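
A record in this format can be read with Python's standard `xml.etree.ElementTree`. The sketch below is a minimal illustration, not the organizers' tooling: the closing `</table>` tag in the sample, the reading of `NULL` as "entity not mentioned", and the helper name `parse_record` are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Sample record modeled on the slide; the closing </table> tag is an assumption,
# and only three of the entity columns are reproduced.
SAMPLE = """<table name="bank">
  <column name="id">71</column>
  <column name="twitid">492547326574360000</column>
  <column name="text">Сбербанк России не будет работать в Крыму и Севастополе</column>
  <column name="sberbank">0</column>
  <column name="vtb">NULL</column>
  <column name="gazprom">NULL</column>
</table>"""

# Columns that describe the tweet itself rather than an entity.
META_COLUMNS = {"id", "twitid", "text"}


def parse_record(xml_text: str) -> Tuple[Dict[str, str], Dict[str, Optional[str]]]:
    """Split one <table> record into tweet metadata and per-entity labels.

    Assumption: an entity column holding NULL means the entity is not
    mentioned; any other value is kept as that entity's label string."""
    root = ET.fromstring(xml_text)
    meta: Dict[str, str] = {}
    labels: Dict[str, Optional[str]] = {}
    for column in root.findall("column"):
        name = column.get("name")
        value = (column.text or "").strip()
        if name in META_COLUMNS:
            meta[name] = value
        else:
            labels[name] = None if value == "NULL" else value
    return meta, labels


meta, labels = parse_record(SAMPLE)
# Keep only the entities that carry a label in this tweet.
mentioned = {entity: label for entity, label in labels.items() if label is not None}
```

Keeping the labels as strings rather than integers leaves room for non-numeric annotation marks without special handling.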


Expert annotation

• Annotators should:
  – leave the "0" label for a mentioned entity unchanged if the tweet was considered neutral
  – or replace the value with "1" (positive)
    • Positive fact or opinion
  – or with "-1" (negative)
    • Negative fact or opinion
• Annotators could also:
  – label tweets with "--", which means "meaningless"
  – or with "+-", which means positive and negative sentiments in the same tweet
  – Both of the latter cases were excluded from evaluation


Annotation problems

• Problems
  – Disagreement in sentiment labeling
    • я сегодня ходил в сбербанк за картой, там оч милая девушка работала (I went to Sberbank today to get a card; a very nice girl was working there)
  – Multiple mistakes
• Test data were annotated using a voting scheme (3 annotators)
  – Agreement between 2 or 3 annotators
• Size of test collections
  – Banks: 4549 tweets of 5000 labeled
  – Telecom: 3845 tweets of 5000 labeled
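
The voting scheme with three annotators can be sketched as a simple majority vote. This is an illustration under assumptions: the slides do not describe what happened to tweets on which no two annotators agreed, so the code simply drops them; the function name `vote` and the toy annotations are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def vote(labels: List[str], min_agreement: int = 2) -> Optional[str]:
    """Return the label given by at least `min_agreement` annotators,
    or None when no label reaches that threshold.

    `labels` must be non-empty."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Toy annotations from three annotators per tweet (hypothetical ids and labels).
annotations: Dict[str, List[str]] = {
    "t1": ["1", "1", "0"],     # two of three agree on "1"
    "t2": ["-1", "-1", "-1"],  # unanimous "-1"
    "t3": ["1", "0", "-1"],    # no two annotators agree
}
resolved = {tweet_id: vote(labels) for tweet_id, labels in annotations.items()}
# Assumption: tweets without agreement are left out of the test collection.
kept = {tweet_id: label for tweet_id, label in resolved.items() if label is not None}
```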


Performance measures

• Three-way classification of tweets: positive, negative or neutral
• Main quality measure: macro-average F-measure
  – the average of
    • the F-measure of the positive class and
    • the F-measure of the negative class
  – the F-measure of the neutral class is ignored
  – this does not reduce the task to two-class prediction: tweets wrongly moved into or out of the neutral class still lower the precision or recall of the positive and negative classes
• Additionally, micro-average F-measures were calculated for the two sentiment classes
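
The main measure can be sketched in a few lines: compute the F-measure of the positive and negative classes over a three-way labeling and average the two. The label strings, function names and toy data below are assumptions made for illustration; the official evaluation script may differ in details such as the handling of empty classes.

```python
from typing import Sequence


def class_f1(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F-measure of a single class from paired gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_pos_neg(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of the positive-class and negative-class F-measures;
    the neutral class gets no F-measure term of its own."""
    return (class_f1(gold, pred, "positive") + class_f1(gold, pred, "negative")) / 2


gold = ["positive", "negative", "neutral", "neutral", "negative", "positive"]
# One neutral tweet (index 2) is wrongly predicted as positive.
pred_a = ["positive", "neutral", "positive", "neutral", "negative", "negative"]
# Same predictions, except that the neutral tweet at index 2 is now correct.
pred_b = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]

score_a = macro_f_pos_neg(gold, pred_a)
score_b = macro_f_pos_neg(gold, pred_b)
```

The two prediction lists differ only on a neutral tweet, yet `score_b` is higher than `score_a`, which shows why ignoring the neutral F-measure does not turn the task into two-class prediction.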


Results

• Manual labeling by a participant for the telecom domain
  – Macro-F: 0.703, Micro-F: 0.7487
  – An upper bound for automatic approaches
• Best results of automatic systems are far from manual results
• Best results:
  – SVM + syntactic relations
  – Linguistic syntax-based patterns (without machine learning)
  – Maxent, SVM using various features

Domain    Macro-F   Micro-F
Banking   0.3598    0.3656
Telecom   0.4882    0.5362


Why best results in the two domains are so different

• Best results in the banking and telecom domains differ considerably: 0.36 vs. 0.488
• Difference between training and test collections: Kullback-Leibler divergence
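
One way to quantify the difference between a training and a test collection with Kullback-Leibler divergence is to compare their word distributions. The sketch below is an assumption-laden illustration: the slides do not specify the units (words are assumed), the direction of the divergence, or any smoothing; add-alpha smoothing is used here only to avoid division by zero, and the function name and toy tokens are hypothetical.

```python
import math
from collections import Counter
from typing import List


def kl_divergence(p_tokens: List[str], q_tokens: List[str], alpha: float = 1.0) -> float:
    """KL(P || Q) in bits between add-alpha smoothed unigram distributions
    estimated from two token lists over their shared vocabulary."""
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    vocab = set(p_counts) | set(q_counts)
    p_total = len(p_tokens) + alpha * len(vocab)
    q_total = len(q_tokens) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


# Toy token lists standing in for two collections (hypothetical).
train_tokens = ["bank", "card", "loan", "bank", "card"]
test_tokens = ["bank", "sanctions", "sanctions", "crimea", "card"]

same = kl_divergence(train_tokens, train_tokens)
shifted = kl_divergence(test_tokens, train_tokens)
```

A collection compared with itself gives zero, while a collection dominated by words unseen in the other gives a positive value; a larger value in one domain would indicate a larger vocabulary shift between its training and test data.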


Problems of reputation analysis of tweets

• At any moment, events influencing reputation can occur => they are absent from the training data
• In our case
  – training collections in both domains
    • during July-August 2014, after the Ukraine events of 2013-2014
      – Sanctions against banks
      – Problems with communication in Crimea (to a lesser extent)
  – test collections
    • December 2013-February 2014
      – Ukraine events did not influence target entities


Most difficult tweets: almost all systems made mistakes

1. Training collection does not contain words from the test collection
  – Самый безалаберный банк по отношению к клиентам - Сбербанк (The most careless bank towards its clients is Sberbank)
  – В столице произошло дерзкое ограбление Сбербанка (A daring robbery of a Sberbank branch took place in the capital)
  – Гребаный сбербанк (Damn Sberbank)
2. Really difficult tweets: irony and sarcasm, comparisons
  – Сбербанк России – лучший в мире производитель пластиковых карточек для отскабливания льда от автомобиля (Sberbank of Russia is the world's best manufacturer of plastic cards for scraping ice off a car)
  – Нормально @sberbank зарабатывает - размен 5% от суммы (@sberbank is making good money: 5% of the amount for changing money)

• Great difference between training and test collections in the banking domain =>
  – 30% of tweets could be classified better if the approaches used general sentiment dictionaries


If systems were really entity-oriented

• Test tweets mentioning two or more entities
  – 58 tweets in the banking domain (15 tweets with different polarity labels)
  – 232 tweets in the telecom domain (71 tweets with different polarity labels)
• Only three of nine participants treated the task as entity-oriented
  – Other participants always assigned the same polarity class to all entities mentioned in a tweet
• Performance
  – Worse than for all tweets on average
  – Entity-oriented approaches did not achieve better results
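
Counts of this kind (tweets mentioning two or more entities, and those among them whose entities carry different polarity labels) can be obtained with a short pass over the labeled data. The sketch assumes each tweet is represented as a mapping from entity name to a label string, with `None` for entities that are not mentioned; this representation, the function name and the toy tweets are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def multi_entity_stats(tweets: List[Dict[str, Optional[str]]]) -> Tuple[int, int]:
    """Return (number of tweets mentioning two or more entities,
    number of those tweets whose entity labels are not all the same)."""
    multi = 0
    mixed = 0
    for labels in tweets:
        mentioned = [label for label in labels.values() if label is not None]
        if len(mentioned) >= 2:
            multi += 1
            if len(set(mentioned)) > 1:
                mixed += 1
    return multi, mixed


# Toy tweets (hypothetical labels; entity names follow the banking columns).
tweets = [
    {"sberbank": "-1", "vtb": "1", "alfabank": None},  # two entities, different polarity
    {"sberbank": "0", "vtb": "0", "alfabank": None},   # two entities, same polarity
    {"sberbank": "1", "vtb": None, "alfabank": None},  # single entity
]
multi, mixed = multi_entity_stats(tweets)
```

Only the tweets counted in `mixed` can distinguish a truly entity-oriented system from one that assigns a single polarity to the whole tweet.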


Conclusion

• We described the tasks, approaches and results of the SentiRuEval testing
  – Aspect-oriented analysis of reviews in two domains
  – Reputation-oriented analysis of tweets
• All prepared materials are accessible for research purposes (see hyperlinks in the paper)
• Conclusions for the review task
  – Most effort was directed at aspect extraction
  – Less attention was paid to the other tasks
• Conclusions for the tweet task
  – High dependence on the training collections
  – The capability to do entity-oriented analysis is quite restricted
• Should both tasks (or some variants) be repeated?