EmotiBlog: a finer-grained and more precise learning of subjectivity expression models (Published in: Linguistic Annotation Workshop, ACL 2010). Ester Boldrini & Patricio Martínez-Barco, Alexandra Balahur & Andrés Montoyo


EmotiBlog: a finer-grained and more precise learning of subjectivity expression models

(Published in: Linguistic Annotation Workshop, ACL 2010)

Ester Boldrini & Patricio Martínez-Barco,

Alexandra Balahur & Andrés Montoyo


EmotiBlog … other publications on the model

Boldrini, E., Balahur, A., Martínez-Barco, P., Montoyo, A. 2009. EmotiBlog: an Annotation Scheme for Emotion Detection and Analysis in Non-traditional Textual Genre. The 2009 World Congress in Computer Science, Computer Engineering, and Applied Computing.

Boldrini, E., Balahur, A., Martínez-Barco, P., Montoyo, A. 2009. EmotiBlog: a fine-grained annotation schema for labelling subjectivity in the new-textual genres born with the Web 2.0. SEPLN.


Why

Subjective information

◦ Hard to extract and classify using rules
◦ Spontaneous
◦ New forms of expression (colloquialisms, set phrases, collocations, punctuation anomalies, etc.)
◦ High semantic variability


Why

Retrieving information from blogs is complex:

◦ Detecting the discourse objects
◦ Classifying their polarity
◦ Identifying the interlocutors, and whether the opinion expressed concerns the current topic or refers to something said earlier

ANNOTATED DATA

(for training ML systems)


What I improve

Contribute to building corpora, which are scarce for languages other than English

◦ improve the existing English ones

Creation of a more detailed annotation model

Includes annotation of the source

◦ anaphoric references at the cross-document level


LAW IV - ACL 2010, July 15

EmotiBlog: the corpus

Corpus in 3 languages: Italian, Spanish, English

On 3 topics: the Kyoto Protocol, the elections in Zimbabwe, the elections in the USA

30,000 words for each language and topic

Extracted from blog posts


EmotiBlog: the schema

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="noun">
    <complexType>
      <attribute name="source" type="kind" use="required"/>
      <attribute name="target" type="kind" use="required"/>
      <attribute name="polarity">
        <simpleType>
          <restriction base="string">
            <enumeration value="positive" />
            <enumeration value="negative" />
          </restriction>
        </simpleType>
      </attribute>
      <attribute name="degree" default="medium">
        <simpleType>
          <restriction base="string">
            <enumeration value="low" />
            <enumeration value="medium" />
            <enumeration value="high" />
          </restriction>
        </simpleType>
      </attribute>
      <attribute name="phenomenon">
        <
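The constraints visible in the schema fragment can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `check_noun_attributes` is hypothetical, the `kind` type of `source` and `target` is not defined in the fragment (so only their presence is checked), and the truncated `phenomenon` attribute is left out.

```python
# Allowed values copied from the schema fragment; the "phenomenon"
# attribute is cut off in the source, so it is not modelled here.
POLARITY_VALUES = {"positive", "negative"}
DEGREE_VALUES = {"low", "medium", "high"}
DEGREE_DEFAULT = "medium"
REQUIRED = ("source", "target")


def check_noun_attributes(attrs):
    """Return (normalised_attrs, errors) for one <noun> annotation.

    Hypothetical helper: it mirrors the visible schema rules only.
    """
    errors = []
    result = dict(attrs)
    # source and target are declared use="required".
    for name in REQUIRED:
        if name not in result:
            errors.append(f"missing required attribute: {name}")
    # polarity is optional but restricted to two values.
    if "polarity" in result and result["polarity"] not in POLARITY_VALUES:
        errors.append(f"invalid polarity: {result['polarity']}")
    # degree is declared with default="medium".
    result.setdefault("degree", DEGREE_DEFAULT)
    if result["degree"] not in DEGREE_VALUES:
        errors.append(f"invalid degree: {result['degree']}")
    return result, errors


if __name__ == "__main__":
    print(check_noun_attributes({"source": "w", "target": "Bush", "polarity": "negative"}))
```

A full validation would load the complete schema with an XML Schema validator; the dictionary check above only illustrates what the enumerations and the default value mean in practice.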


EmotiBlog: the schema

Elements: Obj. speech, Subj. speech, Adjectives, Adverbs, Verbs, Anaphora, Capital letter, Punctuation, Names, Phenomenon, Reader Interpretation, Author Interpretation, Emotions


EmotiBlog: the schema

Attributes: confidence, comment, source, target.

<objective speech target="Bush" category="phrase"> George Bush was one of the United States presidents </objective speech>


EmotiBlog: the schema

Attributes: confidence, comment, category, polarity, degree, source, target, emotion

<phenomenon target="Kyoto Protocol" category="phrase" degree="high" source="w" polarity="positive" emotion="joy">This is a great initiative. </phenomenon>
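To show how such an annotation could be consumed by a program, here is a minimal sketch that reads the phenomenon example with Python's standard `xml.etree.ElementTree`. The helper name `read_annotation` is an assumption made for illustration, and the typographic quotes of the slide are written as straight quotes so that the fragment is parseable XML.

```python
import xml.etree.ElementTree as ET

# The phenomenon example from the slide, with straight quotes.
SNIPPET = (
    '<phenomenon target="Kyoto Protocol" category="phrase" degree="high" '
    'source="w" polarity="positive" emotion="joy">'
    "This is a great initiative. </phenomenon>"
)


def read_annotation(xml_text):
    """Parse one annotated element into a plain dictionary (hypothetical helper)."""
    element = ET.fromstring(xml_text)
    record = dict(element.attrib)          # target, category, degree, ...
    record["element"] = element.tag        # e.g. "phenomenon"
    record["text"] = (element.text or "").strip()
    return record


if __name__ == "__main__":
    print(read_annotation(SNIPPET))
```

Element names containing a space, such as the objective speech example, would need a different spelling (for instance an underscore) before a standard XML parser accepts them.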


EmotiBlog: the schema

Attributes: confidence, comment, category, polarity, degree, source, target, emotion

<reader_int target="Bush" category="phrase" degree="high" source="w" polarity="negative" emotion="anger">Bush repeatedly refused to sign the Kyoto Protocol. </reader_int>


EmotiBlog: the schema

Attributes: confidence, comment, polarity, degree, source, target, emotion

This <author_int target="Kyoto Protocol" category="phrase" degree="high" source="w" polarity="positive" emotion="good">pack of wolves </author_int>


Experiments

EmotiBlog annotates individual words + multiword expressions + sentences

Polarity + intensity + emotion

◦ Experiments show how the annotated elements can be used as training data for opinion mining and polarity classification, and for emotion detection.

EmotiBlog annotates the intensity of the annotated elements

◦ We ran a brief experiment to determine the intensity of the emotions expressed: high, medium, low


Experiments: corpora used

The JRC newspaper collection (reported speech) (Balahur et al., 2010), enriched with EmotiBlog annotation
◦ http://langtech.jrc.it//JRC_Resources.html

The collection of newspaper headlines from SemEval 2007 Task 14, Affective Text

The ISEAR corpus: a corpus of self-reported emotional responses (Scherer and Wallbott, 1999).


Experiments: building the training models

We extract the NEs

We parse the annotated data (sentences)

For each word in the sentence:

◦ POS
◦ Capitalization
◦ Opinionatedness / intensity
◦ Syntactic relatedness with other opinion words
◦ Polarity / intensity and emotion of this word
◦ Role in 2-word, 3-word and 4-word annotations: opinionatedness, intensity, emotion, direct dep.
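The per-word features listed above can be pictured as a small dictionary built for each token. The sketch below is not the authors' implementation: `word_features` and `opinion_lexicon` are hypothetical names, POS tags are assumed to come from an external tagger, and syntactic relatedness is approximated by looking at the adjacent tokens rather than at a real parse. The role of a word inside 2-word to 4-word annotations is left out.

```python
def word_features(tokens, pos_tags, index, opinion_lexicon):
    """Build a feature dictionary for tokens[index].

    opinion_lexicon maps a lowercased word to a dict such as
    {"polarity": "positive", "intensity": "high", "emotion": "joy"};
    it is a hypothetical stand-in for the annotated EmotiBlog words.
    """
    word = tokens[index]
    entry = opinion_lexicon.get(word.lower(), {})
    # Crude proxy for "syntactic relatedness with other opinion words":
    # is one of the adjacent tokens in the lexicon? A real system would
    # use a dependency parser instead.
    neighbours = tokens[max(0, index - 1):index] + tokens[index + 1:index + 2]
    return {
        "pos": pos_tags[index],
        "capitalized": word[:1].isupper(),
        "opinionated": bool(entry),
        "polarity": entry.get("polarity", "none"),
        "intensity": entry.get("intensity", "none"),
        "emotion": entry.get("emotion", "none"),
        "near_opinion_word": any(t.lower() in opinion_lexicon for t in neighbours),
    }


if __name__ == "__main__":
    tokens = ["This", "is", "a", "great", "initiative"]
    tags = ["DT", "VBZ", "DT", "JJ", "NN"]  # assumed tagger output
    lexicon = {"great": {"polarity": "positive", "intensity": "high", "emotion": "joy"}}
    for i in range(len(tokens)):
        print(tokens[i], word_features(tokens, tags, i, lexicon))
```

One dictionary per word keeps the features easy to inspect before they are turned into the numeric vectors a classifier expects.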


Experiments: building the training models

First model, EmotiBlog I:
◦ Feature vector for each sentence
◦ Weka SVM SMO

Second model, EmotiBlog II:
◦ Adding to the collection of opinion/emotion words annotated in EmotiBlog: Opinion Finder, MicroWordNet, General Inquirer, WordNet Affect


Experiments: evaluating the models

2 evaluations:

◦ Polarity and intensity: EmotiBlog I and II models; 2 test sets: JRC and the SemEval 2007 Task 14 test set

◦ Emotion detection: EmotiBlog I and II models; 3 test sets: JRC, the SemEval 2007 Task 14 test set, ISEAR


Experiments: polarity and intensity classification

Test corpus      Evaluation type   Precision   Recall
JRC quotes I     Polarity          32.13       54.09
JRC quotes I     Intensity         36.00       53.2
JRC quotes II    Polarity          36.4        51.00
JRC quotes II    Intensity         38.7        57.81
SemEval I        Polarity          38.57       51.3
SemEval I        Intensity         37.39       50.9
SemEval II       Polarity          35.8        58.68
SemEval II       Intensity         32.3        50.4
Best SE          Polarity          31.18       66.38
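Precision and recall pull in different directions in these results, so a single combined figure can help when comparing rows. The slide does not report one; the sketch below derives the balanced F-measure (F1, the harmonic mean of precision and recall) from the figures in the table. The function name `f1_score` and the `ROWS` list are introduced here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (test corpus, evaluation type, precision, recall) copied from the table.
ROWS = [
    ("JRC quotes I", "Polarity", 32.13, 54.09),
    ("JRC quotes I", "Intensity", 36.00, 53.2),
    ("JRC quotes II", "Polarity", 36.4, 51.00),
    ("JRC quotes II", "Intensity", 38.7, 57.81),
    ("SemEval I", "Polarity", 38.57, 51.3),
    ("SemEval I", "Intensity", 37.39, 50.9),
    ("SemEval II", "Polarity", 35.8, 58.68),
    ("SemEval II", "Intensity", 32.3, 50.4),
    ("Best SE", "Polarity", 31.18, 66.38),
]

if __name__ == "__main__":
    for corpus, kind, p, r in ROWS:
        print(f"{corpus:14s} {kind:10s} F1 = {f1_score(p, r):.2f}")
```

These F1 values are derived figures, not results reported by the authors.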


Experiments: emotion classification

3 corpora:
◦ JRC (annotated with EmotiBlog)
◦ SemEval 2007 Task 14 test set (annotated with a small set of emotions)
◦ ISEAR (annotated with a small set of emotions)

Check the system's performance using both general and more detailed annotation
◦ specific to EmotiBlog


Experiments: emotion classification

Test corpus      Evaluation type   Precision   Recall
JRC quotes I     Emotions          24.7        15.08
JRC quotes II    Emotions          33.65       18.98
SemEval I        Emotions          29.03       18.89
SemEval II       Emotions          32.98       18.45
ISEAR I          Emotions          22.31       15.01
ISEAR II         Emotions          25.62       17.83
Best SE          Emotions          16.23       26.27


Experiments: emotion classification

Best results for "anger"
◦ Precision 35% and recall 19%.

Worst results for the "shame" category in ISEAR
◦ Precision 12% and recall 15%.

Texts taken from news obtain better results
◦ In ISEAR the emotion is more hidden

Our approach: robust across different text genres and relevant for OM


EmotiBlog: the agreement

Inter-annotator agreement using agr (Sp)
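The slide names the agr metric but gives no definition, and the figures themselves did not survive. Assuming the directional definition commonly used in opinion annotation work, agr(a||b) is the share of annotator a's annotations that annotator b also marked. The sketch below implements that reading; treating two spans as matching when their character offsets overlap is an additional assumption, and the function names are hypothetical.

```python
def spans_overlap(a, b):
    """True if two (start, end) character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def agr(spans_a, spans_b):
    """Directional agreement agr(a||b): share of A's spans matched by B."""
    if not spans_a:
        return 0.0
    matched = sum(
        1 for a in spans_a if any(spans_overlap(a, b) for b in spans_b)
    )
    return matched / len(spans_a)


if __name__ == "__main__":
    annotator_a = [(0, 5), (10, 15), (20, 25)]  # made-up example spans
    annotator_b = [(3, 8), (21, 22)]
    print(agr(annotator_a, annotator_b), agr(annotator_b, annotator_a))
```

Because the measure is directional, it is normally computed both ways, as in the usage lines above, and the two values can differ.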


EmotiBlog in QA

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. Opinion and Generic Question Answering systems: a performance analysis. To appear in Proceedings of ACL, 2009, Singapore.

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. Opinion Question Answering: Towards a Unified Approach. To appear in proceedings of the ECAI conference.

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. A Unified Proposal for Factoid and Opinionated Question Answering. To appear in proceedings of the COLING conference.

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. A Comparative Study of Open Domain and Opinion Question Answering Systems for Factual and Opinionated Queries. To appear in Proceedings of RANLP 2009.

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. Towards the Definition of Requirements for Mixed Fact and Opinion Question Answering Systems. In Proceedings of Topic Semantic Analysis. CIKM 2009.


EmotiBlog in QA

Techniques for good QA performance on subjective content

◦ We evaluate OQA performance with (EAT, EPT, ES, ET)

◦ We propose a method to tackle the problems (SRL, topic-sentiment retrieval, paraphrasing)

◦ We measure the impact of including additional resources

◦ The improvements obtained are statistically significant


EmotiBlog in OM and real-time OM

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. Cross-topic Opinion Mining for Real-time Human-Computer Interaction. ICEIS 2009.

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2009. Fact versus Opinion Questions Classification and Answering: Challenges and Keys. ICAI 2009.


EmotiBlog: feature selection experiments

We evaluate the usefulness of the annotated features, both in combination and using feature selection techniques.

We found problems such as noise and the small size of the corpus, the granularity of the annotation, and Spanish (which has fewer resources than English).


EmotiBlog: feature selection experiments

Ester Boldrini, Javi Fernández, José M. Gómez and Patricio Martínez-Barco. Machine Learning Techniques for Automatic Opinion Detection in Non-Traditional Textual Genres. WOMSA 2010.


EmotiBlog in automatic summarization

A method for summarizing subjective texts based on the intensity of the opinion expressed.

On average, 79% of the summaries are comprehensible.
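The slide gives no detail on how intensity drives the summary. One simple way to picture the idea, offered only as a sketch and not as the authors' method, is to keep the sentences with the highest annotated intensity and present them in their original order. The names `summarize_by_intensity` and `INTENSITY_RANK` are hypothetical; the three intensity labels are the ones used in the annotation scheme.

```python
# Ordering of the three intensity labels used in the annotation scheme.
INTENSITY_RANK = {"low": 1, "medium": 2, "high": 3}


def summarize_by_intensity(sentences, max_sentences=2):
    """Select the most intense sentences, preserving document order.

    sentences: list of (text, intensity) pairs, intensity being
    "low", "medium" or "high".
    """
    # Sort sentence positions by intensity, strongest first; Python's
    # sort is stable, so ties keep their original relative order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: INTENSITY_RANK[sentences[i][1]],
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i][0] for i in keep]


if __name__ == "__main__":
    thread = [
        ("I read the post.", "low"),
        ("This is a great initiative.", "high"),
        ("It might help a bit.", "medium"),
        ("Refusing to sign was outrageous.", "high"),
    ]
    print(summarize_by_intensity(thread, max_sentences=2))
```

Restoring the original order after selection keeps the extract readable, which matters when the summaries are judged for comprehensibility.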


EmotiBlog in automatic summarization

Balahur, A., Lloret, E., Boldrini, E., Montoyo, A., Palomar, M., Martínez-Barco, P. 2009. Summarizing Threads in Blogs Using Opinion Polarity. In proceedings of Emerging Text Types Workshop. RANLP 2009.


EmotiBlog in competitions

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. 2010. The OpAL System at NTCIR 8 MOAT. NTCIR 8 MOAT.


EmotiBlog applied to business

Balahur, A., Boldrini, E., Montoyo, A., Martínez-Barco, P. OpAL: a System for Mining Opinion from Text for Business Applications. To appear in Business Intelligence Applications and the Web: Models, Systems and Technologies.


What we are doing

EmotiBlog for emotion analysis in events (with CNR, Pisa)

EmotiBlog for product analysis (Javi, JM)

EmotiBlog for predicting stock movements (Alex)


Events

WASSA 2010 at ECAI

WASSA 2011 at ACL

Proposal for an evaluation task: IBEREVAL


More publications

Boldrini, E., Puchol-Blasco, M., Navarro, B., Martínez-Barco, P., Vargas-Sierra, C. 2008. AQA: a multilingual Anaphora annotation scheme for Question Answering. SEPLN Nº 40.

Boldrini, E., Ferrández, S., Izquierdo, R., Tomás, D., Vicedo, J.L. 2009. A Parallel Corpus Labelled Using open and Restricted Domain Ontologies. CICLING 2009.

Boldrini, E., Ferrández, S., Izquierdo, R., Ferrández, O., Tomás, D., Vicedo, J.L. 2009. A proposal of Expected Answer Type and Named Entity annotation in a Question Answering context. Human System Interaction 2009.