LEXICAL ACQUISITION FOR OPINION INFERENCE: A SENSE-LEVEL LEXICON OF BENEFACTIVE AND MALEFACTIVE EVENTS

Yoonjung Choi, Lingjia Deng, and Janyce Wiebe

University of Pittsburgh


Outline
1. Introduction
2. Sense-level GFBF Ambiguity
3. Lexicon Acquisition
4. Evaluations
   1. Corpus Evaluation
   2. Sense Annotation Evaluation
5. Related Work
6. Summary


Introduction
• Implicit opinions
  • Deng and Wiebe (2014) showed that sentiments toward one entity may be propagated to other entities via opinion implicature rules.
  • Example: "The bill would lower the skyrocketing health care costs."
    • The writer expresses an explicit negative sentiment toward the costs (skyrocketing).
    • The event, lower, is bad for its object, the costs.
    → The writer is positive toward the bill.
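The inference in this example can be sketched in a few lines of Python. This is only a simplified reading of one implicature rule, not the full rule set of Deng and Wiebe (2014); the numeric encoding (+1 / -1) and the function name `infer_agent_sentiment` are assumptions made for illustration.

```python
# Hypothetical encoding: +1 = positive / goodFor, -1 = negative / badFor.
GOOD_FOR = +1
BAD_FOR = -1


def infer_agent_sentiment(object_sentiment: int, event_polarity: int) -> int:
    """Return the writer's implied sentiment toward the agent of a gfbf event.

    Simplified rule: being negative toward an object implies being positive
    toward an event that is bad for it (and toward the agent of that event),
    and so on for the other sign combinations.
    """
    return object_sentiment * event_polarity


# "The bill would lower the skyrocketing health care costs."
# The writer is negative toward the costs (-1); "lower" is bad for the costs.
sentiment_toward_bill = infer_agent_sentiment(-1, BAD_FOR)
print(sentiment_toward_bill)  # 1, i.e. positive toward the bill
```

The sketch needs to know whether the event is goodFor or badFor, which is exactly the information a gfbf lexicon supplies.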


Introduction
• These implicature rules involve events that positively or negatively affect the object.
  • These are benefactive and malefactive events; for ease, we call them goodFor (gf) and badFor (bf) events.
• Verb classes for gfbf (Anand and Reschke, 2010)
  • Creation/Destruction (changes in states involving existence)
    • e.g., bake a cake → good for the cake
    • e.g., destroy the building → bad for the building
  • Gain/Loss (changes in states involving possession)
    • e.g., increase the tax rate → good for the tax rate
    • e.g., decrease the tax rate → bad for the tax rate
  • Benefit/Injury (changes in states involving affectedness)
    • e.g., comfort the child → good for the child
    • e.g., kill Bill → bad for Bill


Introduction
• Since a single word may have more than one meaning, it may have both gf and bf meanings.
  • E.g., purge
    • S: (v) purge (oust politically) "Deng Xiao Ping was purged several times throughout his lifetime" → bf
    • S: (v) purge (clear of a charge) → gf
    • S: (v) purify, purge, sanctify (make pure or free from sin or guilt) "he left the monastery purified" → gf
    • S: (v) purge (rid of impurities) "purge the water"; "purge your mind" → gf

⇒ We take a sense-level approach to acquiring a gfbf lexicon.


Sense-level GFBF Ambiguity
• Since words often have more than one sense, the polarity of a word may or may not be consistent across its senses.
• Case 1: The polarity of a word is consistent
  • encourage → a word with only gf senses
    • S1: (v) promote, advance, boost, further, encourage (contribute to the progress or growth of)
    • S2: (v) encourage (inspire with confidence; give hope or courage to)
    • S3: (v) encourage (spur on)
  • In this case, word-level approaches can work well.


Sense-level GFBF Ambiguity
• Case 2: A word with gf (bf) and neutral senses
  • inspire
    • S3: (v) prompt, inspire, instigate (serve as the inciting cause of) → gf
    • S4: (v) cheer, root on, inspire, urge, barrack, urge on, exhort, pep up (spur on or encourage especially by cheers and shouts) → gf
    • S6: (v) inhale, inspire, breathe in (draw in (air)) → neutral
  • neutralize
    • S2: (v) neutralize, neutralise, nullify, negate (make ineffective by counterbalancing the effect of) → bf
    • S6: (v) neutralize, neutralise (make chemically neutral) → neutral


Sense-level GFBF Ambiguity
• Case 3: A word with gf and bf senses
  • fight
    • S2: (v) fight, oppose, fight back, fight down, defend (fight against or resist strongly)
      e.g., "we need to fight this repeal" → fight is bad for the object, this repeal
    • S4: (v) crusade, fight, press, campaign, push, agitate (exert oneself continuously, vigorously, or obtrusively to gain an end or engage in a crusade for a certain cause or person; be an advocate for)
      e.g., "fight for a piece of legislation" → fight is good for the object, a piece of legislation
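The three ambiguity cases can be made concrete with a small sense-level table. The sketch below is illustrative only: the dictionary holds just the senses quoted on these slides (the words have further WordNet senses that are not shown), and the names `SENSE_POLARITY` and `ambiguity_case` are hypothetical.

```python
from typing import Dict

# Only the senses quoted on the slides; labels are gf, bf, or neutral.
SENSE_POLARITY: Dict[str, Dict[str, str]] = {
    "encourage": {"S1": "gf", "S2": "gf", "S3": "gf"},
    "inspire": {"S3": "gf", "S4": "gf", "S6": "neutral"},
    "neutralize": {"S2": "bf", "S6": "neutral"},
    "fight": {"S2": "bf", "S4": "gf"},
}


def ambiguity_case(word: str) -> str:
    """Classify a word by how its listed senses distribute over gf/bf/neutral."""
    labels = set(SENSE_POLARITY[word].values())
    polar = labels & {"gf", "bf"}
    if len(polar) == 2:
        return "case 3: gf and bf senses"
    if polar and "neutral" in labels:
        return "case 2: polar and neutral senses"
    if polar:
        return "case 1: consistent polarity"
    return "neutral only"


for word in SENSE_POLARITY:
    print(word, "->", ambiguity_case(word))
```

A word-level lexicon can store only one label per word, so it is adequate for case 1 but loses information in cases 2 and 3.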


Lexicon Acquisition
• We develop a sense-level gfbf lexicon by exploiting WordNet.
• Seed Lexicon
  • An annotator selected gfbf words from FrameNet.
  • 592 gf words and 523 bf words were found.
  • Decomposing each word into its senses, there are 1,525 gf senses and 1,154 bf senses.
  • We randomly choose 200 gf senses and 200 bf senses.


Lexicon Acquisition
• Resources
  • WordNet Relations (http://wordnet.princeton.edu/)
    • Hypernym relations: more general senses
      • We hypothesize that direct hypernyms tend to have the same or neutral polarity, but not the opposite polarity.
    • Troponym relations: more specific verb senses
      • We hypothesize that the troponyms of a sense tend to have the same polarity as that sense.
    • Verb groups: similar meanings are manually grouped together
  • WordNet Similarity (http://wn-similarity.sourceforge.net/)
    • It provides a variety of relatedness measures based on information found in WordNet.
    • We use the Jiang and Conrath (jcn) measure.
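For reference, the jcn measure scores two senses as the inverse of the Jiang and Conrath distance, which is computed from the information content (IC) of each sense and of their lowest common subsumer (LCS). The sketch below shows only that arithmetic; the probabilities are made-up values, and in practice IC would be estimated from corpus counts by the similarity package.

```python
import math


def information_content(probability: float) -> float:
    """IC(c) = -log P(c), where P(c) is the corpus probability of concept c."""
    return -math.log(probability)


def jcn_similarity(p_sense1: float, p_sense2: float, p_lcs: float) -> float:
    """Inverse Jiang-Conrath distance: 1 / (IC(c1) + IC(c2) - 2 * IC(lcs))."""
    distance = (
        information_content(p_sense1)
        + information_content(p_sense2)
        - 2.0 * information_content(p_lcs)
    )
    if distance == 0.0:
        return math.inf  # identical information content: maximally similar
    return 1.0 / distance


# Hypothetical probabilities: a more specific (rarer) LCS means closer senses.
close_pair = jcn_similarity(0.01, 0.01, 0.02)
distant_pair = jcn_similarity(0.01, 0.01, 0.1)
print(close_pair > distant_pair)  # True
```

A threshold on this score then decides which candidate senses are similar enough to be added during expansion.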


Lexicon Acquisition
• Expansion method
  1. Create gfLexicon = {gf seed set} and bfLexicon = {bf seed set}.
  2. For each sense in gfLexicon, extract its direct troponyms, direct hypernyms, and the members of its verb group, and add all senses with above-threshold jcn values (newgfLexicon).
  3. For each sense in bfLexicon, extract its direct troponyms, direct hypernyms, and the members of its verb group, and add all senses with above-threshold jcn values (newbfLexicon).
  4. Remove conflicting senses.
  5. Add the remaining senses in newgfLexicon and newbfLexicon to gfLexicon and bfLexicon, respectively.
  6. Repeat steps 2-5.
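The expansion loop can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: `neighbors` is a hypothetical callback that stands in for both the WordNet relation lookups and the thresholded jcn lookup, a "conflicting" sense is taken to be one proposed for both lexicons in the same round, senses that already carry a label are left untouched, and the toy graph `RELATED` is invented.

```python
from typing import Callable, Iterable, Set, Tuple


def expand_lexicons(
    gf_seeds: Iterable[str],
    bf_seeds: Iterable[str],
    neighbors: Callable[[str], Set[str]],
    max_iterations: int = 10,
) -> Tuple[Set[str], Set[str]]:
    """Grow gf/bf sense sets from seeds until no new sense is added."""
    gf, bf = set(gf_seeds), set(bf_seeds)            # step 1
    for _ in range(max_iterations):
        labeled = gf | bf
        new_gf = {n for s in gf for n in neighbors(s)} - labeled   # step 2
        new_bf = {n for s in bf for n in neighbors(s)} - labeled   # step 3
        conflicting = new_gf & new_bf                # step 4
        new_gf -= conflicting
        new_bf -= conflicting
        if not new_gf and not new_bf:
            break                                    # nothing left to add
        gf |= new_gf                                 # step 5
        bf |= new_bf
    return gf, bf


# Invented toy graph of related senses.
RELATED = {
    "help": {"assist", "affect"},
    "assist": {"serve"},
    "hurt": {"injure", "affect"},
    "injure": {"wound"},
}

gf_lexicon, bf_lexicon = expand_lexicons(
    {"help"}, {"hurt"}, lambda sense: RELATED.get(sense, set())
)
print(sorted(gf_lexicon))  # ['assist', 'help', 'serve']
print(sorted(bf_lexicon))  # ['hurt', 'injure', 'wound']
```

In the toy run, "affect" is reachable from both seed sets, so it is treated as conflicting and never enters either lexicon.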


Corpus Evaluation
• GFBF Corpus (http://mpqa.cs.pitt.edu/)
  • Manually annotated with gfbf information by Deng et al. (2013)
  • 134 blog posts and editorials about the Affordable Care Act
  • 1,411 annotated gfbf instances
    • Each instance is a triple: <agent, gfbf event, object>
  • Distribution of gfbf words
    • 196 different words in gf instances
    • 286 different words in bf instances
    • 10 words appear in both gf and bf instances (e.g., fight)
• Gold Standard
  • All senses of gf (bf) words are considered to be gf (bf) senses.
  → The gold standard set contains 772 gf senses and 1,029 bf senses.


Corpus Evaluation
• Evaluation Metric
  • gfOverlap: the overlap between the senses in the expanded lexicon and the gold-standard gf set
  • bfOverlap: the overlap between the senses in the expanded lexicon and the gold-standard bf set
  • Accuracy
    • goodFor: #gfOverlap / (#gfOverlap + #bfOverlap)
    • badFor: #bfOverlap / (#gfOverlap + #bfOverlap)

[Figure: the expanded gf (bf) senses intersecting the gold-standard gf senses (gfOverlap) and the gold-standard bf senses (bfOverlap)]
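The overlap-based accuracy can be written directly with set intersections. The sketch below uses invented sense identifiers; `overlap_accuracy` and its `target` argument are hypothetical names, not part of any released tool.

```python
from typing import Set


def overlap_accuracy(
    expanded: Set[str], gold_gf: Set[str], gold_bf: Set[str], target: str
) -> float:
    """Accuracy of an expanded lexicon whose intended polarity is `target`.

    Only senses that also occur in the gold standard are counted, so this
    measures precision on the overlap rather than on the whole lexicon.
    """
    gf_overlap = len(expanded & gold_gf)
    bf_overlap = len(expanded & gold_bf)
    total = gf_overlap + bf_overlap
    if total == 0:
        raise ValueError("expanded lexicon shares no sense with the gold standard")
    hits = gf_overlap if target == "gf" else bf_overlap
    return hits / total


# Invented toy data: two expanded gf senses are gold gf, one is gold bf,
# and one ("x") is not in the gold standard at all.
toy_accuracy = overlap_accuracy(
    expanded={"a", "b", "c", "x"},
    gold_gf={"a", "b", "z"},
    gold_bf={"c", "y"},
    target="gf",
)
print(round(toy_accuracy, 2))  # 0.67
```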


Corpus Evaluation
• Results after lexicon expansion
  • 4,157 new gf senses and 5,071 new bf senses are extracted.
  • Overall, accuracy is higher for the bf lexicon than for the gf lexicon.
  • Even though the seed set is completely independent from the corpus, the expanded lexicon's coverage of the corpus is not small.

             #senses   #gfOverlap   #bfOverlap   Accuracy
  goodFor     4,157       449          176         0.72
  badFor      5,071       105          562         0.84


Corpus Evaluation
• Results after lexicon expansion (cont'd)
  • WordNet Similarity is advantageous because it detects similar senses automatically.

  goodFor     #senses   #gfOverlap   #bfOverlap   Accuracy
  WN Sim       1,073       134           75         0.64
  Groups         242        69           24         0.74
  Troponym     4,084       226          184         0.55
  Hypernym       223        75           33         0.69

  badFor      #senses   #gfOverlap   #bfOverlap   Accuracy
  WN Sim       1,008        34          190         0.85
  Groups         255        11           86         0.89
  Troponym     4,258        66          375         0.85
  Hypernym       286        16           77         0.83
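As a consistency check, the per-relation accuracies can be recomputed from the reported overlap counts. The counts and accuracies below are copied from the tables above; the helper name `accuracy` and the `REPORTED` layout are illustrative.

```python
# (lexicon, relation) -> (#gfOverlap, #bfOverlap, reported accuracy)
REPORTED = {
    ("goodFor", "WN Sim"): (134, 75, 0.64),
    ("goodFor", "Groups"): (69, 24, 0.74),
    ("goodFor", "Troponym"): (226, 184, 0.55),
    ("goodFor", "Hypernym"): (75, 33, 0.69),
    ("badFor", "WN Sim"): (34, 190, 0.85),
    ("badFor", "Groups"): (11, 86, 0.89),
    ("badFor", "Troponym"): (66, 375, 0.85),
    ("badFor", "Hypernym"): (16, 77, 0.83),
}


def accuracy(lexicon: str, gf_overlap: int, bf_overlap: int) -> float:
    """Share of overlapping senses whose gold label matches the lexicon."""
    hits = gf_overlap if lexicon == "goodFor" else bf_overlap
    return hits / (gf_overlap + bf_overlap)


# Rows whose recomputed value (rounded to two decimals) differs from the table.
mismatches = [
    key
    for key, (gf, bf, reported) in REPORTED.items()
    if round(accuracy(key[0], gf, bf), 2) != reported
]
print(mismatches)  # []
```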


Corpus Evaluation
• Results after lexicon expansion (cont'd)
  • The verb group relation is the most accurate (0.74 for gf, 0.89 for bf), even though its coverage is small.


Corpus Evaluation
• Results after lexicon expansion (cont'd)
  • The troponym relation yields the largest number of senses (4,084 gf and 4,258 bf).


Corpus Evaluation
• Results after lexicon expansion (cont'd)
  • For the hypernym relation, the number of detected senses is not large because many were already detected in previous iterations.


Sense Annotation Evaluation
• Sense Annotation
  • For a more direct evaluation, two annotators (co-authors) independently annotated a sample of senses.
  • We randomly selected 60 words among the following classes:
    • 10 pure gf words (i.e., all senses of the words are classified by the expansion method, and all senses are put into the gf lexicon)
    • 10 pure bf words
    • 20 mixed words (i.e., some senses are put into the gf lexicon while others are put into the bf lexicon)
    • 20 incomplete words (i.e., some senses of the words are not classified by the expansion method)
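The four word classes can be derived mechanically from the expanded lexicons. The sketch below is one possible reading of the definitions above: it assumes that "incomplete" takes precedence when a word has both unclassified senses and senses of mixed polarity, and the sense identifiers are invented.

```python
from typing import Iterable, Set


def word_class(senses: Iterable[str], gf_lexicon: Set[str], bf_lexicon: Set[str]) -> str:
    """Assign a word to pure gf, pure bf, mixed, or incomplete."""
    all_senses = set(senses)
    if not all_senses:
        raise ValueError("a word needs at least one sense")
    in_gf = all_senses & gf_lexicon
    in_bf = all_senses & bf_lexicon
    if (in_gf | in_bf) != all_senses:
        return "incomplete"  # assumption: checked before "mixed"
    if in_gf and in_bf:
        return "mixed"
    return "pure gf" if in_gf else "pure bf"


# Invented sense identifiers for illustration.
gf_senses = {"w1#1", "w1#2", "w3#1", "w4#1"}
bf_senses = {"w2#1", "w3#2"}

print(word_class(["w1#1", "w1#2"], gf_senses, bf_senses))  # pure gf
print(word_class(["w2#1"], gf_senses, bf_senses))          # pure bf
print(word_class(["w3#1", "w3#2"], gf_senses, bf_senses))  # mixed
print(word_class(["w4#1", "w4#2"], gf_senses, bf_senses))  # incomplete
```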


Sense Annotation Evaluation
• Evaluation
  • Baseline: the majority class
  • Evaluation metrics
    • Accuracy: the proportion of senses that the expansion method classifies correctly, measured against each annotator
    • Incorrect-opposite: the sense is classified with the opposite polarity.
    • Incorrect-neutral: the expansion method classifies the sense as gf or bf, but the annotator marked it as neutral.

  (values are proportions)
           baseline   accuracy   incorrect-opposite   incorrect-neutral
  Anno1      0.37       0.53            0.16                 0.32
  Anno2      0.44       0.57            0.24                 0.19
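The three outcome categories can be computed by comparing the lexicon's label for each sense with one annotator's label. This is a sketch on invented data; `outcome_breakdown` is a hypothetical helper, and it assumes the lexicon only ever predicts gf or bf.

```python
from collections import Counter
from typing import Dict


def outcome_breakdown(predicted: Dict[str, str], annotated: Dict[str, str]) -> Dict[str, float]:
    """Proportion of correct, incorrect-opposite, and incorrect-neutral senses."""
    counts: Counter = Counter()
    for sense, label in predicted.items():
        gold = annotated[sense]
        if label == gold:
            counts["correct"] += 1
        elif gold == "neutral":
            counts["incorrect-neutral"] += 1
        else:
            counts["incorrect-opposite"] += 1
    total = len(predicted)
    return {
        name: counts[name] / total
        for name in ("correct", "incorrect-opposite", "incorrect-neutral")
    }


# Invented labels for four senses.
lexicon_labels = {"s1": "gf", "s2": "gf", "s3": "bf", "s4": "bf"}
annotator_labels = {"s1": "gf", "s2": "bf", "s3": "neutral", "s4": "bf"}

breakdown = outcome_breakdown(lexicon_labels, annotator_labels)
print(breakdown)
```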


Sense Annotation Evaluation
• Evaluation (cont'd)
  • The results for the gf and bf classes
    • gf (bf) accuracy: the proportion of correct gf (bf) senses out of all senses annotated as gf (bf) according to the annotations

           gf accuracy   bf accuracy   baseline
  Anno1       0.74          0.83         0.37
  Anno2       0.68          0.74         0.44

  • The agreement between the annotators
    • Percent agreement: 0.84
    • Kappa (Artstein and Poesio, 2008): 0.75
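For readers unfamiliar with the agreement figures, the sketch below computes percent agreement and a chance-corrected kappa for two annotators. It assumes Cohen's formulation of kappa, which may not be the exact coefficient used for the reported 0.75, and the label sequences are invented.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> Tuple[float, float]:
    """Return (percent agreement, Cohen's kappa) for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both annotators used a single identical label
    return observed, (observed - expected) / (1.0 - expected)


# Invented annotations for four senses.
anno1 = ["gf", "gf", "bf", "bf"]
anno2 = ["gf", "gf", "bf", "gf"]

percent, kappa = agreement(anno1, anno2)
print(percent, kappa)  # 0.75 0.5
```

Kappa is lower than raw agreement because part of the observed agreement is expected by chance.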


Related Work
• A few works are close to ours
  • Goyal et al. (2010) generated a lexicon of patient polarity verbs that impart positive or negative states on their patients.
  • Feng et al. (2011) built connotation lexicons that list words with connotative polarity and connotative predicates.
  • Riloff et al. (2013) learned a lexicon of negative situation phrases from a corpus of tweets.
  ⇒ However, these are word-level lexicons.
• Sense-level lexicons for polarity or subjectivity
  • Esuli and Sebastiani (2006) constructed SentiWordNet.
  • Gyamfi et al. (2009) constructed a classifier to label the subjectivity of word senses.
  • Su and Markert (2009) adopted a semi-supervised mincut method to recognize the subjectivity of word senses.


Summary
• We developed a sense-level gfbf lexicon.
• Our evaluations show that lexical resources are promising for expanding such sense-level lexicons.
• Even though the seed set is completely independent from the corpus, the expanded lexicon's coverage of the corpus is not small.


THANK YOU!


References
• Lingjia Deng and Janyce Wiebe. 2014. Sentiment propagation via implicature constraints. In Proc. of EACL.
• Lingjia Deng, Yoonjung Choi, and Janyce Wiebe. 2013. Benefactive/malefactive event and writer attitude annotation. In Proc. of 51st ACL.
• George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. 1990. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4):235-312.
• Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. of COLING.
• Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Comput. Linguist., 34(4):555-596.
• Amit Goyal, Ellen Riloff, and Hal Daumé III. 2010. Automatically producing plot unit representations for narrative text. In Proc. of EMNLP, pages 77-86.
• Song Feng, Ritwik Bose, and Yejin Choi. 2011. Learning general connotation of words using graph-based algorithms. In Proc. of EMNLP, pages 1092-1103.
• Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proc. of EMNLP, pages 704-714.
• Andrea Esuli and Fabrizio Sebastiani. 2006. SENTIWORDNET: A publicly available lexical resource for opinion mining. In Proc. of 5th LREC, pages 417-422.
• Yaw Gyamfi, Janyce Wiebe, Rada Mihalcea, and Cem Akkaya. 2009. Integrating knowledge for subjectivity sense labeling. In Proc. of NAACL HLT, pages 10-18.
• Fangzhong Su and Katja Markert. 2009. Subjectivity recognition on word senses via semi-supervised mincuts. In Proc. of NAACL HLT, pages 1-9.
• FrameNet, https://framenet2.icsi.berkeley.edu/