Post on 21-Dec-2015
Learning Subjective Nouns using Extraction Pattern Bootstrapping
Ellen Riloff, Janyce Wiebe, Theresa Wilson
Presenter: Gabriel Nicolae
Subjectivity – the Annotation Scheme
http://www.cs.pitt.edu/~wiebe/pubs/ardasummer02/
Goal: to identify and characterize expressions of private states in a sentence.
Private state = opinions, evaluations, emotions and speculations.
Annotators also judge the strength of each private state: low, medium, high, or extreme.
Annotation gold standard:
subjective: a sentence that contains at least one private-state expression of medium or higher strength
objective: all the rest
The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long.
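The gold-standard rule above can be sketched as a small function. This is only an illustration: the function name and the input format (a list of strength labels, one per annotated private-state expression in the sentence) are hypothetical and not part of the annotation scheme.

```python
# Strength scale from the annotation scheme, ordered weakest to strongest.
STRENGTHS = ["low", "medium", "high", "extreme"]


def label_sentence(private_state_strengths):
    """Return 'subjective' if any annotated private-state expression
    has strength medium or higher, otherwise 'objective'."""
    threshold = STRENGTHS.index("medium")
    for strength in private_state_strengths:
        if STRENGTHS.index(strength) >= threshold:
            return "subjective"
    return "objective"


print(label_sentence(["low", "high"]))  # subjective
print(label_sentence(["low"]))          # objective
print(label_sentence([]))               # objective
```

A sentence with only low-strength expressions, or with none at all, falls into the objective class.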
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (1/2)
(Riloff and Jones 1999)
Mutual bootstrapping:
Begin with a small set of seed words that represent a targeted semantic category (e.g. 10 words that represent LOCATIONS) and an unannotated corpus.
Produce thousands of extraction patterns for the entire corpus (e.g. “<subject> was hired”).
Compute a score for each pattern based on the number of seed words among its extractions.
Select the best pattern; all of its extracted noun phrases are labeled as the target semantic category.
Re-score the extraction patterns (original seed words + newly labeled words).
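The mutual bootstrapping loop above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the patterns and their extractions are a hypothetical toy dictionary rather than the output of a real pattern generator, the function name is made up, and the score is simply the raw count of known category members among a pattern's extractions, as the slide describes.

```python
def mutual_bootstrap(extractions, seeds, iterations):
    """extractions: dict mapping a pattern to the set of noun phrases it extracts.
    Returns the patterns chosen (in order) and the grown lexicon."""
    lexicon = set(seeds)
    chosen = []
    for _ in range(iterations):
        candidates = [p for p in extractions if p not in chosen]
        if not candidates:
            break
        # Score = number of lexicon members among the pattern's extractions;
        # ties are broken by pattern string to keep the run deterministic.
        best = max(candidates, key=lambda p: (len(extractions[p] & lexicon), p))
        if not extractions[best] & lexicon:
            break  # no remaining pattern extracts any known category member
        chosen.append(best)
        # Every noun phrase extracted by the best pattern joins the category.
        lexicon |= extractions[best]
    return chosen, lexicon


# Hypothetical toy data for a LOCATIONS category.
toy_extractions = {
    "traveled to <np>": {"france", "japan", "peru"},
    "capital of <np>": {"japan", "chile"},
    "<subject> was hired": {"smith", "jones"},
}
chosen, lexicon = mutual_bootstrap(toy_extractions, {"france", "japan"}, iterations=5)
print(chosen)           # ['traveled to <np>', 'capital of <np>']
print(sorted(lexicon))  # ['chile', 'france', 'japan', 'peru']
```

Because the lexicon grows after each selection, re-scoring happens implicitly on the next pass through the loop.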
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (2/2)
Meta-bootstrapping, run after mutual bootstrapping finishes:
All nouns that were put into the semantic dictionary are reevaluated.
Each noun is assigned a score based on how many different patterns extracted it.
Only the 5 best nouns are allowed to remain in the dictionary; the others are discarded.
Mutual bootstrapping is restarted.
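The re-evaluation step might look like the sketch below. It assumes the score is just the number of distinct chosen patterns that extracted each newly learned noun, which is how the slide words it; the function name, argument names, and toy data are hypothetical.

```python
from collections import Counter


def keep_best_nouns(extractions, chosen_patterns, new_nouns, keep=5):
    """Score each newly learned noun by the number of distinct chosen
    patterns that extracted it, and return the `keep` highest-scoring nouns."""
    counts = Counter()
    for pattern in chosen_patterns:
        for noun in extractions[pattern] & new_nouns:
            counts[noun] += 1
    # Highest count first; alphabetical order breaks ties.
    ranked = sorted(new_nouns, key=lambda n: (-counts[n], n))
    return ranked[:keep]


# Hypothetical toy data; keep=2 only so the cutoff is visible on a tiny example.
toy_extractions = {
    "traveled to <np>": {"peru", "chile", "smith"},
    "capital of <np>": {"peru", "chile"},
    "flew to <np>": {"peru"},
}
survivors = keep_best_nouns(
    toy_extractions,
    chosen_patterns=list(toy_extractions),
    new_nouns={"peru", "chile", "smith"},
    keep=2,
)
print(survivors)  # ['peru', 'chile']
```

The surviving nouns would then be added to the seed set before mutual bootstrapping is restarted.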
Using Extraction Patterns to Learn Subjective Nouns – Basilisk
(Thelen and Riloff 2002)
Begin with an unannotated text corpus and a small set of seed words for a semantic category.
Bootstrapping:
Basilisk automatically generates a set of extraction patterns for the corpus and scores each pattern based upon the number of seed words among its extractions. The best patterns are placed in the Pattern Pool.
All nouns extracted by a pattern in the Pattern Pool are placed in the Candidate Word Pool. Basilisk scores each noun based upon the set of patterns that extracted it and their collective association with the seed words.
The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.
Repeat the bootstrapping process.
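One Basilisk iteration can be sketched as below. The slides do not give the scoring formulas, so this sketch assumes the RlogF pattern score and AvgLog noun score described in Thelen and Riloff (2002); treat both formulas, all function names, and the toy data as assumptions. Every pattern is assumed to extract at least one noun.

```python
import math


def rlogf(extracted, lexicon):
    """Assumed RlogF pattern score: (F / N) * log2(F), where F is the number
    of lexicon members among the extractions and N is the number of extractions."""
    f = len(extracted & lexicon)
    return (f / len(extracted)) * math.log2(f) if f > 0 else float("-inf")


def avg_log(noun, extractions, lexicon):
    """Assumed AvgLog noun score: mean of log2(F_j + 1) over every pattern j
    that extracts the noun, where F_j counts that pattern's lexicon members."""
    fs = [len(nouns & lexicon) for nouns in extractions.values() if noun in nouns]
    return sum(math.log2(f + 1) for f in fs) / len(fs)


def basilisk_iteration(extractions, lexicon, pool_size, words_per_iter):
    # Pattern Pool: the best-scoring patterns.
    ranked = sorted(extractions, key=lambda p: (-rlogf(extractions[p], lexicon), p))
    pattern_pool = ranked[:pool_size]
    # Candidate Word Pool: nouns extracted by pooled patterns, not yet in the lexicon.
    candidates = set().union(*(extractions[p] for p in pattern_pool)) - lexicon
    best = sorted(candidates, key=lambda n: (-avg_log(n, extractions, lexicon), n))
    return lexicon | set(best[:words_per_iter])


# Hypothetical toy data.
toy_extractions = {
    "expressed <dobj>": {"outrage", "concern", "hope"},
    "voiced <dobj>": {"concern", "anger", "hope"},
    "bought <dobj>": {"shares", "hope"},
}
result = basilisk_iteration(toy_extractions, {"outrage", "concern"},
                            pool_size=2, words_per_iter=1)
print(sorted(result))  # ['anger', 'concern', 'outrage']
```

In this toy run "anger" outranks "hope" because "hope" is also extracted by a pattern with no lexicon members, which lowers its average. The slide's setting would correspond to `words_per_iter=10`.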
Using Extraction Patterns to Learn Subjective Nouns – Experimental Results
The graph tracks the accuracy as bootstrapping progressed.
Accuracy was high during the initial iterations but tapered off as the bootstrapping continued.
After 20 words, both algorithms were 95% accurate. After 100 words, Basilisk was 75% accurate and MetaBoot 81%. After 1000 words, MetaBoot was 28% accurate and Basilisk 53%.
Creating Subjectivity Classifiers – Subjective Noun Features
Naïve Bayes classifier using the nouns as features.
Sets:
BA-Strong: the set of StrongSubjective nouns generated by Basilisk
BA-Weak: the set of WeakSubjective nouns generated by Basilisk
MB-Strong: the set of StrongSubjective nouns generated by Meta-Bootstrapping
MB-Weak: the set of WeakSubjective nouns generated by Meta-Bootstrapping
For each set, a three-valued feature: the presence of 0, 1, or ≥2 words from that set.
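The three-valued feature can be computed with a capped count, as in the sketch below. The contents of the four sets are hypothetical toy words, not the nouns actually learned, and counting token occurrences (rather than distinct words) is an assumption.

```python
def three_valued(tokens, word_set):
    """Return 0, 1, or 2, where 2 stands for 'two or more' set members."""
    hits = sum(1 for token in tokens if token.lower() in word_set)
    return min(hits, 2)


# Hypothetical toy sets; the real sets come from Basilisk and Meta-Bootstrapping.
noun_sets = {
    "BA-Strong": {"outrage", "hypocrisy"},
    "BA-Weak": {"concern"},
    "MB-Strong": {"outrage"},
    "MB-Weak": {"hope", "concern"},
}
sentence = "Their hypocrisy caused outrage and some concern".split()
features = {name: three_valued(sentence, words) for name, words in noun_sets.items()}
print(features)
# {'BA-Strong': 2, 'BA-Weak': 1, 'MB-Strong': 1, 'MB-Weak': 1}
```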
Creating Subjectivity Classifiers – Previously Established Features
(Wiebe, Bruce, O’Hara 1999)
Sets:
a set of stems positively correlated with the subjective training examples – subjStems
a set of stems positively correlated with the objective training examples – objStems
For each set, a three-valued feature: the presence of 0, 1, or ≥2 members of the set.
A binary feature for each of the following: presence in the sentence of a pronoun, an adjective, a cardinal number, a modal other than will, an adverb other than not.
Other features from other researchers.
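The five binary part-of-speech features listed above could be derived from a tagged sentence along these lines. The sketch assumes the input is already tagged with Penn Treebank tags (PRP/PRP$, JJ*, CD, MD, RB*); the slides do not say which tagger or tag set was used, and the function and feature names are hypothetical.

```python
def binary_features(tagged):
    """tagged: list of (word, Penn Treebank POS tag) pairs for one sentence."""
    return {
        "pronoun": any(tag in ("PRP", "PRP$") for _, tag in tagged),
        "adjective": any(tag.startswith("JJ") for _, tag in tagged),
        "cardinal_number": any(tag == "CD" for _, tag in tagged),
        # Modal other than "will".
        "modal": any(tag == "MD" and word.lower() != "will" for word, tag in tagged),
        # Adverb other than "not".
        "adverb": any(tag.startswith("RB") and word.lower() != "not" for word, tag in tagged),
    }


# Hypothetical pre-tagged sentence.
tagged_sentence = [("She", "PRP"), ("will", "MD"), ("not", "RB"),
                   ("buy", "VB"), ("two", "CD"), ("cars", "NNS")]
print(binary_features(tagged_sentence))
# {'pronoun': True, 'adjective': False, 'cardinal_number': True, 'modal': False, 'adverb': False}
```

Here "will" and "not" are the only modal and adverb in the sentence, so both of those features stay off.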
Creating Subjectivity Classifiers – Discourse Features
\[ \mathrm{ClueRate}_{subj}(S) = \frac{|subjClues \text{ in } S|}{|S|} \]
subjClues = all sets defined before except objStems
\[ \mathrm{ClueRate}_{obj}(S) = \frac{|objStems \text{ in } S|}{|S|} \]
Four features:
ClueRate_subj for the previous and following sentences
ClueRate_obj for the previous and following sentences
Feature for sentence length.
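The discourse features can be sketched as follows. Several things here are assumptions: |S| is taken to be the number of tokens in the sentence, clues are matched by exact lowercase token rather than by stem, raw rates are returned without any discretization, and missing neighbours at the document boundary get a rate of 0. Function names and toy clue sets are hypothetical.

```python
def clue_rate(tokens, clues):
    """Number of clue tokens in the sentence divided by sentence length."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token.lower() in clues) / len(tokens)


def discourse_features(sentences, i, subj_clues, obj_stems):
    """Clue rates of the sentences before and after sentence i, plus its length."""
    prev = sentences[i - 1] if i > 0 else []
    following = sentences[i + 1] if i + 1 < len(sentences) else []
    return {
        "prev_subj": clue_rate(prev, subj_clues),
        "next_subj": clue_rate(following, subj_clues),
        "prev_obj": clue_rate(prev, obj_stems),
        "next_obj": clue_rate(following, obj_stems),
        "length": len(sentences[i]),
    }


# Hypothetical toy document and clue sets.
doc = [
    ["what", "a", "terrible", "outrage"],
    ["the", "firm", "sold", "shares"],
    ["we", "hope", "so"],
]
feats = discourse_features(doc, 1, {"terrible", "outrage", "hope"}, {"sold", "shares"})
print(feats["prev_subj"], feats["length"])  # 0.5 4
```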
Creating Subjectivity Classifiers – Classification Results
The results of Naïve Bayes classifiers trained with different combinations of features.
Using both the WBO features (Wiebe, Bruce, O’Hara 1999) and the SubjNoun features achieves better performance than either set alone.
The best results are achieved with all the features combined.
A higher-precision classifier can be obtained by labeling a sentence as subjective if it contains any of the StrongSubjective nouns: 87% precision, 26% recall.
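The rule-based classifier, together with the precision and recall computation, can be sketched as below. The noun set, sentences, and gold labels are hypothetical toy data, so the numbers printed here have nothing to do with the 87% and 26% reported above.

```python
def classify(tokens, strong_nouns):
    """Label a sentence subjective if it contains any StrongSubjective noun."""
    hit = any(token.lower() in strong_nouns for token in tokens)
    return "subjective" if hit else "objective"


def precision_recall(gold, predicted, positive="subjective"):
    """Precision and recall for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    predicted_pos = sum(p == positive for p in predicted)
    gold_pos = sum(g == positive for g in gold)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    return precision, recall


# Hypothetical toy data.
strong = {"outrage", "hypocrisy"}
sentences = [
    "The outrage was immediate".split(),
    "I think this plan is awful".split(),
    "They may regret it".split(),
    "The firm sold its shares".split(),
]
gold = ["subjective", "subjective", "subjective", "objective"]
predicted = [classify(s, strong) for s in sentences]
precision, recall = precision_recall(gold, predicted)
print(predicted)  # ['subjective', 'objective', 'objective', 'objective']
print(precision)  # 1.0
```

The toy run shows the same trade-off in miniature: the rule fires rarely, so what it labels subjective is reliable, but most subjective sentences are missed.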