University of Oslo : Department of Informatics
A Syntactic Constituent Ranker for Speculation and Negation Scope Resolution
Erik Velldal, Lilja Øvrelid, Jonathon Read†, Stephan Oepen
Date: 21 September 2011. Venue: Master Seminar in Language Technology, Department of Informatics, University of Oslo
Introduction
Task
Given an appropriate cue of speculation or negation, determine its scope: the subsequence of the sentence that is affected.

For example:
- {The unknown amino acid ⟨may⟩ be used by these species}.
- Samples of the protein pair space were taken {⟨instead of⟩ considering the whole space} [...]

Some applications:
- Information Extraction
- Sentiment Analysis
Introduction
Our approach
Generate candidate scopes from constituents in syntactic trees; use a supervised learner to rank these candidates.

[Pipeline diagram: unlabeled sentence → data-driven syntactic parsing → constituents; speculation/negation cue + constituent ranking → predicted scope]
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Data Sets
- BioScope Corpus (Vincze et al., 2008): 20,924 sentences annotated with speculation/negation cues and associated scopes. Sentences are drawn from biomedical abstracts, full papers and clinical reports.
- CoNLL-2010 Shared Task Evaluation Data (Farkas et al., 2010): a further set of 5,003 sentences from biomedical papers, annotated for speculation.
Data Sets
           Total      Speculation         Negation
Data       Sentences  Sentences  Cues     Sentences  Cues
Abstracts  11,871     2,101      2,659    1,597      1,719
Papers     2,670      519        668      339        376
CoNLL      5,003      790        1,033    –          –

Note on data set splits for development and evaluation:
- Speculation: all of the Abstracts and Papers are used for development; the CoNLL set is held out.
- Negation: 90% of the Abstracts and Papers are used for development; the remainder is held out.
Preprocessing
- BioScope XML converted to stand-off characterisation.
- Tokenisation performed using the GENIA tagger (Tsuruoka et al., 2005), supplemented with a finite-state tokeniser.
- Part-of-speech tags from both the GENIA tagger (for higher accuracy in the biomedical domain) and TnT (Brants, 2000).
- Dependency parsing using MaltParser (Nivre et al., 2006) stacked with the XLE platform (Crouch et al., 2008) with the English grammar developed by Butt et al. (2002).
- Constituent tree parsing with PET (Callmeier, 2002) and the English Resource Grammar (Flickinger, 2002).
Evaluation measures
Evaluation was performed using the scoring software of the CoNLL-2010 Shared Task.

Prec = tp / (tp + fp)    Rec = tp / (tp + fn)    F1 = 2 × Prec × Rec / (Prec + Rec)
A true positive requires identification of all words in scope.
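These measures are easy to compute once exact-span true positives are counted. A minimal sketch follows; representing scopes as hashable spans is an assumption of this illustration, not the CoNLL scorer's actual data format:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def count_exact(gold_scopes, pred_scopes):
    """A true positive requires identification of all words in scope,
    so only exact span matches count."""
    gold, pred = set(gold_scopes), set(pred_scopes)
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp
```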
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Supervised classification of cues
We employ a binary word-by-word linear support vector machine classifier using the SVMlight toolkit.

A simpler approach is to treat the set of cues as a near-closed class, applying filtering to disregard all known non-cues before training and applying the classifier.

Cues are described using features of:
- n-grams of lemmas up to three positions to the left; and
- n-grams of surface forms up to two positions to the right.
Supervised classification of cues
Multi-word cues are handled in post-processing by matching patterns over lemmas observed in the development data.

Speculation patterns:
- cannot {be}? exclude
- either .+ or
- indicate that
- may,? or may not
- no {evidence | proof | guarantee}
- not {known | clear | evident | understood | exclude}
- raise the .* {possibility | question | issue | hypothesis}
- whether or not

Negation patterns:
- rather than
- {can | could} not
- no longer
- instead of
- with the * exception of
- neither * nor
- {no(t?) | neither} * nor
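This shorthand maps naturally onto regular expressions. An illustrative sketch with two of the patterns; the system's actual pattern syntax and matcher are not shown on the slide, so these regex renderings are assumptions:

```python
import re

# Hypothetical regex renderings of "may,? or may not" and "neither * nor"
# (with * standing in for any run of intervening words).
SPECULATION = [re.compile(r"\bmay,? or may not\b")]
NEGATION = [re.compile(r"\bneither\b(?:\s+\S+)*?\s+nor\b")]

def find_multiword_cues(sentence, patterns):
    """Return (start, end) character spans of multi-word cue matches."""
    return [m.span() for pat in patterns for m in pat.finditer(sentence)]
```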
Supervised classification of cues
Development Results: Speculation
              Prec   Rec    F1
Baseline      90.49  81.16  85.57
Word-by-word  94.65  82.26  88.02
Filtering     94.13  84.60  89.11

Development Results: Negation
              Prec   Rec    F1
Word-by-word  87.69  97.61  92.38
Filtering     92.49  97.72  95.03
Heuristics for scope resolution
Heuristics, activated by the cue's part of speech, operate over dependency structures:
- Coordinations scope over their conjuncts;
- Prepositions scope over their argument and its descendants;
- Attributive adjectives scope over their nominal head and its descendants;
- Predicative adjectives scope over referential subjects and clausal arguments, if present;
- Modals inherit subject scope from their lexical verb and scope over their descendants; and
- Passive or raising verbs scope over referential subjects and the verbal descendants.
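As a rough illustration of how such part-of-speech-triggered rules operate, here is a toy dispatch over a dependency graph. The graph encoding and the single rule body shown are simplifications assumed for the example, not the system's implementation:

```python
def descendants(graph, node):
    """All nodes reachable from `node`, where `graph` maps a head to its dependents."""
    out, stack = [], list(graph.get(node, []))
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(graph.get(n, []))
    return out

def scope_for_cue(graph, cue, pos):
    """Toy dispatch on the cue's part of speech; only the preposition rule
    ("prepositions scope over their argument and its descendants") is sketched."""
    if pos == "IN":  # preposition
        args = graph.get(cue, [])
        return sorted(set(args) | {d for a in args for d in descendants(graph, a)})
    return sorted(descendants(graph, cue))  # fallback: cue's own descendants
```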
Heuristics for scope resolution
In the case of negation, an additional set of rules is applied:
- Determiners scope over their head node and its descendants;
- The noun none scopes over the entire sentence if it is the subject, otherwise it scopes over its descendants;
- Other nouns or verbs scope over their descendants;
- Adverbs with a verbal head scope over the descendants of the lexical verb; and
- Other adverbs scope over the descendants of the head.

If none of these rules activate, a default scope is applied by labeling from the cue to the sentence-final punctuation.
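The default fallback is simple to state over a token list. A sketch that assumes the scope runs from the cue up to, but not including, the sentence-final punctuation mark; the slide leaves this boundary treatment implicit:

```python
def default_scope(tokens, cue_index):
    """Fallback scope: every token from the cue to the end of the sentence,
    excluding a final punctuation mark if one is present."""
    end = len(tokens)
    if tokens and tokens[-1] in {".", "!", "?"}:
        end -= 1
    return list(range(cue_index, end))
```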
Heuristics for scope resolution
Development Results: Speculation Scope for Gold Cues
                              F1
Abstracts  Default Baseline   69.84
           Dependency Rules   73.67
Papers     Default Baseline   45.21
           Dependency Rules   72.31

Development Results: Negation Scope for Gold Cues
                              F1
Abstracts  Default Baseline   51.75
           Dependency Rules   70.76
Papers     Default Baseline   31.38
           Dependency Rules   67.16
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Selecting candidate constituents
Learn a ranking function over candidate constituents within a parse.

[HPSG derivation tree for "The unknown amino acid ⟨may⟩ be used by these species.", with internal nodes such as sb-hd_mc_c, sp-hd_n_c, aj-hdn_norm_c and hd-cmp_u_c, and lexical types such as d_-_the_le, v_vp_mdl-p_le and v_prd_be_le]
Selecting candidate constituents
Candidates are generated by following the path from the cue to the root of the tree, for example:
- modal verb : False
  The unknown amino acid {⟨may⟩} be used by these species.
- head – complement : False
  The unknown amino acid {⟨may⟩ be used by these species}.
- subject – head : True
  {The unknown amino acid ⟨may⟩ be used by these species}.
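Candidate generation is a walk from the cue's node to the root. A toy sketch with parent links and precomputed yields; the real system operates over ERG derivation trees, and these dictionaries are purely illustrative:

```python
def candidates_from_cue(parents, yields, cue_node):
    """Follow parent links from the cue up to the root, collecting each
    ancestor's terminal yield as a candidate scope."""
    out, node = [], cue_node
    while node is not None:
        out.append(yields[node])
        node = parents.get(node)
    return out
```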
Alignment of constituents and scopes
Scope boundaries must align with the boundaries of constituents for the approach to be successful.

Alignment can be improved by applying slackening rules, including:
- eliminating constituent-final punctuation;
- eliminating constituent-final parenthesised elements;
- reducing scope to the left when the left-most terminal is an adverb and not the cue; and
- ensuring the scope starts with the cue when it is a noun.
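The first two slackening rules can be sketched as token-list transformations. This is an illustrative simplification; real constituents carry structure, not flat token lists:

```python
def slacken(tokens):
    """Drop constituent-final punctuation, then a constituent-final
    parenthesised element, from a candidate's token list."""
    if tokens and tokens[-1] in {".", ",", ";", ":"}:
        tokens = tokens[:-1]
    if tokens and tokens[-1] == ")" and "(" in tokens:
        open_idx = len(tokens) - 1 - tokens[::-1].index("(")
        tokens = tokens[:open_idx]
    return tokens
```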
Alignment of constituents and scopes
[Plot: % of hedge scopes aligned with constituents (y-axis, 80–95) against n-best parsing results (x-axis, 1–50), with curves for BSA and BSP]
Alignment of constituents and scopes
Analysis of the non-aligned items indicated that mismatches arise from:
- parse ranking errors (40%);
- non-syntactic scope (25%), e.g. "This allows us to {⟨address a number of questions⟩: what proportion of [...] can be attributed to a known domain-domain interaction}?";
- divergent syntactic theories (16%);
- parenthesised elements (13%); and
- annotation errors (6%).
Alignment of constituents and scopes
In the papers:
- A parse is produced for 86% of sentences.
- Alignment of constituents and speculation scopes is 81% when inspecting the first parse of sentences.
- The upper-bound performance of the ranker is around 76% when resolving the scope of speculation.
- A similar analysis indicates an upper bound of 77% for negation.
Training procedure
Given a parsed BioScope sentence:
- The candidate constituent that corresponds to the scope is labeled as correct;
- all other candidates are labeled as incorrect.
- Learn a scoring function (using SVMlight).

Describe candidates with three classes of features that record:
- paths in the tree;
- surface order; and
- rule-like linguistic phenomena.
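The labelling step amounts to marking the candidate whose yield matches the gold scope. A schematic sketch over candidate strings; the actual system trains SVMlight on feature vectors, not raw strings:

```python
def label_candidates(candidates, gold_scope):
    """Return (candidate, label) pairs for ranker training: +1 for the
    candidate matching the annotated scope exactly, -1 for the rest."""
    return [(c, 1 if c == gold_scope else -1) for c in candidates]
```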
Training procedure
[The same HPSG derivation tree for "The unknown amino acid ⟨may⟩ be used by these species.", repeated to illustrate feature extraction]
Path features
Paths from speculation cues to candidate constituents:
- specific, e.g. ‘‘v_vp_mdl-p_le\hd-cmp_u_c\sb-hd_mc_c’’
- general, e.g. ‘‘v_vp_mdl-p_le\\sb-hd_mc_c’’

With lexicalised variants:
- specific, e.g. ‘‘may : v_vp_mdl-p_le\hd-cmp_u_c\sb-hd_mc_c’’
- general, e.g. ‘‘may : v_vp_mdl-p_le\\sb-hd_mc_c’’

Bigrams formed of nodes and their parents, e.g. ‘‘v_vp_mdl-p_le hd-cmp_u_c’’ and ‘‘hd-cmp_u_c sb-hd_mc_c’’
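Given the node path as a list of labels, the specific, general and lexicalised path-feature variants fall out mechanically. A sketch, with the feature-string formatting assumed for illustration:

```python
def path_features(path, cue_lemma=None):
    """Specific paths join every label on the path; general paths keep only
    the endpoints. Lexicalised variants prefix the cue's lemma."""
    specific = "\\".join(path)
    general = path[0] + "\\\\" + path[-1]
    feats = [specific, general]
    if cue_lemma:
        feats += [f"{cue_lemma} : {specific}", f"{cue_lemma} : {general}"]
    return feats
```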
Surface features
- Bigrams formed of preterminal lexical types, e.g. ‘‘d_-_the_le aj-hdn_norm_c’’, ‘‘aj-hdn_norm_c n_-_mc_le’’, etc.
- Cue position within candidate (in tertiles), e.g. ‘‘33%-66%’’.
- Candidate size relative to sentence length (in quartiles), e.g. ‘‘75%-100%’’.
- Punctuation preceding the candidate, e.g. ‘‘false’’.
- Punctuation at the end of the candidate, e.g. ‘‘true’’.
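The tertile and quartile features above are simple bucketings of a position or length ratio. A sketch, with the label formatting assumed:

```python
def bucket(value, total, n_buckets):
    """Map a ratio value/total to a coarse bucket label such as
    '33%-66%' (tertiles) or '75%-100%' (quartiles)."""
    step = 100 // n_buckets
    i = min(int(n_buckets * value / total), n_buckets - 1)
    hi = 100 if i == n_buckets - 1 else (i + 1) * step
    return f"{i * step}%-{hi}%"
```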
Rule-like features
Detection of:
- Passivisation
- Subject control verbs occurring with passivised verbs
- Subject raising verbs
- Predicative adjectives
Detecting control verbs with a passivised verb
1. Cue is a subject control verb.
2. Find the first subject – head parent of (1) on the head path.
3. Find the first head – complement child of (2) on the head path.
4. The right-most daughter of (3), or one of its descendants, is a passivised verb.
5. The transitive head daughter of the left-most daughter of (2) is not an expletive it or there.

[Annotated tree for "the unknown amino acid ⟨may⟩ be used by these species.", marking the subject control verb (1), the subject – head parent (2), the head – complement child (3), the passive verb (4), and the non-expletive subject (5)]
Using predictions from auxiliary systems
Combine with other systems by introducing features recording matches with:
- the start of the auxiliary prediction;
- the end of the auxiliary prediction; or
- the entirety of the auxiliary prediction.

In cases where an ERG parse is unavailable we back off to the prediction of the dependency rules.
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Ranker optimisation
Speculation and negation in aligned items from papers (F1)

                        Speculation  Negation
Random baseline         26.76        21.73
Path                    78.10        79.28
Path+Surface            79.93        78.88
Path+Rule-like          83.72        89.47
Path+Surface+Rule-like  85.30        89.24

Training with the first aligned constituent in n-best parses and testing with m-best parses did not greatly impact performance, but the optimal values are n = 1 and m = 3.
n-best optimisation for combined system
[Surface plot: F1 (74–77) as a function of training n-best (0–25) and testing m-best (0–25)]
Development results
Resolving scope of gold cues (F1)

                    Speculation  Negation
Default baseline    64.89        48.07
Constituent ranker  73.67        67.46
Dependency rules    73.39        70.11
Combined            78.69        73.98
Evaluation results
We evaluate end-to-end performance for speculation on the CoNLL-2010 Shared Task evaluation data.

Speculation
System                     Prec   Rec    F1
Morante et al. (2010)      59.62  55.18  57.32
Cue classifier + combined  62.00  57.02  59.41
Evaluation results
Negation (cross-validated on abstracts)
System                     Prec   Rec    F1
Morante et al. (2009)      66.31  65.27  65.79
Cue classifier + combined  68.93  73.03  70.92

Negation (on held-out papers)
System                     Prec   Rec    F1
Morante et al. (2009)      42.49  39.10  40.72
Cue classifier + combined  61.77  71.55  66.30
Conclusions
- It is possible to learn a discriminative ranking function for choosing subtrees from HPSG-based constituent structures that match speculation scopes;
- Combining the rule-based and ranker-based approaches to resolve the scope of cues predicted by our classifier achieves the best results to date on the CoNLL-2010 Shared Task evaluation data;
- The system is readily adapted to deal with negation, also achieving state-of-the-art results.