University of Oslo : Department of Informatics
A Syntactic Constituent Ranker for Speculation and Negation Scope Resolution
Erik Velldal, Lilja Øvrelid, Jonathon Read†, Stephan Oepen
Date: 21 September 2011. Venue: Master Seminar in Language Technology, Department of Informatics, University of Oslo
Introduction
Task
Given an appropriate cue of speculation or negation, determine its scope: the subsequence of the sentence that is affected.

For example:
- {The unknown amino acid ⟨may⟩ be used by these species}.
- Samples of the protein pair space were taken {⟨instead of⟩ considering the whole space} [...]

Some applications:
- Information Extraction
- Sentiment Analysis
Introduction
Our approach
Generate candidate scopes from constituents in syntactic trees; use a supervised learner to rank these candidates.

[Pipeline diagram: unlabeled sentence → data-driven syntactic parsing → constituents; speculation/negation cue + constituent ranking → predicted scope]
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Data Sets
- BioScope Corpus (Vincze et al., 2008): 20,924 sentences annotated with speculation/negation cues and associated scopes. Sentences are drawn from biomedical abstracts, full papers and clinical reports.
- CoNLL-2010 Shared Task Evaluation Data (Farkas et al., 2010): a further set of 5,003 sentences from biomedical papers, annotated for speculation.
Data Sets
           Total      Speculation         Negation
Data       Sentences  Sentences  Cues     Sentences  Cues
Abstracts  11,871     2,101      2,659    1,597      1,719
Papers     2,670      519        668      339        376
CoNLL      5,003      790        1,033    –          –

Note on data set splits for development and evaluation:
- Speculation: all of the Abstracts and Papers are used for development; the CoNLL set is held out.
- Negation: 90% of the Abstracts and Papers are used for development; the remainder is held out.
Preprocessing
- BioScope XML converted to stand-off characterisation.
- Tokenisation performed using the GENIA tagger (Tsuruoka et al., 2005), supplemented with a finite-state tokeniser.
- Part-of-speech tags from both the GENIA tagger (for higher accuracy in the biomedical domain) and TnT (Brants, 2000).
- Dependency parsing using MaltParser (Nivre et al., 2006) stacked with the XLE platform (Crouch et al., 2008) with the English grammar developed by Butt et al. (2002).
- Constituent tree parsing with PET (Callmeier, 2002) and the English Resource Grammar (Flickinger, 2002).
Evaluation measures
Evaluation was performed using the scoring software of the CoNLL-2010 Shared Task.

Prec = tp / (tp + fp)    Rec = tp / (tp + fn)    F1 = 2 × Prec × Rec / (Prec + Rec)
A true positive requires identification of all words in scope.
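These measures are easy to compute once exact-span true positives are counted. A minimal sketch follows; representing scopes as hashable spans is an assumption of this illustration, not the CoNLL scorer's actual data format:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def count_exact(gold_scopes, pred_scopes):
    """A true positive requires identification of all words in scope,
    so only exact span matches count."""
    gold, pred = set(gold_scopes), set(pred_scopes)
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp
```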
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Supervised classification of cues
We employ a binary word-by-word linear support vector machine classifier using the SVMlight toolkit.

A simpler approach is to treat the set of cues as a near-closed class, applying filtering to disregard all known non-cues before training and applying the classifier.

Cues are described using features of:
- n-grams of lemmas up to three positions to the left; and
- n-grams of surface forms up to two positions to the right.
Supervised classification of cues
Multi-word cues are handled in post-processing by matching patterns over lemmas observed in the development data.

Speculation patterns:
- cannot {be}? exclude
- either .+ or
- indicate that
- may,? or may not
- no {evidence | proof | guarantee}
- not {known | clear | evident | understood | exclude}
- raise the .* {possibility | question | issue | hypothesis}
- whether or not

Negation patterns:
- rather than
- {can | could} not
- no longer
- instead of
- with the * exception of
- neither * nor
- {no(t?) | neither} * nor
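This shorthand maps naturally onto regular expressions. An illustrative sketch with two of the patterns; the system's actual pattern syntax and matcher are not shown on the slide, so these regex renderings are assumptions:

```python
import re

# Hypothetical regex renderings of "may,? or may not" and "neither * nor"
# (with * standing in for any run of intervening words).
SPECULATION = [re.compile(r"\bmay,? or may not\b")]
NEGATION = [re.compile(r"\bneither\b(?:\s+\S+)*?\s+nor\b")]

def find_multiword_cues(sentence, patterns):
    """Return (start, end) character spans of multi-word cue matches."""
    return [m.span() for pat in patterns for m in pat.finditer(sentence)]
```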
Supervised classification of cues
Development Results: Speculation
              Prec   Rec    F1
Baseline      90.49  81.16  85.57
Word-by-word  94.65  82.26  88.02
Filtering     94.13  84.60  89.11

Development Results: Negation
              Prec   Rec    F1
Word-by-word  87.69  97.61  92.38
Filtering     92.49  97.72  95.03
Heuristics for scope resolution
Heuristics, activated by the cue's part of speech, operate over dependency structures:
- Coordinations scope over their conjuncts;
- Prepositions scope over their argument and its descendants;
- Attributive adjectives scope over their nominal head and its descendants;
- Predicative adjectives scope over referential subjects and clausal arguments, if present;
- Modals inherit subject scope from their lexical verb and scope over their descendants; and
- Passive or raising verbs scope over referential subjects and the verbal descendants.
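As a rough illustration of how such part-of-speech-triggered rules operate, here is a toy dispatch over a dependency graph. The graph encoding and the single rule body shown are simplifications assumed for the example, not the system's implementation:

```python
def descendants(graph, node):
    """All nodes reachable from `node`, where `graph` maps a head to its dependents."""
    out, stack = [], list(graph.get(node, []))
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(graph.get(n, []))
    return out

def scope_for_cue(graph, cue, pos):
    """Toy dispatch on the cue's part of speech; only the preposition rule
    ("prepositions scope over their argument and its descendants") is sketched."""
    if pos == "IN":  # preposition
        args = graph.get(cue, [])
        return sorted(set(args) | {d for a in args for d in descendants(graph, a)})
    return sorted(descendants(graph, cue))  # fallback: cue's own descendants
```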
Heuristics for scope resolution
In the case of negation, an additional set of rules is applied:
- Determiners scope over their head node and its descendants;
- The noun none scopes over the entire sentence if it is the subject, otherwise it scopes over its descendants;
- Other nouns or verbs scope over their descendants;
- Adverbs with a verbal head scope over the descendants of the lexical verb; and
- Other adverbs scope over the descendants of the head.

If none of these rules activate, a default scope is applied by labeling from the cue to the sentence-final punctuation.
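The default fallback is simple to state over a token list. A sketch that assumes the scope runs from the cue up to, but not including, the sentence-final punctuation mark; the slide leaves this boundary treatment implicit:

```python
def default_scope(tokens, cue_index):
    """Fallback scope: every token from the cue to the end of the sentence,
    excluding a final punctuation mark if one is present."""
    end = len(tokens)
    if tokens and tokens[-1] in {".", "!", "?"}:
        end -= 1
    return list(range(cue_index, end))
```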
Heuristics for scope resolution
Development Results: Speculation Scope for Gold Cues
                              F1
Abstracts  Default Baseline   69.84
           Dependency Rules   73.67
Papers     Default Baseline   45.21
           Dependency Rules   72.31

Development Results: Negation Scope for Gold Cues
                              F1
Abstracts  Default Baseline   51.75
           Dependency Rules   70.76
Papers     Default Baseline   31.38
           Dependency Rules   67.16
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Selecting candidate constituents
Learn a ranking function over candidate constituents within a parse.

[HPSG derivation tree for "The unknown amino acid ⟨may⟩ be used by these species.", with internal nodes such as sb-hd_mc_c, sp-hd_n_c, aj-hdn_norm_c and hd-cmp_u_c, and lexical types such as d_-_the_le, v_vp_mdl-p_le and v_prd_be_le]
Selecting candidate constituents
Candidates are generated by following the path from the cue to the root of the tree, for example:
- modal verb : False
  The unknown amino acid {⟨may⟩} be used by these species.
- head – complement : False
  The unknown amino acid {⟨may⟩ be used by these species}.
- subject – head : True
  {The unknown amino acid ⟨may⟩ be used by these species}.
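Candidate generation is a walk from the cue's node to the root. A toy sketch with parent links and precomputed yields; the real system operates over ERG derivation trees, and these dictionaries are purely illustrative:

```python
def candidates_from_cue(parents, yields, cue_node):
    """Follow parent links from the cue up to the root, collecting each
    ancestor's terminal yield as a candidate scope."""
    out, node = [], cue_node
    while node is not None:
        out.append(yields[node])
        node = parents.get(node)
    return out
```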
Alignment of constituents and scopes
Scope boundaries must align with the boundaries of constituents for the approach to be successful.

Alignment can be improved by applying slackening rules, including:
- eliminating constituent-final punctuation;
- eliminating constituent-final parenthesised elements;
- reducing scope to the left when the left-most terminal is an adverb and not the cue; and
- ensuring the scope starts with the cue when it is a noun.
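The first two slackening rules can be sketched as token-list transformations. This is an illustrative simplification; real constituents carry structure, not flat token lists:

```python
def slacken(tokens):
    """Drop constituent-final punctuation, then a constituent-final
    parenthesised element, from a candidate's token list."""
    if tokens and tokens[-1] in {".", ",", ";", ":"}:
        tokens = tokens[:-1]
    if tokens and tokens[-1] == ")" and "(" in tokens:
        open_idx = len(tokens) - 1 - tokens[::-1].index("(")
        tokens = tokens[:open_idx]
    return tokens
```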
Alignment of constituents and scopes
[Plot: % of hedge scopes aligned with constituents (y-axis, 80–95) against n-best parsing results (x-axis, 1–50), with curves for BSA and BSP]
Alignment of constituents and scopes
Analysis of the non-aligned items indicated that mismatches arise from:
- parse ranking errors (40%);
- non-syntactic scope (25%), e.g. "This allows us to {⟨address a number of questions⟩: what proportion of [...] can be attributed to a known domain-domain interaction}?";
- divergent syntactic theories (16%);
- parenthesised elements (13%); and
- annotation errors (6%).
Alignment of constituents and scopes
In the papers:
- A parse is produced for 86% of sentences.
- Alignment of constituents and speculation scopes is 81% when inspecting the first parse of sentences.
- The upper-bound performance of the ranker is around 76% when resolving the scope of speculation.
- A similar analysis indicates an upper bound of 77% for negation.
Training procedure
Given a parsed BioScope sentence:
- The candidate constituent that corresponds to the scope is labeled as correct;
- all other candidates are labeled as incorrect.
- Learn a scoring function (using SVMlight).

Describe candidates with three classes of features that record:
- paths in the tree;
- surface order; and
- rule-like linguistic phenomena.
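The labelling step amounts to marking the candidate whose yield matches the gold scope. A schematic sketch over candidate strings; the actual system trains SVMlight on feature vectors, not raw strings:

```python
def label_candidates(candidates, gold_scope):
    """Return (candidate, label) pairs for ranker training: +1 for the
    candidate matching the annotated scope exactly, -1 for the rest."""
    return [(c, 1 if c == gold_scope else -1) for c in candidates]
```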
Training procedure
[The same HPSG derivation tree for "The unknown amino acid ⟨may⟩ be used by these species.", repeated to illustrate feature extraction]
Path features
Paths from speculation cues to candidate constituents:
- specific, e.g. ‘‘v_vp_mdl-p_le\hd-cmp_u_c\sb-hd_mc_c’’
- general, e.g. ‘‘v_vp_mdl-p_le\\sb-hd_mc_c’’

With lexicalised variants:
- specific, e.g. ‘‘may : v_vp_mdl-p_le\hd-cmp_u_c\sb-hd_mc_c’’
- general, e.g. ‘‘may : v_vp_mdl-p_le\\sb-hd_mc_c’’

Bigrams formed of nodes and their parents, e.g. ‘‘v_vp_mdl-p_le hd-cmp_u_c’’ and ‘‘hd-cmp_u_c sb-hd_mc_c’’
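Given the node path as a list of labels, the specific, general and lexicalised path-feature variants fall out mechanically. A sketch, with the feature-string formatting assumed for illustration:

```python
def path_features(path, cue_lemma=None):
    """Specific paths join every label on the path; general paths keep only
    the endpoints. Lexicalised variants prefix the cue's lemma."""
    specific = "\\".join(path)
    general = path[0] + "\\\\" + path[-1]
    feats = [specific, general]
    if cue_lemma:
        feats += [f"{cue_lemma} : {specific}", f"{cue_lemma} : {general}"]
    return feats
```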
Surface features
- Bigrams formed of preterminal lexical types, e.g. ‘‘d_-_the_le aj-hdn_norm_c’’, ‘‘aj-hdn_norm_c n_-_mc_le’’, etc.
- Cue position within candidate (in tertiles), e.g. ‘‘33%-66%’’.
- Candidate size relative to sentence length (in quartiles), e.g. ‘‘75%-100%’’.
- Punctuation preceding the candidate, e.g. ‘‘false’’.
- Punctuation at the end of the candidate, e.g. ‘‘true’’.
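The tertile and quartile features above are simple bucketings of a position or length ratio. A sketch, with the label formatting assumed:

```python
def bucket(value, total, n_buckets):
    """Map a ratio value/total to a coarse bucket label such as
    '33%-66%' (tertiles) or '75%-100%' (quartiles)."""
    step = 100 // n_buckets
    i = min(int(n_buckets * value / total), n_buckets - 1)
    hi = 100 if i == n_buckets - 1 else (i + 1) * step
    return f"{i * step}%-{hi}%"
```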
Rule-like features
Detection of:
- Passivisation
- Subject control verbs occurring with passivised verbs
- Subject raising verbs
- Predicative adjectives
Detecting control verbs with a passivised verb
1. Cue is a subject control verb.
2. Find the first subject – head parent of (1) on the head path.
3. Find the first head – complement child of (2) on the head path.
4. The right-most daughter of (3), or one of its descendants, is a passivised verb.
5. The transitive head daughter of the left-most daughter of (2) is not an expletive it or there.

[Annotated tree for "the unknown amino acid ⟨may⟩ be used by these species.", marking the subject control verb (1), the subject – head parent (2), the head – complement child (3), the passive verb (4), and the non-expletive subject (5)]
Using predictions from auxiliary systems
Combine with other systems by introducing features recording matches with:
- the start of the auxiliary prediction;
- the end of the auxiliary prediction; or
- the entirety of the auxiliary prediction.

In cases where an ERG parse is unavailable we back off to the prediction of the dependency rules.
Outline
1 Data sets and evaluation measures
2 A brief review of speculation and negation at LNS
3 A syntactic constituent ranker
4 Experiments
Ranker optimisation
Speculation and negation in aligned items from papers (F1)

                        Speculation  Negation
Random baseline         26.76        21.73
Path                    78.10        79.28
Path+Surface            79.93        78.88
Path+Rule-like          83.72        89.47
Path+Surface+Rule-like  85.30        89.24

Training with the first aligned constituent in n-best parses and testing with m-best parses did not greatly impact performance, but the optimal values are n = 1 and m = 3.
n-best optimisation for combined system
[Surface plot: F1 (74–77) as a function of training n-best (0–25) and testing m-best (0–25)]
Development results
Resolving scope of gold cues (F1)

                    Speculation  Negation
Default baseline    64.89        48.07
Constituent ranker  73.67        67.46
Dependency rules    73.39        70.11
Combined            78.69        73.98
Evaluation results
We evaluate end-to-end performance for speculation on the CoNLL-2010 Shared Task evaluation data.

Speculation
System                     Prec   Rec    F1
Morante et al. (2010)      59.62  55.18  57.32
Cue classifier + combined  62.00  57.02  59.41
Evaluation results
Negation (cross-validated on abstracts)
System                     Prec   Rec    F1
Morante et al. (2009)      66.31  65.27  65.79
Cue classifier + combined  68.93  73.03  70.92

Negation (on held-out papers)
System                     Prec   Rec    F1
Morante et al. (2009)      42.49  39.10  40.72
Cue classifier + combined  61.77  71.55  66.30
Conclusions
- It is possible to learn a discriminative ranking function for choosing subtrees from HPSG-based constituent structures that match speculation scopes;
- Combining the rule-based and ranker-based approaches to resolve the scope of cues predicted by our classifier achieves the best results to date on the CoNLL-2010 Shared Task evaluation data;
- The system is readily adapted to deal with negation, also achieving state-of-the-art results.