TRANSCRIPT
Ryu Iida, Tokyo Institute of Technology, [email protected]
Kentaro Inui, Yuji Matsumoto, Nara Institute of Science and Technology, {inui,matsu}@is.naist.jp
Capturing Salience with a Trainable Cache Model for Zero-anaphora Resolution
Introduction
Anaphora (coreference) resolution has attracted many researchers because of its importance for NLP applications such as IE and MT.
Anaphora resolution: search for an antecedent within the search space.
NTSB Chairman Jim Hall is to address a briefing on the investigation in Seattle Thursday, but board spokesman Mike Benson said Hall isn't expected to announce any findings. Benson said investigators are simulating air loads on the 737's rudder. ``It's a slow, methodical job since we don't have adequate black boxes,'' he said. Newer models of flight data recorders, or ``black boxes,'' would record the angle of the rudder and the pedal controlling it.
[Figure: an anaphor and its antecedent in the passage, with the preceding text marked as the search space]
Problem
A large search space makes practical anaphora resolution difficult.
Task: reducing the search space.
The National Transportation Safety Board is borrowing a Boeing 737 from Seattle's Museum of Flight as part of its investigation into why a similar jetliner crashed near Pittsburgh in 1994. The museum's aircraft, ironically enough, was donated by USAir,which operated the airplane that crashed, killing 132 people on board. The board is testing the plane's rudder controls to learn why Flight 427 suddenly rolled and crashed while on its approach to the Pittsburgh airport Sept. 8, 1994. Aviation safety investigators say a sharp movement of the rudder ( the movable vertical piece in the plane's tail ) could have caused the jet's deadly roll. NTSB Chairman Jim Hall is to address a briefing on the investigation in Seattle Thursday, but board spokesman Mike Benson said Hall isn't expected to announce any findings. Benson said investigators are simulating air loads on the 737's rudder. ``It's a slow, methodical job since we don't have adequate black boxes,'' he said. Newer models of flight data recorders, or ``black boxes,'' would record the angle of the rudder and the pedal controlling it.
[Figure: the entire preceding text constitutes the search space]
Previous work
Machine learning-based approaches (Aone and Bennett, 1995; McCarthy and Lehnert, 1995; Soon et al., 2001; Ng and Cardie, 2002; Seki et al., 2002; Isozaki and Hirao, 2003; Iida et al., 2005; Iida et al., 2007a; Yang et al., 2008)
These approaches pay little attention to the search space problem and limit the search space heuristically, e.g. the system considers only candidates occurring in the N previous sentences (Yang et al., 2008).
Problem: this excludes an antecedent whenever it is located more than N sentences away from its anaphor.
Previous work (Cont'd)
Rule-based approaches (e.g. approaches based on Centering Theory (Grosz et al., 1995)) deal only with the discourse entities that are salient at each point of the discourse.
Drawback: Centering Theory retains information only about the previous sentence.
Exceptions: Suri & McCoy (1994) and Hahn & Strube (1997) overcome this drawback, but they remain limited by restrictions fundamental to the notion of Centering Theory.
Our solution
Reduce the search space for a given anaphor by applying the notion of ``caching'' introduced by Walker (1996).
NTSB Chairman Jim Hall is to address a briefing on the investigation in Seattle Thursday, but board spokesman Mike Benson said Hall isn't expected to announce any findings. Benson said investigators are simulating air loads on the 737's rudder. ``It's a slow, methodical job since we don't have adequate black boxes,'' he said. Newer models of flight data recorders, or ``black boxes,'' would record the angle of the rudder and the pedal controlling it.
[Figure: the most salient candidates (NTSB Chairman Jim Hall, investigators, the rudder) are extracted into the cache, and the antecedent is searched for within the cache rather than the full search space]
Implementation of cache models
Walker (1996)'s cache model has two devices:
Cache: holds the most salient discourse entities
Main memory: retains all other entities
However, the model is not fully specified for implementation.
Our approach: specify, using machine learning, how to retain salient candidates so as to capture both the local and global foci of the discourse; we call the result the dynamic cache model (DCM).
Dynamic cache model (DCM)
Dynamically update the cache information in a sentence-wise manner, taking the local transition of salience into account.
[Figure: dynamic cache model. The entities e_(i)1 ... e_(i)N of sentence S_i are pooled with the current cache C_i = {c_(i)1, ..., c_(i)M}; each pooled candidate is either retained in the next cache C_(i+1) = {c_(i+1)1, ..., c_(i+1)M} or discarded]
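A minimal Python sketch of this update step, assuming a trained salience ranker object with a score method (all names here are hypothetical, not the authors' implementation):

```python
def update_cache(cache, sentence_entities, ranker, cache_size):
    """One DCM step: pool the current cache C_i with the entities of
    sentence S_i, rank the pool by predicted salience, and retain the
    top cache_size candidates as C_(i+1); the rest are discarded."""
    pool = list(cache) + list(sentence_entities)
    pool.sort(key=ranker.score, reverse=True)
    return pool[:cache_size]
```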
Dynamic cache model (DCM)
It is difficult to create training instances directly for the problem of retaining the N most salient candidates.
DCM: ranking candidates
Recast candidate selection as a ranking problem in machine learning; training instances are created from the anaphoric relations annotated in a corpus.
For a given candidate C in the current context (i.e. C is either in the current cache or appears in the current sentence):
if C is referred to by an anaphor appearing in the following context, it is labeled ``retained'' (1st place); otherwise it is labeled ``discarded'' (2nd place).
DCM: creating training instances
[Figure: annotated corpus (C: candidate, A: anaphor)
S1: C1 C2
S2: C3 C4 Ai C5 C6
S3: C7 Aj Ak C8]
Training instances:
After S1: retained (1st): C1; discarded (2nd): C2
(C1 is referred to by Ai in S2, so it is retained; C2 is not referred to by any anaphor appearing in the following context, so it is discarded.)
After S2: retained (1st): C1, C4; discarded (2nd): C3, C5, C6
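As a sketch, this labeling rule might be implemented as follows (the sentence objects, their entities attribute, and the referred_to_later helper are assumptions standing in for the corpus annotation interface):

```python
def make_ranking_instances(sentences, referred_to_later):
    """Walk the document sentence by sentence; every candidate in the
    current cache or the current sentence becomes a training instance:
    rank 1 ('retained') if some anaphor in the following context
    refers to it, rank 2 ('discarded') otherwise."""
    instances = []
    cache = []
    for i, sent in enumerate(sentences):
        pool = cache + list(sent.entities)
        labeled = [(c, 1 if referred_to_later(c, i) else 2) for c in pool]
        instances.extend(labeled)
        # During training, the gold labels tell us what stays cached.
        cache = [c for c, rank in labeled if rank == 1]
    return instances
```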
Zero-anaphora resolution process
(Example with cache size = 2; φ: zero-pronoun)
Tom-wa kouen-o sanpos-iteimashita (Tom was walking in the park)
(φ-ga) John-ni funsui-no mae-de a-tta ((He) met John in front of the fountain)
(φ-ga) (φ-ni) kinou-no shiai-no kekka-o ki-kimashita ((Tom) asked (John) the result of yesterday's game)
(φ-ga) amari yoku na-katta youda ((The result) does not seem to be very good.)
Cache after sentence 1: Tom (Tom), kouen (park)
Cache after sentence 2: Tom (Tom), John (John)
Cache after sentence 3: Tom (Tom), kekka (result)
Evaluating the caching mechanism on Japanese zero-anaphora resolution
Investigate how each cache model contributes to candidate reduction: explore its candidate reduction ratio and its coverage.
Coverage = (# of antecedents retained in the cache model) / (# of all antecedents)
Create a ranker using Ranking SVM (Joachims, 2002).
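Coverage can then be computed per zero-pronoun, for example as in this sketch (the data layout is an assumption, not taken from the paper):

```python
def coverage(gold, cache_at):
    """gold maps each zero-pronoun to its annotated antecedent;
    cache_at maps each zero-pronoun to the cache contents at the
    moment it is resolved. Returns the fraction of antecedents
    that the cache model retained."""
    hits = sum(1 for zp, ant in gold.items() if ant in cache_at[zp])
    return hits / len(gold)
```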
Data set
NAIST Text Corpus (Iida et al., 2007)
Data set for cross-validation: 287 articles, 699 zero-pronouns
Conduct 5-fold cross-validation.
Baseline cache models
Centering-based cache model: stores the preceding ``wa'' (topic)-marked or ``ga'' (subject)-marked candidate antecedents; an approximation of the model proposed by Nariyama (2002).
Sentence-based cache model (Soon et al., 2001; Yang et al., 2008; etc.): stores the candidate antecedents in the N previous sentences of a zero-pronoun (see the sketch after this list).
Static cache model: does not capture the dynamics of the text; ranks all candidates at once according to the global focus of the text.
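A sketch of the sentence-based baseline referenced above; it needs no learned ranker at all (the sentence objects and their entities attribute are assumptions):

```python
def sentence_window_candidates(sentences, i, n):
    """Sentence-based cache model: the candidate set for a zero-pronoun
    in sentence i is simply every entity mentioned in the n preceding
    sentences."""
    window = sentences[max(0, i - n):i]
    return [e for sent in window for e in sent.entities]
```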
Feature set for cache models
Default features: part of speech; whether the candidate occurs in a quoted sentence; whether it occurs at the beginning of the text; its case marker (i.e. wa, ga); whether it syntactically depends on the last bunsetsu unit (the basic unit in Japanese) of its sentence.
Features used only in the DCM: the set of connectives intervening between Ci and the beginning of the current sentence S; the size of the anaphoric chain of Ci; whether Ci is currently stored in the cache; the distance between S and Ci, measured in sentences.
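As an illustration, these features might be assembled into a feature dictionary like the following sketch (the attribute names are hypothetical; the slides do not specify the encoding):

```python
def cache_features(cand, sent, cache, connectives):
    """Feature dictionary for one candidate. `connectives` is the set
    of connectives intervening between the candidate and the beginning
    of the current sentence, computed upstream."""
    return {
        "pos": cand.pos,                          # part of speech
        "in_quote": cand.in_quoted_sentence,      # inside a quotation?
        "text_initial": cand.at_beginning_of_text,
        "case_marker": cand.case_marker,          # e.g. wa, ga
        "dep_last_bunsetsu": cand.depends_on_last_bunsetsu,
        # DCM-only features:
        "connectives": tuple(sorted(connectives)),
        "chain_size": cand.anaphoric_chain_size,
        "in_cache": cand in cache,
        "sent_distance": sent.index - cand.sentence_index,
    }
```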
Results: caching mechanism
[Figure: coverage of each cache model plotted against the size of the search space; CM: centering-based model, SM: sentence-based model]
Evaluating antecedent identification
Antecedent identification task for inter-sentential zero-anaphora resolution; cache sizes range from 5 to all candidates.
Compare the three cache models: the centering-based cache model, the sentence-based cache model, and the dynamic cache model.
Also investigate computational time.
Antecedent identification and anaphoricity determination models
Antecedent identification model: the tournament model (Iida et al., 2003). It selects the most likely candidate antecedent by conducting a series of matches in which candidates compete with each other (sketched below).
Anaphoricity determination model: the selection-then-classification model (Iida et al., 2005). It determines anaphoricity by judging a zero-pronoun as anaphoric only if its most likely candidate is judged to be its antecedent.
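A sketch of the tournament idea (the pairwise classifier `prefer` is assumed to be trained separately; this illustrates the mechanism, not the authors' exact procedure):

```python
def tournament_select(candidates, prefer):
    """Run a series of matches: the winner of each pairwise comparison
    advances and meets the next candidate; the final winner is returned
    as the most likely antecedent."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        winner = prefer(winner, challenger)  # returns the preferred one
    return winner
```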
Results of antecedent identification
CM: centering-based model, SM: sentence-based model, DCM: dynamic cache model

Model                 Accuracy   Runtime   Coverage
CM                    0.441      11m03s    0.651
SM (s=1)              0.381      6m54s     0.524
SM (s=2)              0.448      13m14s    0.720
SM (s=3)              0.466      19m01s    0.794
DCM (n=5)             0.446      4m39s     0.664
DCM (n=10)            0.441      8m56s     0.764
DCM (n=15)            0.442      12m53s    0.858
DCM (n=20)            0.443      16m35s    0.878
DCM (n=#candidates)   0.452      53m44s    0.928
Conclusion
Proposed a machine learning-based cache model to reduce the computational cost of anaphora resolution.
Recast discourse status updates as ranking problems over discourse entities, using the anaphoric relations annotated in a corpus as clues.
Our learning-based cache model drastically reduces the search space while preserving accuracy.
Future work
The current zero-anaphora resolution procedure is carried out linearly, i.e. each antecedent is selected independently, without taking any other zero-pronouns into account.
Trends in anaphora resolution have shifted toward more sophisticated approaches that globally optimize the interpretation of all referring expressions in a text, e.g. Poon & Domingos (2008) using Markov Logic Networks.
Incorporate our caching mechanism into such global approaches.
Thank you for your kind attention
Feature set used in antecedent identification models
Overall zero-anaphora resolution
Investigate the effect of introducing the cache model on overall zero-anaphora resolution, including intra-sentential zero-anaphora resolution.
Compare the zero-anaphora resolution model with different cache sizes.
Base model: Iida et al. (2006)'s model, which exploits syntactic patterns as features.
Results of overall zero-anaphora resolution
All models achieved almost the same performance
Static cache model (SCM)
Based on Grosz & Sidner (1986)'s notion of global focus: an entity or set of entities salient throughout the entire discourse.
Characteristics of the SCM: it does not capture the dynamics of the text; it selects the N most salient candidates according to a ranking based on the global focus of the text.
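In code, the contrast with the DCM is that the ranking happens once per text rather than once per sentence; a sketch using the same hypothetical ranker interface as above:

```python
def static_cache(all_candidates, ranker, n):
    """Static cache model: rank every candidate in the text once by
    global salience and keep the N most salient; the cache is never
    updated as the text unfolds."""
    return sorted(all_candidates, key=ranker.score, reverse=True)[:n]
```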
SCM: training and test phases
[Figure, training phase: from an annotated text (S1: C1 C2; S2: C3 C4 φi C5 C6; S3: C7 φj φk C8; S4: φl C9 C10; Ci: candidate antecedent, φj: zero-pronoun), the training instances are 1st: C1 C4 C7 and 2nd: C2 C3 C5 C6 C8 C9 C10.]
[Figure, test phase: the trained ranker orders the candidates C'1 ... C'9 (1st: C'1, 2nd: C'6, ..., Nth: C'3), and the N most salient candidates are selected.]
Zero-anaphora resolution process
For a given zero-pronoun φ in sentence S:
1. Intra-sentential anaphora resolution: search for an antecedent Ai in S. If Ai is found, return Ai; otherwise go to step 2.
2. Inter-sentential anaphora resolution: search for an antecedent Aj in the cache. If Aj is found, return Aj; otherwise φ is judged as exophoric.
3. Cache update: take into account the candidates in S as well as the candidates already retained in the cache.
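Put together, the per-zero-pronoun procedure might look like this sketch (the helpers resolve_intra and resolve_inter are hypothetical; the cache update reuses the DCM step sketched earlier):

```python
def resolve(zp, sent, cache, resolve_intra, resolve_inter,
            ranker, cache_size):
    """Steps 1-3 from the slide. Returns the antecedent, or None if
    the zero-pronoun is judged exophoric."""
    antecedent = resolve_intra(zp, sent)           # step 1: within S
    if antecedent is None:
        antecedent = resolve_inter(zp, cache)      # step 2: in the cache
    # Step 3: update the cache with the candidates in S plus those
    # already retained (the DCM update).
    cache[:] = update_cache(cache, sent.entities, ranker, cache_size)
    return antecedent  # None means exophoric
```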
Zero-anaphora
Zero-anaphor: a gap with an anaphoric function.
Zero-anaphora resolution is becoming important in many applications: in Japanese, even obligatory arguments of predicates are often omitted when they are inferable from the context; 45% of nominatives are omitted in newspaper articles.
Zero-anaphora (Cont'd)
Two sub-tasks:
Anaphoricity determination: determine whether a zero-pronoun is anaphoric.
Antecedent identification: select an antecedent for a given zero-pronoun.
Maryi-wa Johnj-ni (φj-ga) tabako-o yameru-youni it-ta .
Maryi-TOP Johnj-DAT (φj-NOM) smoking-OBJ quit-COMP say-PAST PUNC
Maryi told Johnj to quit smoking.
(φi-ga) tabako-o kirai-dakarada .
(φi-NOM) smoking-OBJ hate-BECAUSE PUNC
Because (shei) hates people smoking.