Semantic Role Labeling for Arabic using Kernel Methods
Mona Diab, Alessandro Moschitti, Daniele Pighin
What is SRL?
Proposition
John opened the door
[John]Agent [opened]Predicate [the door]Theme
• Syntax: John is the Subject, the door is the Object
[The door]Theme [opened]Predicate
• Syntax: the door is the Subject, yet its semantic role is still Theme
• FrameNet labels: John = Agent, the door = Container_portal
• PropBank labels: John = ARG0, the door = ARG1
Why SRL?
• Useful for information extraction
• Useful for Question Answering
• Useful for Machine Translation?
Our Goal
Word-by-word gloss of the Arabic sentence: Last Sunday India to official visit Rongji Zhu the-Chinese the-Ministers president started
The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday
ARGM-TMP
RoadMap
• Arabic Characteristics
• Our Approach
• Experiments & Results
• Conclusions & Future Directions
Morphology
• Rich complex morphology
– Templatic, concatenative, derivational, inflectional
• wbHsnAthm
• w+b+Hsn+At+hm
• and by virtue(s) their
– Verbs are marked for tense, person, gender, aspect, mood, voice
– Nominals are marked for case, number, gender, definiteness
• Orthography is underspecified for short vowels and consonant doubling (diacritics)
Syntax
Characteristics relevant for SRL
• Typical underspecification of short vowels masks morphological features such as case and agreement
– Example (unvocalized, ambiguous):
rjl Albyt Alkbyr
Man_masc the-house_masc the-big_masc
“the big man of the house” or “the man of the big house”
– Example (vocalized, adjective in the genitive):
rjlu Albyti Alkbyri
Man_masc-Nom the-house_masc-Gen the-big_masc-Gen
“the man of the big house”
– Example (vocalized, adjective in the nominative):
rjlu Albyti Alkbyru
Man_masc-Nom the-house_masc-Gen the-big_masc-Nom
“the big man of the house”
Characteristics relevant for SRL
• Idafa constructions make indefinite nominals syntactically definite, which allows agreement and therefore better scoping
– Example:
[rjlu Albyti] Alkbyru
Man_masc-Nom-Def the-house_masc-Gen the-big_masc-Nom-Def
“the big man of the house”
Characteristics relevant for SRL
• Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject; hence only ARG1 and ARG2 are allowed, and ARG0 is not.
– Example (grammatical, instrument as ARG2):
qutil [Emru]ARG1 [bslAHiK qAtliK]ARG2
[Amr]ARG1 was killed [by a deadly weapon]ARG2
– Example (ungrammatical, explicit ARG0):
*qutl [Emru]ARG1 [bslmY]ARG0
*[Amr]ARG1 was killed [by SalmA]ARG0
Our Approach
Semantic Role Labeling Steps
• Given a sentence and an associated syntactic parse
• An SRL system identifies the arguments for a given predicate
• The arguments are identified in two steps
– Argument boundary detection
– Argument role classification
• For the overall system we apply a heuristic for argument label conflict resolution
– One label per argument
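The two steps and the conflict-resolution pass can be sketched as follows. This is only an illustrative outline: the function name `label_arguments`, the callables `is_argument` and `role_scores`, and the specific heuristic (keep the most confident label and drop overlapping spans) are assumptions, since the slides do not state which heuristic is used.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def label_arguments(
    candidates: List[Span],
    is_argument: Callable[[Span], bool],
    role_scores: Callable[[Span], Dict[str, float]],
) -> Dict[Span, str]:
    """Two-step SRL for one predicate: boundary detection, then role classification."""
    # Step 1: boundary detection keeps the candidate spans judged to be arguments.
    arguments = [span for span in candidates if is_argument(span)]

    # Step 2: role classification picks the best-scoring role for each kept span,
    # so every argument ends up with exactly one label.
    scored = []
    for span in arguments:
        scores = role_scores(span)
        best_role = max(scores, key=scores.get)
        scored.append((scores[best_role], span, best_role))

    # Conflict resolution (hypothetical heuristic): visit spans from most to
    # least confident and skip any span that overlaps one already accepted.
    scored.sort(reverse=True)
    labeled: Dict[Span, str] = {}
    for _, span, role in scored:
        if all(span[1] <= kept[0] or kept[1] <= span[0] for kept in labeled):
            labeled[span] = role
    return labeled
```

In a real system `is_argument` and `role_scores` would be backed by the trained boundary and role classifiers; here they are left as plain callables so the control flow stays visible.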
The Sentence
The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday
The Parse Tree
Boundary Identification
Role Classification
Our Approach
• Experiment with different kernels
• Experiment with Standard Features (similar to English) and rich morphological features specific to Arabic
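Different Kernels
• Polynomial Kernels (degrees 1-6) with standard features
• Tree Kernels
$K(t_1, t_2) = \sum_{n_1 \in N_{t_1}} \sum_{n_2 \in N_{t_2}} \Delta(n_1, n_2)$
where $N_{t_1}$ and $N_{t_2}$ are the sets of nodes in $t_1$ and $t_2$, and $\Delta(\cdot)$ evaluates the common substructures rooted in $n_1$ and $n_2$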
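A minimal sketch of a tree kernel of this form is given below. It assumes the Collins and Duffy style subset-tree recursion for Δ, which the slides do not spell out, and a hypothetical tuple encoding of trees in which a node is `(label, child, ...)` and a word is a 1-tuple. The names `production`, `internal_nodes`, `delta`, `tree_kernel` and the `decay` parameter are illustrative, not part of any toolkit.

```python
def production(node):
    """The grammar production at a node: its label plus its children's labels."""
    return (node[0], tuple(child[0] for child in node[1:]))


def internal_nodes(tree):
    """All nodes that have children; words (1-tuples) are excluded."""
    if len(tree) == 1:
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, decay=1.0):
    """Weighted count of the common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if len(c1) > 1:  # word children contribute a factor of 1
            result *= 1.0 + delta(c1, c2, decay)
    return result


def tree_kernel(t1, t2, decay=1.0):
    """Sum delta over every pair of internal nodes, one from each tree."""
    return sum(
        delta(n1, n2, decay)
        for n1 in internal_nodes(t1)
        for n2 in internal_nodes(t2)
    )


np_a = ("NP", ("D", ("a",)), ("N", ("talk",)))
np_the = ("NP", ("D", ("the",)), ("N", ("talk",)))
print(tree_kernel(np_a, np_a))    # 6.0
print(tree_kernel(np_a, np_the))  # 3.0
```

With `decay=1.0` the kernel is a plain count: the two NPs share the fragments (NP D N), (NP D (N talk)) and (N talk), hence the value 3.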
Argument Structure Trees (AST)
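[Figure: parse tree of "Paul delivers a talk in formal style", (S (N Paul) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (NP (jj formal) (N style))))), with "a talk" marked as Arg. 1]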
Defined as the minimal subtree encompassing the predicate and one of its arguments
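One way to read this definition is sketched below: take the lowest common ancestor of the predicate and the argument, keep the predicate and argument subtrees whole, and drop every other branch. The tuple encoding of trees, the use of child-index paths to address nodes, and the function names are assumptions made for the illustration.

```python
def node_at(tree, path):
    """Follow a tuple of child indices from the root; slot 0 of a node is its label."""
    for index in path:
        tree = tree[index + 1]
    return tree


def is_prefix(prefix, path):
    return path[:len(prefix)] == prefix


def argument_structure_tree(tree, pred_path, arg_path):
    """Minimal subtree covering the predicate node and one argument node."""
    pred_path, arg_path = tuple(pred_path), tuple(arg_path)
    # The lowest common ancestor sits at the longest shared prefix of both paths.
    shared = 0
    while (shared < min(len(pred_path), len(arg_path))
           and pred_path[shared] == arg_path[shared]):
        shared += 1
    root_path = pred_path[:shared]

    def prune(node, path):
        if path in (pred_path, arg_path):
            return node  # keep the predicate and the argument subtrees whole
        kept = [
            prune(child, path + (i,))
            for i, child in enumerate(node[1:])
            if is_prefix(path + (i,), pred_path) or is_prefix(path + (i,), arg_path)
        ]
        return (node[0], *kept)

    return prune(node_at(tree, root_path), root_path)


sentence = (
    "S",
    ("N", ("Paul",)),
    ("VP",
     ("V", ("delivers",)),
     ("NP", ("D", ("a",)), ("N", ("talk",))),
     ("PP", ("IN", ("in",)), ("NP", ("jj", ("formal",)), ("N", ("style",))))),
)
# Predicate "delivers" is at path (1, 0); the argument "a talk" is at (1, 1).
print(argument_structure_tree(sentence, (1, 0), (1, 1)))
```

For the argument "a talk" the PP branch is discarded and the result is the VP spanning "delivers a talk".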
Tree Substructure Representations
[Figure: sample fragments of the subtree (VP (V delivers) (NP (D a) (N talk))), from the full subtree down to fragments with words and nodes removed]
The overall set of AST substructures
[Figure: every substructure of the AST (VP (V delivers) (NP (D a) (N talk))), e.g. (NP (D a) (N talk)), (NP (D) (N)), (V delivers)]
Explicit feature space
$\vec{x} = (0, \ldots, 0, \ldots, 1, \ldots, 1, \ldots, 1, \ldots, 0, \ldots, 0, \ldots, 0, \ldots, 1, \ldots, 1, \ldots, 1, \ldots, 0, \ldots, 0)$
• $\vec{x} \cdot \vec{z}$ counts the number of common substructures
[Figure: sample AST substructures, e.g. (NP (D a) (N talk)) and (VP (V delivers) (NP (D a) (N talk)))]
Standard Features
• Predicate: Lemmatization of the predicate
• Path: Syntactic path linking the predicate and an argument, e.g. NN↑NP↑VP↓VBD
• Partial Path: Path feature limited to the branching of the argument
• No Direction Path: path without the traversal directions
• Phrase type
• Last and first POS of words in the arguments
• Verb subcategorization frame: production expanding the predicate parent node
• Position of the argument relative to predicate
• Syntactic Frame: positions of the surrounding NPs relative to predicate
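As an illustration of the Path and No Direction Path features, the sketch below walks from the argument node up to the lowest common ancestor and down to the predicate. The tuple tree encoding, the child-index paths, the toy sentence, and the choice of "-" as separator for the undirected variant are assumptions; the slides only give the example value NN NP VP VBD.

```python
def labels_on(tree, path):
    """Labels of the nodes visited from the root down to the node at `path`."""
    labels = [tree[0]]
    for index in path:
        tree = tree[index + 1]  # slot 0 of a node holds its label
        labels.append(tree[0])
    return labels


def path_feature(tree, arg_path, pred_path, directed=True):
    """Syntactic path from the argument node to the predicate node."""
    arg_labels = labels_on(tree, arg_path)
    pred_labels = labels_on(tree, pred_path)
    # Length of the shared prefix = depth of the lowest common ancestor.
    shared = 0
    while (shared < min(len(arg_path), len(pred_path))
           and arg_path[shared] == pred_path[shared]):
        shared += 1
    up = arg_labels[shared:][::-1]   # argument, ..., common ancestor
    down = pred_labels[shared + 1:]  # nodes below the ancestor, ..., predicate
    up_sep, down_sep = ("↑", "↓") if directed else ("-", "-")
    return up_sep.join(up) + "".join(down_sep + label for label in down)


vp = ("VP", ("VBD", ("opened",)), ("NP", ("DT", ("the",)), ("NN", ("door",))))
print(path_feature(vp, arg_path=(1, 1), pred_path=(0,)))                  # NN↑NP↑VP↓VBD
print(path_feature(vp, arg_path=(1, 1), pred_path=(0,), directed=False))  # NN-NP-VP-VBD
```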
Extended Features for Arabic
• Definiteness, Number, Gender, Case, Mood, Person, Lemma (vocalized), English Gloss, Unvocalized surface form, Vocalized surface form
• Expanded the leaf nodes in AST with these 10 attribute-value pairs, creating the Extended AST (EAST)
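The leaf expansion can be pictured with the sketch below. The exact node layout of the EAST is not recoverable from the slides, so the encoding here is an assumption: each word leaf receives one child per attribute, written as `name=value`. The function name `extend_leaves`, the lookup of features by surface form, and the sample attribute values are all hypothetical.

```python
def extend_leaves(tree, features):
    """Return a copy of `tree` whose word leaves carry attribute-value children.

    `tree` uses the tuple encoding (label, child, ...), with words as 1-tuples.
    `features` maps a surface form to a dict of morphological attributes.
    """
    if len(tree) == 1:  # a word leaf
        attributes = features.get(tree[0], {})
        return (tree[0], *((f"{name}={value}",) for name, value in attributes.items()))
    return (tree[0], *(extend_leaves(child, features) for child in tree[1:]))


# Illustrative values only, loosely following the glosses of the idafa example.
features = {
    "Albyti": {"Definiteness": "Def", "Gender": "Masc", "Case": "Gen"},
}
ast = ("NP", ("N", ("rjlu",)), ("N", ("Albyti",)))
print(extend_leaves(ast, features))
```

Because the morphological attributes become ordinary tree nodes, a tree kernel can match fragments that agree on, say, case or definiteness even when the surface words differ.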
Arabic AST
Sample AST from our example
ARG0
Extended AST (EAST)
……
Experiments & Results
Experimental Set Up
• SemEval 2007 Task 18 data set, Pilot Arabic Propbank
• 95 most frequent verbs in ATB3v2
• Gold parses, Unvowelized, Bies reduced POS tag set (25 tags)
• Num Sentences: Dev (886), Test (902), Train (8402)
• 26 role types (5 numbered ARGs)
Experimental Set Up
• Experimented only with 350k examples
• We use the SVM-Light TK Toolkit (Moschitti, 2004, 2006) with SVM light default parameters
• Evaluation metrics of precision, recall and F measure are obtained using the CoNLL evaluator
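For reference, the three metrics can be computed as in the sketch below. This is not the CoNLL evaluator script; it is a simplified stand-in that assumes gold and predicted arguments are given as sets of `(span, role)` pairs, and the function name is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 over sets of (span, role) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # an argument counts only if span and role both match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {((0, 6), "ARG0"), ((7, 13), "ARG1"), ((13, 15), "ARGM-TMP")}
predicted = {((0, 6), "ARG0"), ((7, 13), "ARG2")}
print(precision_recall_f1(gold, predicted))
```

In this toy case one of two predictions is correct and one of three gold arguments is found, so precision is 1/2 and recall is 1/3.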
Boundary Detection Results
Role Classification Results
Overall Results
Observations-BD
• AST and EAST don’t differ much for boundary detection
• AST+EAST+ Poly (3) gives best BD results
• AST and EAST perform significantly better than Poly (1)
Observations – RC & SRL
Conclusions
• Explicitly encoding the rich morphological features helps with SRL in Arabic
• Tree kernels are indeed a feasible way of dealing with large feature spaces that are structural in nature
• Combining kernels yields better results
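A kernel combination such as AST + EAST + Poly (3) can be sketched as a sum of kernels, each applied to its own view of an example. The slides do not say how the kernels are combined or weighted, so the unweighted sum, the `coef0` offset in the polynomial kernel, and the names `polynomial_kernel`, `sum_kernels` and `same_root` are assumptions. A sum of valid kernels is itself a valid kernel, which is what makes this construction safe for an SVM.

```python
def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """(x . z + coef0) ** degree over two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def sum_kernels(*kernels):
    """Build a kernel over tuples of views; the i-th kernel compares the i-th views."""
    def combined(a, b):
        return sum(k(view_a, view_b) for k, view_a, view_b in zip(kernels, a, b))
    return combined


def same_root(t1, t2):
    """Toy stand-in for a tree kernel: 1.0 when the root labels match, else 0.0."""
    return 1.0 if t1[0] == t2[0] else 0.0


combined = sum_kernels(same_root, polynomial_kernel)
a = (("NP",), (1.0, 0.0, 1.0))  # (tree view, standard-feature vector)
b = (("NP",), (1.0, 1.0, 0.0))
print(combined(a, b))  # 9.0
```

Here the tree view contributes 1.0 and the degree-3 polynomial contributes (1 + 1)^3 = 8, giving 9.0.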
Future Directions
Thank You