TRANSCRIPT
Dipanjan Das, Carnegie Mellon University
Toyota Technological Institute at Chicago
February 14, 2012
Robust Shallow Semantic Parsing of Text
Natural Language Understanding

I want to go to Chicago on Sunday
P V ADP V ADP N ADP N

Shallow Syntax: Part-of-Speech Tagging
Deeper Syntax: Dependency Parsing
Shallow Semantics: Frames and Roles
A frame encodes an event or scenario; words and phrases fill each participant or role for the frame (e.g., the Experiencer).

Focus of this talk! (Das, Schneider, Chen and Smith, NAACL 2010; Das and Smith, ACL 2011)
Outline:
1. Why semantic analysis? (Motivation; Applications; Choice of formalism)
2. Statistical models for structure prediction (Frame identification, with latent variables; Argument identification, with dual decomposition)
3. Semi-supervised learning for robustness (Novel graph-based learning algorithms)

1. Why semantic analysis?
Motivation | Applications | Choice of formalism
Bengal 's massive stock of food was reduced to nothing
N X A N ADP N V V ADP N

There is a large body of research on syntax, including Das and Petrov, ACL 2011; Cohen, Das and Smith, EMNLP 2011; Martins, Das, Smith and Xing, EMNLP 2008.

But syntax alone leaves questions open:
- stock: store or financial entity?
- Store of what? Of what size? Whose store?
- What was reduced? To what?

Frame semantics answers them with frames such as STORE (for stock) and CAUSE_CHANGE_OF_POSITION_ON_A_SCALE (for reduced).
Origins: (Computational) Linguistics

Case Grammar ("The Case for Case", Fillmore, 1968)
(cases are words/phrases required by a predicate)
I gave some money to him
Agent Object Beneficiary
- Semantic valency of a predicate
- Correlation with syntax (e.g. subject and object)
- Obligatory cases / optional cases

Frames ("A Framework for Representing Knowledge", Minsky, 1975)

Frame Semantics ("Frame Semantics", Fillmore, 1982)
Relates the meaning of a word with world knowledge
(e.g. gave evokes a GIVING frame; it has several participating roles; the frame is evoked by other words, such as bequeath, contribute, donate)

Scripts ("Scripts, Plans, Goals and Understanding", Schank and Abelson, 1977)

Case Grammar and Frame Semantics, through datasets such as FrameNet, PropBank, VerbNet, NomBank and OntoNotes, led to data-driven shallow semantic parsing. Frames and Scripts, through datasets such as MUC, ACE and GENIA, led to information extraction (template filling). The two lines of work are structurally similar!

Slide idea taken from Brendan O'Connor
Why this Linguistic Formalism?

Formalisms range from shallow to deep.

PropBank-style Semantic Role Labeling (Pradhan et al., 2008; Punyakanok et al., 2008) sits at the shallow end:
Bengal 's massive stock of food was reduced to nothing
N X A N ADP N V V ADP N
with arguments labeled A1 and A4.
- Small symbolic set of semantic roles (six total)
- Verb-specific meaning for these labels
- Conflates the meaning of different roles due to oversimplification (Yi et al., 2007)

Semantic Parsing into Logical Forms (Ge and Mooney, 2005; Zettlemoyer and Collins, 2005) sits at the deep end:
What states border the state that borders the most states
- Trained on very restricted domains
- Poor lexical coverage

Frame-Semantic Parsing (Gildea and Jurafsky, 2002; Johansson and Nugues, 2007; this work) lies in between:
- Does not model quantification or negation, unlike logical forms
- Deeper than PropBank-style semantic role labeling
- Models all types of part-of-speech categories
- Larger lexical coverage than logical form parsers
- Lexicon actively increasing in size
Possible Applications

Question Answering
Bengal 's massive stock of food was reduced to nothing
Whose stock of food was diminished ?
Frame semantics links stock in the text with reserve in the question, so the answer can be found despite the different words. Bilotti et al. (2007)

Information Extraction
Bengal 's massive stock of food was reduced to nothing.
In 1997, France's stock of unirradiated civil plutonium increased to 72 tons.
Saudi Arabia has 267 billion barrels in reserves of oil.
Does Egypt have stockpiles of biological weapons?

Possessor     | Desc                | Resource
Bengal        | massive             | food
France        | -                   | unirradiated civil plutonium
Saudi Arabia  | 267 billion barrels | oil
Egypt         | -                   | biological weapons

Multilingual Applications
Bengal 's massive stock of food was reduced to nothing
[the same sentence in another language; characters lost in transcription]
Multilingual KBs, cross-lingual IR, machine translation
2. Statistical models for structure prediction
(Das, Schneider, Chen and Smith, NAACL 2010)

Structure of Lexicon and Data

The PLACING frame:
- core roles: Agent, Cause, Goal, Theme
- non-core roles: Area, Time
- Agent and Cause stand in an "excludes" relationship
- predicates: archive.V, arrange.V, bag.V, bestow.V, bin.V

Frames are connected by relations such as inheritance and "used by", e.g. PLACING; DISPERSAL (Agent, Cause, Individuals, Distance, Time); TRANSITIVE_ACTION (Agent, Cause, Patient, Event, Place, Time); INSTALLING (Agent, Component, Fixed_location, Area, Time); STORING (Agent, Location, Theme, Area, Time); STORE (Possessor, Resource, Supply, Descriptor).

Benchmark Dataset (SemEval 2007), for comparison with past state of the art (very small dataset):
- 665 frames, 720 role labels, 8.4K unique predicate types
- Training set: 2.2K sentences, 11.2K predicate tokens
- Test set: 120 sentences, 1.1K predicate tokens

New Data (FrameNet 1.5, 2010):
- 877 frames, 1068 role labels, 9.3K unique predicate types
- Training set: 3.3K sentences, 19.6K predicate tokens
- Test set: 2420 sentences, 4.5K predicate tokens
2. Statistical models for structure prediction
Frame identification (use of latent variables) | Argument identification (dual decomposition)

Frame Identification

Bengal 's massive stock of food was reduced to nothing
N X A N ADP N V V ADP N

stock is ambiguous: find the best among all the frames in the lexicon.

Direct modeling using logistic regression has two problems:
1. Unable to model unknown predicates at test time
2. Number of features: ≈ 50 million

Instead: logistic regression with a latent variable. The latent variable ranges over the predicates evoking a frame in supervised data, e.g. cargo.N, inventory.N, reserve.N, stockpile.N, store.N, supply.N evoke STORE. The features do not look at the predicate's surface form; they relate the observed predicate to the latent prototype. E.g. if the frame is STORE and the latent prototype is stockpile.N, a feature fires when LexSem = {synonym}, and the lexical-semantic relation comes from WordNet!

Number of features: ≈ 500K (1% of the features we had before)

Aside: probabilistic modeling of language meaning using syntax, lexical semantics and latent structure was also applied to paraphrase identification (Das and Smith, ACL 2009).

Training: maximum conditional log-likelihood (batch training with L-BFGS)
Fast inference: if the predicate is unseen, consider all frames; otherwise, consider only the frames seen with that predicate.
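As a rough illustration of the latent-variable model, here is a toy sketch: the lexicon entries, the SYNONYMS table standing in for WordNet, and the frame name STORE_FINANCIAL are all made up for the example, and the real feature set is far richer.

```python
import math
from collections import defaultdict

# Toy lexicon: predicates seen evoking each frame in supervised data.
# STORE is from the talk; STORE_FINANCIAL is a hypothetical second frame.
LEXICON = {
    "STORE": ["reserve.N", "stockpile.N", "supply.N"],
    "STORE_FINANCIAL": ["share.N", "bond.N"],
}

# Stand-in for WordNet: pairs of words related by synonymy.
SYNONYMS = {("stock.N", "stockpile.N"), ("stock.N", "share.N")}

def features(pred, frame, proto):
    """Features relate the observed predicate to a latent prototype
    predicate, never to the predicate's raw surface form, so unknown
    predicates can still fire features."""
    feats = defaultdict(float)
    if (pred, proto) in SYNONYMS:
        feats["synonym&" + frame] = 1.0
    feats["bias&" + frame] = 1.0
    return feats

def prob(pred, weights):
    """P(frame | predicate): a log-linear model with the latent
    prototype summed out."""
    scores = {}
    for frame, protos in LEXICON.items():
        scores[frame] = sum(
            math.exp(sum(weights.get(k, 0.0) * v
                         for k, v in features(pred, frame, p).items()))
            for p in protos)
    z = sum(scores.values())
    return {f: s / z for f, s in scores.items()}
```

With a positive weight on the synonym feature, the unseen predicate stock.N is pulled toward STORE through its WordNet neighbor stockpile.N, even though stock.N itself never appears in the training lexicon.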
Frame Identification Results (F-Measure)

Benchmark, auto predicates:
- UTD (Bejan and Hathaway, 2007): 60.4
- LTH (Johansson and Nugues, 2007): 64.0
- This Work: 68.3

Benchmark, given predicates:
- This Work: 74.2

New Data, given predicates:
- This Work: 90.5 (80.0 with no hidden variable)
2. Statistical models for structure prediction
Argument identification (dual decomposition)

Argument Identification

Bengal 's massive stock of food was reduced to nothing

Given the frame STORE for stock, each role must be mapped to a span of the sentence, chosen among candidates such as: Bengal, Bengal 's, massive, massive stock, food, of food, massive stock of food, Bengal 's massive stock of food. An ideal mapping exists, but independent per-role decisions can violate overlap constraints, with two roles claiming overlapping spans.

Other types of structural constraints:

Mutual exclusion constraint. For PLACING (core roles Agent, Cause, Goal, Theme; non-core Area, Time; predicates archive.V, arrange.V, bag.V, bestow.V, bin.V): if an agent places something, there cannot be a cause role in the sentence.
- The waiter placed food on the table. (Agent)
- In Kabul, hauling water put food on the table. (Cause)

Requires constraint. For SIMILARITY (roles Dimension, Differentiating_fact, Entity_1, Entity_2, Degree; predicates difference.N, resemble.V, unlike.A, vary.V): a first entity requires a second entity.
- A mulberry resembles a loganberry. (first entity, second entity)
- A mulberry resembles. ✗
Argument Identification: a constrained optimization problem

Introduce a binary variable for each (role, span) tuple, collected into a binary vector over all tuples, and maximize the total score subject to:
- Uniqueness: each role is assigned exactly one candidate span (an empty span is allowed)
- Overlap prevention: no token may be claimed by more than one argument
- more structural constraints (mutual exclusion, requires, ...)

This is an integer linear program (ILP), in the spirit of Punyakanok, Roth and Yih (2008). But ILPs often lead to very slow solutions, and the fast ILP solvers are proprietary.
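For intuition, the same constrained decoding can be written as a brute-force search (exhaustive enumeration instead of an ILP, feasible only for toy inputs; the roles, spans and scores below are made up):

```python
from itertools import product

def spans_overlap(a, b):
    """Two (start, end) token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

def decode(scores, spans, overlaps):
    """Exhaustive constrained decoding over (role, span) assignments.

    scores: dict mapping (role, span) -> model score
    spans: dict mapping role -> list of candidate spans (None = unfilled);
           picking exactly one entry per role enforces uniqueness
    overlaps: predicate telling whether two spans share tokens
    """
    roles = sorted({r for r, _ in scores})
    best, best_score = None, float("-inf")
    for assignment in product(*[spans[r] for r in roles]):
        # overlap constraint: no two filled spans may share tokens
        filled = [s for s in assignment if s is not None]
        if any(overlaps(a, b) for i, a in enumerate(filled)
               for b in filled[i + 1:]):
            continue
        total = sum(scores.get((r, s), 0.0) for r, s in zip(roles, assignment))
        if total > best_score:
            best, best_score = dict(zip(roles, assignment)), total
    return best, best_score
```

Even when the highest-scoring span for every role is the same full noun phrase, the overlap constraint forces the decoder onto the best non-conflicting combination, which is exactly what the ILP encodes at scale.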
Argument Identification: an alternate approach

Dual Decomposition with the Alternating Direction Method of Multipliers, developed with colleagues at CMU (Das, Martins and Smith, forthcoming).

Basic part: a single (role, span) tuple; entire space: all tuples. Break the problem down into many small components, e.g.:
- find the best span for a role
- for a sentence position, find the best tuple
- for a pair of mutually exclusive roles, find the best tuple
and impose agreement between the components.

Each component keeps a binary vector over the tuples it touches, and the total score decomposes over the components. A witness vector enforces consensus among components on the tuples they share. Relaxing the integer constraints yields a primal problem whose augmented Lagrangian has a saddle point that can be found using several decoupled worker problems, via three types of iterative updates:
1. Lagrange multiplier updates
2. Consensus variable updates
3. Component updates, at the decoupled workers

E.g. for each role, there is a worker that imposes an XOR/uniqueness constraint; its update is a projection onto a simplex, a simple sort operation. The challenge: define fast, simple workers.
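The simplex projection those XOR workers rely on really is just a sort plus a threshold. Here is a generic sketch of the sort-based algorithm of Duchi et al. (2008), not the talk's actual implementation:

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}. Sort descending, find the largest
    prefix whose shifted values stay positive, then clip."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:       # u_i still survives the shift
            theta = t
    return [max(x - theta, 0.0) for x in v]
```

The cost is O(n log n) from the sort, which is why each worker update stays cheap even when a role has many candidate spans.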
Advantages:
1) Significant speedup: time to decode the test set is 4.78 seconds with dual decomposition vs. 43.12 seconds with CPLEX (ILP)
2) No proprietary solver necessary
Plus a certificate of optimality in >99% of examples.
Argument Identification: learning

Maximum conditional log-likelihood of local (role, span) pairs (batch training using L-BFGS).
Argument Identification Results (New Data): benefit of joint inference

            Local | Dual Decomposition
Precision    82.0 | 83.8
Recall       76.4 | 76.2
F-Measure    79.1 | 79.8

Local inference commits 501 linguistic violations; dual decomposition commits none.
Full Parsing: Final Results (F-Measure)

Benchmark, auto predicates:
- UTD: 37.9
- LTH: 45.6
- This Work: 50.2

Benchmark, given predicates:
- This Work: 53.6

New Data, given predicates:
- This Work: 68.5
3. Semi-supervised learning for robustness
Novel graph-based learning algorithms (Das and Smith, ACL 2011)

Results on Unknown Predicates (F-Measure):
- Frame Identification: 90.5 on all predicates, but only 46.6 on unknown predicates
- Full Parsing: 68.5 on all predicates, but only 30.2 on unknown predicates

Handling Unknown Predicates

We have knowledge of only 9,263 predicates in supervised data. However, English has a lot more potential predicates (~65,000 in newswire English). Solution: lexicon expansion using graph-based semi-supervised learning.

How can label propagation help?
- Build a graph with potential predicates as vertices; compute a similarity matrix using co-occurrence statistics.
- Keep a label distribution at each vertex: a distribution over the frames that the predicate can evoke.

The idea is very similar to Das and Petrov (ACL 2011): unsupervised lexicon expansion for POS tagging.

Example Graph: seed predicates carry supervised distributions, unseen predicates start empty, and graph propagation spreads the distributions along the edges, continuing till convergence.
Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

Labeled and unlabeled datapoints are vertices connected by a symmetric weight matrix (edge weights such as 0.9, 0.8, 0.1, 0.05, 0.01 encode similarity). Labeled vertices carry supervised label distributions; the distributions at the remaining vertices are to be found.

Label Propagation (Das and Smith, forthcoming) minimizes an objective with three terms:
1. a term that brings the induced distributions over labeled vertices closer to the observed ones
2. a term that brings the distributions of similar vertices closer
3. a term that induces sparse distributions at each vertex
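To give the flavor of the second term, here is a much simplified propagation step: Jacobi-style neighbor averaging with clamped seeds, a stand-in for the actual three-term objective above. The graph, its weights, and the frame name STOCK_FINANCE are invented for the example.

```python
def propagate(W, seeds, n_iters=30):
    """Toy label propagation.

    W: dict vertex -> dict(neighbor -> weight), assumed symmetric
    seeds: dict vertex -> frame distribution (dict frame -> prob);
           seed vertices stay clamped to their supervised distributions
    Non-seed vertices repeatedly take the weighted average of their
    neighbors' distributions until (approximate) convergence.
    """
    vertices = set(W) | set(seeds)
    dist = {v: dict(seeds.get(v, {})) for v in vertices}
    for _ in range(n_iters):
        new = {}
        for v in vertices:
            if v in seeds:                      # clamp labeled vertices
                new[v] = dict(seeds[v])
                continue
            acc, total = {}, 0.0
            for u, w in W.get(v, {}).items():   # weighted neighbor average
                total += w
                for frame, p in dist[u].items():
                    acc[frame] = acc.get(frame, 0.0) + w * p
            new[v] = {f: p / total for f, p in acc.items()} if total else {}
        dist = new
    return dist
```

An unseen predicate strongly connected to a STORE-evoking seed thus ends up with most of its probability mass on STORE, which is exactly the lexicon-expansion effect the talk is after.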
Constrained Inference

At test time: if the predicate is seen in supervised data, restrict attention to its known frames as before; else if the predicate is in the graph, restrict attention to the top frames in its propagated distribution; otherwise, fall back to considering all frames.

Six times faster inference on unknown predicates!
Results on Unknown Predicates (F-Measure)

Frame Identification: Supervised 46.6; Self-Training 42.7; Graph-Based 65.3
Full Parsing: Supervised 30.2; Self-Training 26.6; Graph-Based 46.7
Conclusions

Parsing using the theory of frame semantics:
- richer output than popular SRL systems (Kingsbury and Palmer, 2002)
- domain general in comparison to deep semantic parsers

Significantly better performance on benchmark datasets than previous work:
- fewer independence assumptions
- only two statistical models
- semi-supervised extensions

Future Work

- Train parsers in other languages: Spanish, German, Portuguese, Japanese, Chinese, Swedish
- Use the presented techniques for deeper semantic analysis tasks, especially semi-supervised learning
- Use the parser for NLP applications (right now being used to bootstrap more annotations)
(The history slide once more: Case Grammar, Frames, Frame Semantics, Scripts, the datasets, and the two structurally similar lines of work; what the field needs next is more annotations and larger lexicons.)

Last Word

Parser available at: http://www.ark.cs.cmu.edu/SEMAFOR
(200 downloads in the past 6 months)

JUDGMENT_DIRECT_ADDRESS