Page 1
Global Inference and Learning
Towards Natural Language Understanding
Dan Roth, Department of Computer Science
University of Illinois at Urbana-Champaign
CRI-06 Workshop on Machine Learning in Natural Language Processing
Page 2
Nice to Meet You
Page 3
Learning and Inference
Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.
(Learned) classifiers for different sub-problems
Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.
Global inference for the best assignment to all variables of interest.
Page 4
Comprehension
1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin’s dad was a magician. 4. Christopher Robin must be at least 65 now.
A process that maintains and updates a collection of propositions about the state of affairs.
(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.
Page 5
How to Address Comprehension? (cartoon)
It’s an Inference Problem
Map into a well defined language
Use standard reasoning tools
Huge number of problems:
Variability of Language
Knowledge Acquisition
Reasoning Patterns
Multiple levels of ambiguity; make language precise
Canonical mapping? Underspecificity calls for "purposeful", goal-specific mapping
Statistics: know a word by its neighbors
Counting
Machine Learning
Clustering
Probabilistic models
Structured Models
Classification
Page 6
Illinois’ bored of education [board]  (sound/spelling ambiguity)
…Nissan car and truck plant is… / …divide life into plant and animal kingdom…  (word-sense ambiguity)
(This Art) (can N) (will MD) (rust V)  vs.  V, N, N  (part-of-speech ambiguity)
The dog bit the kid. He was taken to a veterinarian / a hospital.  (coreference ambiguity)
What we Know: Stand Alone Ambiguity Resolution
Learn a function f: X → Y that maps observations in a domain to one of several categories.
Broad Coverage
Page 7
Theoretically: generalization bounds. How many examples does one need to see in order to guarantee good behavior on previously unobserved examples?
Algorithmically: good learning algorithms for linear representations.
Can deal with very high dimensionality (10^6 features); very efficient in terms of computation and # of examples.
On-line.
Key issues remaining:
Learning protocols: how to minimize interaction (supervision); how to map domain/task information to supervision; semi-supervised learning; active learning; ranking; sequences
What are the features? No good theoretical understanding here.
Programming systems that have multiple classifiers
Classification is Well Understood
Page 8
Comprehension
(Slide repeated from Page 4.)
Page 9
This Talk
Integrating Learning and Inference (Historical Perspective; Role of Learning)
Global Inference over Classifiers
Semantic Parsing
Multiple levels of processing
Textual Entailment
Summary and Future Directions
Page 10
Learning, Knowledge Representation, Reasoning
There has been a lot of work on these three topics in AI
But, mostly, work on Learning, and work on Knowledge Representation and Reasoning
Very little has been done on an integrative framework.
Inductive Logic Programming
Some work from the perspective of probabilistic AI
Some work from the perspective of Machine Learning (EBL)
Recent: Valiant’s Robust Logic; Learning to Reason
Page 19
A unified framework to study Learning, Knowledge Representation and Reasoning
The goal is to Reason (deduction; abduction - best explanation)
Reasoning is not done from a static Knowledge Base but rather done with knowledge that is learned via interaction with the world.
Intermediate Representation is important – but only to the extent that it is learnable, and it facilitates reasoning.
Feedback to learning is given by the reasoning stage.
There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning.
[Khardon & Roth JACM’97, AAAI’94; Roth ’95, Roth ’96; Khardon & Roth ’99. Learning to Plan: Khardon ’99]
Learning to Reason [’94-’97]: Relevant?
How?
Page 20
Comprehension
(Slide repeated from Page 4.)
Page 21
Learning serves to support abstraction, a context-sensitive operation that is done at multiple levels: from names (Mr. Robin, Christopher Robin) to relations (wrote, author) and concepts.
Learning serves to generate the vocabulary over which reasoning is possible (part-of-speech; subject-of, …).
Knowledge acquisition; tuning; memory, ….
Learning in the context of a reasoning system:
Training with an inference mechanism
Feedback is given at the inference level, not at the single-classifier level
Labeling using inference
It’s an Inference Problem
Today’s Research Issues
What is the role of Learning?
Page 22
This Talk
Integrating Learning and Inference (Historical Perspective; Role of Learning)
Global Inference over Classifiers
Semantic Parsing
Multiple levels of processing
Textual Entailment
Summary and Future Directions
Page 23
A unified representation that is used as an input to learning processes, and as an output of learning processes.
Specifically, we use an abstract representation that is centered around a semantic parse (predicate-argument representation), augmented by additional information.
Formalized as a hierarchical concept graph (Description Logic inspired)
Feature Description Logic [Cumby&Roth’00, 02, 03]
What I will not talk about: Knowledge Representation
(Figure: hierarchical concept graph for the sentence below, with nodes for person, meeting, participant, location, time; city name(Prague), country name(Iraq), date month(April) year(2001); organization, affiliation, nationality.)
Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001.
Page 24
Inference and Learning
Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.
Learned classifiers for different sub-problems
Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.
Global inference for the best assignment to all variables of interest.
How to induce a predicate argument representation of a sentence.
How to use inference methods over learned outcomes.
How to use declarative information over/along with learned information.
Page 25
Semantic Role Labeling
I left my pearls to my daughter in my will .
[I]A0 left [my pearls]A1 [to my daughter]A2 [in my will]AM-LOC .
A0: Leaver; A1: Things left; A2: Benefactor; AM-LOC: Location
I left my pearls to my daughter in my will.
Special case (structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts.
Implications on training paradigms
Overlapping arguments
If A2 is present, A1 must also be present.
Page 26
Random variables Y
Conditional distributions P (learned by classifiers)
Constraints C: any Boolean function defined on partial assignments (possibly with weights W)
Goal: Find the "best" assignment, the assignment that achieves the highest global accuracy.
This is an Integer Programming Problem
Problem Setting
(Figure: variables y1…y8 linked by constraints C(y1, y4) and C(y2, y3, y6, y7, y8).)
Y* = argmax_Y P(Y) subject to constraints C (+ W_C)
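The optimization above can be made concrete with a tiny brute-force sketch. The scores, variables, and the implication constraint below are invented for illustration; real systems use ILP solvers rather than enumeration:

```python
from itertools import product

# Toy instance of Y* = argmax_Y P(Y) subject to constraints C.
# score[i][v] stands in for the i-th classifier's score for value v.
score = [
    {0: 0.2, 1: 0.8},   # y1
    {0: 0.6, 1: 0.4},   # y2
    {0: 0.5, 1: 0.5},   # y3
]

def C(y):
    # Hypothetical domain constraint: y1 = 1 implies y2 = 1.
    return not (y[0] == 1 and y[1] == 0)

def best_assignment(score, constraint):
    # Enumerate all assignments, keep the feasible ones, take the argmax.
    feasible = (y for y in product(*(d.keys() for d in score)) if constraint(y))
    return max(feasible, key=lambda y: sum(d[v] for d, v in zip(score, y)))

y_star = best_assignment(score, C)
```

Without the constraint, the independent maximizers would be y1 = 1, y2 = 0; the constraint forces y2 = 1, trading local score for global coherence.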
Page 27
A General Inference Setting
Inference as Optimization [Yih & Roth CoNLL’04] [Punyakanok et al. COLING’04] [Punyakanok et al. IJCAI’04]
Markov Random Field [standard]
Optimization Problem (e.g., Metric Labeling Problems) [Chekuri et al. ’01]
Linear Programming Problems
An Integer Linear Programming (ILP) formulation:
General: works on non-sequential constraint structure
Expressive: can represent many types of declarative constraints
Optimal: finds the optimal solution
Fast: commercial packages are able to quickly solve very large problems (hundreds of variables and constraints; sparsity is important)
Page 28
For each verb in a sentence:
1. Identify all constituents that fill a semantic role
2. Determine their roles
• Core arguments, e.g., Agent, Patient or Instrument
• Their adjuncts, e.g., Locative, Temporal or Manner
I left my pearls to my daughter-in-law in my will.
A0 : leaver
A1 : thing left
A2 : benefactor
AM-LOC
The pearls which I left to my daughter-in-law are fake.
A0 : leaver
A1 : thing left
A2 : benefactor
R-A1
The pearls, I said, were left to my daughter-in-law.
A0 : sayer
A1 : utterance
C-A1 : utterance
Who did what to whom, when, where, why,…
Semantic Role Labeling (1/2)
Page 29
PropBank [Palmer et al. ’05] provides a large human-annotated corpus of semantic verb-argument relations. It adds a layer of generic semantic labels to Penn Treebank II. (Almost) all the labels are on the constituents of the parse trees.
Core arguments: A0-A5 and AA; different semantics for each verb, specified in the PropBank frame files.
13 types of adjuncts, labeled AM-arg, where arg specifies the adjunct type.
Semantic Role Labeling (2/2)
Page 30
Algorithmic Approach
Identify argument candidates
Pruning [Xue & Palmer, EMNLP’04]
Argument Identifier: binary classification (SNoW)
Classify argument candidates
Argument Classifier: multi-class classification (SNoW)
Inference
Use the estimated probability distribution given by the argument classifier
Use structural and linguistic constraints
Infer the optimal global output
(Example, shown through the stages: "I left my nice pearls to her", with brackets marking candidate arguments.)
Identify Vocabulary
Inference over (old and new) Vocabulary
candidate arguments
Page 31
Argument Identification & Classification
Both argument identifier and argument classifier are trained phrase-based classifiers.
Features (some examples): voice, phrase type, head word, path, chunk, chunk pattern, etc. [some make use of a full syntactic parse]
Learning Algorithm: SNoW
Sparse network of linear functions; weights learned by a regularized Winnow multiplicative update rule
Probability conversion is done via softmax: p_i = exp(act_i) / Σ_j exp(act_j)
(Example: "I left my nice pearls to her", with brackets marking candidate arguments.)
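The softmax conversion on this slide is straightforward to sketch (the activation values below are invented; SNoW's actual activations come from the learned linear functions):

```python
import math

def softmax(activations):
    # p_i = exp(act_i) / sum_j exp(act_j); subtracting the max first is a
    # standard numerical-stability trick and does not change the result.
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical raw activations for three candidate labels.
probs = softmax([2.0, 1.0, 0.1])
```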
Page 32
Inference
I left my nice pearls to her
The output of the argument classifier often violates some constraints, especially when the sentence is long.
Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming. [Punyakanok et al. ’04, Roth & Yih ’04]
Input: The probability estimation (by the argument classifier)
Structural and linguistic constraints
Allows incorporating expressive (non-sequential) constraints on the variables (the arguments types).
Page 33
Integer Linear Programming (ILP)
Maximize: Σ_i Σ_t score(a_i = t) · x_{i,t}
Subject to: Σ_t x_{i,t} = 1 for each argument a_i; x_{i,t} ∈ {0, 1}; plus the structural and linguistic constraints, written as linear (in)equalities.
Page 34
Integer Linear Programming Inference
For each argument a_i, set up a Boolean variable x_{i,t} indicating whether a_i is classified as t.
Goal is to maximize Σ_i Σ_t score(a_i = t) · x_{i,t}
subject to the (linear) constraints. Any Boolean constraint can be encoded as linear constraint(s).
If score(a_i = t) = P(a_i = t), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints.
Page 35
Maximize expected number correct: T* = argmax_T Σ_i P(a_i = t_i)
Subject to some constraints: structural and linguistic (e.g., R-A1 ⇒ A1)
Solved with Integer Linear Programming
Score table (candidate arguments of "I left my nice pearls to her" × labels):
0.3 0.2 0.2 0.3
0.6 0.0 0.0 0.4
0.1 0.3 0.5 0.1
0.1 0.2 0.3 0.4
Cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8  (independent max)
Cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6  (non-overlapping)
Cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4  (blue ⇒ red & non-overlapping)
Inference
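The gap between independent decisions and constrained inference can be reproduced by brute force on this score table. The "no duplicate labels" constraint below is a hypothetical stand-in for the slide's structural constraints (which also involve non-overlap of argument spans), so its optimum of 1.7 differs from the slide's non-overlapping value:

```python
from itertools import product

# Score table from the slide: rows are the candidate arguments of
# "I left my nice pearls to her", columns are labels.
score = [
    [0.3, 0.2, 0.2, 0.3],
    [0.6, 0.0, 0.0, 0.4],
    [0.1, 0.3, 0.5, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]

def independent_max(score):
    # Each argument takes its own best label; constraints ignored.
    return sum(max(row) for row in score)

def constrained_max(score, constraint):
    # Brute-force stand-in for the ILP: search all labelings satisfying
    # the constraint; return the best total score and its labeling.
    labelings = product(range(len(score[0])), repeat=len(score))
    best = max((y for y in labelings if constraint(y)),
               key=lambda y: sum(score[i][t] for i, t in enumerate(y)))
    return sum(score[i][t] for i, t in enumerate(best)), best

# Hypothetical constraint for illustration: no label used twice.
no_duplicates = lambda y: len(set(y)) == len(y)

indep = independent_max(score)                        # 1.8, as on the slide
constr, labels = constrained_max(score, no_duplicates)
```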
Page 36
No duplicate argument classes:  Σ_{a ∈ POTARG} x{a = A0} ≤ 1
R-ARG:  for each a2 ∈ POTARG,  Σ_{a ∈ POTARG} x{a = A0} ≥ x{a2 = R-A0}
C-ARG:  for each a2 ∈ POTARG,  Σ_{a ∈ POTARG, a before a2} x{a = A0} ≥ x{a2 = C-A0}
Many other possible constraints: unique labels; no overlapping or embedding; relations between the number of arguments; if the verb is of type A, no argument of type B.
Any Boolean rule can be encoded as a linear constraint.
If there is an R-ARG phrase, there is an ARG phrase.
If there is a C-ARG phrase, there is an ARG phrase before it.
Constraints
Universally quantified rules
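The claim that any Boolean rule can be encoded linearly can be checked exhaustively for the R-ARG rule (a small self-contained verification; the variable names are ours):

```python
from itertools import product

# Boolean rule: "if any candidate is labeled R-A0, some candidate is
# labeled A0". Its linear encoding over 0/1 indicator variables:
#     sum_a x[a, A0]  >=  x[a2, R-A0]   for every candidate a2.
def rule(x_a0, x_ra0):
    return (not any(x_ra0)) or any(x_a0)

def linear(x_a0, x_ra0):
    return all(sum(x_a0) >= x2 for x2 in x_ra0)

# Exhaustively verify the two formulations agree for 3 candidates.
for x_a0 in product([0, 1], repeat=3):
    for x_ra0 in product([0, 1], repeat=3):
        assert rule(x_a0, x_ra0) == linear(x_a0, x_ra0)
```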
Page 37
This approach produces a very good semantic parser; top-ranked system in the CoNLL’05 shared task.
Key difference is the Inference.
Easy and fast: ~1 sentence/second (using Xpress-MP)
A lot of room for improvement (additional constraints)
Demo available: http://L2R.cs.uiuc.edu/~cogcomp
Significant also in enabling knowledge acquisition.
So far, shown the use of only declarative (deterministic) constraints; in fact, this approach can be used with both statistical and declarative constraints.
Semantic Parsing: Summary I
Page 38
ILP as a Unified Algorithmic Scheme
Consider a common model for sequential inference: HMM/CRF. Inference in this model is done via the Viterbi algorithm.
Viterbi is a special case of the Linear Programming based inference: Viterbi is a shortest-path problem, which is an LP with a canonical matrix that is totally unimodular. Therefore, you get integrality constraints for free.
One can now incorporate non-sequential/expressive/declarative constraints by modifying this canonical matrix.
The extension reduces to a polynomial scheme under some conditions (e.g., when constraints are sequential, when the solution space does not change, etc.)
Does not necessarily increase complexity; very efficient in practice.
[Roth&Yih, ICML’05]
(Figure: a trellis with source s and sink t; positions x1…x5, each carrying states {A, B, C}, with output labels y1…y5.)
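Viterbi over such a trellis can be written directly as dynamic programming, the combinatorial counterpart of the shortest-path LP (the scores below are invented for illustration; in an HMM/CRF they come from the learned model):

```python
def viterbi(states, emit, trans):
    # emit[i][s]: score of state s at position i;
    # trans[(s, s2)]: transition score from s to s2.
    # Returns the argmax state sequence and its total score.
    best = {s: emit[0][s] for s in states}
    back = []
    for scores in emit[1:]:
        step, new = {}, {}
        for s2 in states:
            prev = max(states, key=lambda s: best[s] + trans[(s, s2)])
            step[s2] = prev
            new[s2] = best[prev] + trans[(prev, s2)] + scores[s2]
        back.append(step)
        best = new
    # Recover the best path by following back-pointers.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    return list(reversed(path)), best[last]

# Illustrative trellis over the slide's states {A, B, C}.
states = ["A", "B", "C"]
emit = [{"A": 1.0, "B": 0.2, "C": 0.0},
        {"A": 0.0, "B": 1.0, "C": 0.3},
        {"A": 0.4, "B": 0.0, "C": 1.0}]
trans = {(s, s2): 0.0 for s in states for s2 in states}
path, total = viterbi(states, emit, trans)
```

With all transition scores zero, the best path reduces to the position-wise argmax, which is a useful sanity check.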
Page 39
An inference method for the "best explanation", used here to induce a semantic representation of a sentence. A general information-integration framework.
Allows expressive constraints: any Boolean rule can be represented by a set of linear (in)equalities.
Combining acquired (statistical) constraints with declarative constraints: start with the shortest-path matrix and constraints; add new constraints to the basic integer linear program.
Solved using off-the-shelf packages. If the additional constraints don’t change the solution, LP is enough; otherwise, the computational time depends on sparsity (fast in practice).
Demo available: http://L2R.cs.uiuc.edu/~cogcomp
Integer Linear Programming Inference - Summary
Page 40
This Talk
Integrating Learning and Inference (Historical Perspective; Role of Learning)
Global Inference over Classifiers
Semantic Parsing
Multiple levels of processing
Textual Entailment
Summary and Future Directions
Page 41
Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.
So far, this was a single-stage process:
Learn (acquire a new vocabulary), and
Run inference over it to guarantee the coherency of the outcome.
Is that it? Of course, this isn’t sufficient. The process of learning and inference needs to be done in phases.
Inference and Learning
It’s turtles all the way down…
Page 42
Pipelining is a crude approximation; interactions occur across levels and downstream decisions often interact with previous decisions.
Leads to propagation of errors: occasionally, later-stage problems are easier, but upstream mistakes will not be corrected.
There are good reasons for pipelining decisions.
Global inference over the outcomes of different levels can be used to break away from this paradigm [between pipeline & fully global].
Allows a flexible way to incorporate linguistic and structural constraints.
POS Tagging
Phrases
Semantic Entities
Relations
Vocabulary is generated in phases. Left-to-right processing of sentences is also a pipeline process.
Parsing
WSD Semantic Role Labeling
Raw Data
Pipeline
Page 43
J.V. Oswald was murdered at JFK after his assassin, K. F. Johns…
Identify:
(Annotated: J.V. Oswald [person] was murdered at JFK [location] after his assassin, K. F. Johns [person]…; relation Kill(X, Y).)
Identify named entities. Identify relations between entities. Exploit mutual dependencies between named entities and relations to yield a coherent global detection. [Roth & Yih, COLING’02; CoNLL’04]
Some knowledge (classifiers) may be known in advance; some constraints may be available only at decision time.
Entities and Relations: Information Integration
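A toy version of this joint entity-relation inference (all scores below are invented): locally, "K. F. Johns" looks more like a location, but the constraint that Kill takes two person arguments, combined with a confident relation classifier, flips it:

```python
from itertools import product

# Entity classifiers score each mention's type; a relation classifier
# scores Kill(E1, E3). The constraint Kill(x, y) => person(x) & person(y)
# ties the decisions together.
ent_types = ["person", "location"]
ent_score = [
    {"person": 0.6, "location": 0.4},   # J.V. Oswald
    {"person": 0.1, "location": 0.9},   # JFK
    {"person": 0.4, "location": 0.6},   # K. F. Johns (locally mislabeled!)
]
rel_score = {True: 0.8, False: 0.2}      # Kill(E1, E3)?

def coherent(e, kill):
    # Kill requires both of its arguments to be persons.
    return (not kill) or (e[0] == "person" and e[2] == "person")

best = max(
    ((e, k) for e in product(ent_types, repeat=3) for k in (True, False)
     if coherent(e, k)),
    key=lambda a: sum(ent_score[i][t] for i, t in enumerate(a[0])) + rel_score[a[1]],
)
```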
Page 44
This Talk
Integrating Learning and Inference (Historical Perspective; Role of Learning)
Global Inference over Classifiers
Semantic Parsing
Multiple levels of processing
Textual Entailment
Summary and Future Directions
Page 45
Comprehension
(Slide repeated from Page 4.)
Page 46
Given: Q: Who acquired Overture? Determine: A: Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year.
Textual Entailment
Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year
Yahoo acquired Overture
Entails
Subsumed by
Overture is a search company Google is a search company
……….
Google owns Overture
By “semantically entailed” we mean: most people would agree that one sentence implies the other. Simply, making plausible inferences.
Phrasal verb paraphrasing [Connor&Roth’06]
Entity matching [Li et. al, AAAI’04, NAACL’04]
Semantic Role Labeling
Page 47
Discussing Textual Entailment
Requires an inference process that makes use of a large number of learned (and knowledge-based) operators. A sound approach for determining whether a statement of interest holds in a given sentence. [Braz et al., AAAI’05]
A pair (sentence, hypothesis) is transformed into a simpler pair, in an entailment-preserving manner.
Constrained-optimization formulation over a large number of learned operators, aimed at the best (simplest) mapping between predicate-argument representations.
Inference is purposeful: no canonical representation, but rather reasoning on sentences; the transformation depends on the hypothesis.
What is shown next is a proof. At any stage, a large number of operators are entertained; some do not fire, some lead nowhere. This is a path through the optimization process that leads to a justifiable (and explainable) answer.
Page 48
Sample Entailment Pair
S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves.
T: Offers by individual European governments involved supplies of crude or refined oil products.
Does ‘T’ follow from ‘S’?
Page 49
S (before): Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Offers by individual European governments involved supplies of crude or refined oil products.
S (after): Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Offers by individual European governments involved supplies of crude or refined oil products.
OPERATOR 1: Phrasal Verb
Replace phrasal verbs with an equivalent single-word verb.
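Operator 1 can be sketched as a rewrite over an equivalence lexicon (the two entries below are invented stand-ins for the learned paraphrase resource of Connor & Roth ’06):

```python
import re

# Toy phrasal-verb lexicon: phrasal form -> single-word equivalent.
PHRASAL = {
    "made up their minds": "decided",
    "took over": "acquired",
}

def replace_phrasal_verbs(sentence):
    # Rewrite each known phrasal verb, matching on word boundaries.
    for phrasal, single in PHRASAL.items():
        sentence = re.sub(r"\b" + re.escape(phrasal) + r"\b", single, sentence)
    return sentence

s = replace_phrasal_verbs(
    "They finally made up their minds to release 2 million barrels a day.")
```

A real operator would also need to handle inflection and separable phrasal verbs ("took the company over"), which this sketch ignores.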
Page 50
S (before): Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Offers by individual European governments involved supplies of crude or refined oil products.
S (after): Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Offers by individual European governments involved supplies of crude or refined oil products.
OPERATOR 2: Coreference Resolution
Replace pronouns/possessive pronouns with the entity to which they refer.
Page 51
S (before): Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Offers by individual European governments involved supplies of crude or refined oil products.
S (after): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Offers by individual European governments involved supplies of crude or refined oil products.
OPERATOR 3: Focus of Attention
Remove segments of a sentence that do not appear to be necessary; may allow more accurate annotation of the remaining words.
Page 52
S (before): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Offers by individual European governments involved supplies of crude or refined oil products.
S (after): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Individual European governments offered supplies of crude or refined oil products.
OPERATOR 4: Nominalization Promotion
Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments.
(Parse fragments: "involved(Offers by individual …, supplies …)" becomes "offered(Individual …, supplies …)".)
Requires semantic role labeling (for noun predicates)
Page 53
S (before): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Individual European governments offered supplies of crude or refined oil products.
S (after): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Individual European governments supplied crude or refined oil products.
OPERATOR 4: Nominalization Promotion (applied again)
Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments.
(Parse fragments: "offered(Individual …, supplies of crude …)" becomes "supplied(Individual …, crude …)".)
Page 54
S (before): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.
T (before): Individual European governments supplied crude or refined oil products.
OPERATOR 5: Predicate Embedding Resolution
Replace a verb compound, where the first verb may indicate modality or negation, with a single verb marked with a negation/modality attribute.
(Parse fragments: "decided(U.S. and …, release(2 million barrels a day, of oil …))" becomes "released(U.S. and …, 2 million barrels a day, of oil …)".)
S (after): U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves.
T (after): Individual European governments supplied crude or refined oil products.
‘decided’ (almost) does not change the meaning of the embedded verb. But what if the embedding verb had been ‘refused’?
(Parse fragments: "refused(U.S. and …, release(2 million barrels a day, of oil …))" becomes "released [negation](U.S. and …, 2 million barrels a day, of oil …)".)
ENTAILMENT SHOULD NOT SUCCEED
Requires semantic role labeling (for noun predicates)
Page 55
S: U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves.
T: Individual European governments supplied crude or refined oil products.
OPERATOR 6: Predicate Matching
System matches PREDICATES and their ARGUMENTS; accounts for monotonicity, modality, negation, and quantifiers.
ENTAILMENT SUCCEEDS
Requires lexical abstraction
Page 56
Discussed a general paradigm for learning and inference in the context of natural language understanding tasks.
Did not discuss: Knowledge Representation; how to train.
Key insight: what to learn is driven by global decisions. Luckily, the number of components is much smaller than the number of decisions.
Emphasis should be on:
Learn locally and make use globally (via global inference) [Punyakanok et al. IJCAI’05]
Ability to make use of domain & constraints to drive supervision [Klementiev & Roth, ACL’06]
Conclusions
Page 57
Discussed a general paradigm for learning and inference in the context of natural language understanding tasks: incorporate classifiers’ information, along with expressive constraints, within an inference framework for the best explanation.
We can now incorporate many good old ideas.
Learning allows us to develop the right vocabulary, and supports appropriate abstractions so that we can study natural language understanding as a problem of reasoning.
Room for new research on reasoning patterns in NLP
Conclusions (2)
Page 58
Acknowledgement
Many of my students contributed significantly to this line of work: Vasin Punyakanok, Scott Yih, Mark Sammons, Xin Li, Dav Zimak, Rodrigo de Salvo Braz, Chad Cumby, Yair Even-Zohar, Michael Connor, Kevin Small, Alex Klementiev.
Funding: ARDA, under the AQUAINT program; NSF: ITR IIS-0085836, ITR IIS-0428472, ITR IIS-0085980; a DOI grant under the Reflex program; DASH Optimization.
Page 59
Questions?
Thank you