TRANSCRIPT
1
Textual Entailment: A Perspective on Applied Text Understanding
Ido Dagan Bar-Ilan University, Israel
Joint work with: Oren Glickman, Idan Szpektor, Roy Bar-Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
2
Talk Focus: A Framework for “Applied Semantics”
• The textual entailment task – what and why?
• Empirical evaluation – PASCAL RTE Challenge
• Problem scope, decomposition and analysis
• Different perspective on semantic inference
• Probabilistic framework
• Cf. syntax, MT – clear task, methodology and community
3
Natural Language and Meaning
Meaning
Language
Ambiguity
Variability
4
Variability of Semantic Expression
Dow ends up
Dow climbs 255
The Dow Jones Industrial Average closed up 255
Stock market hits a record high
Dow gains 255 points
All major stock markets surged
5
Variability Recognition – Major Inference in Applications
Information Retrieval (IR)
Question Answering (QA)
Multi Document Summarization (MDS)
Information Extraction (IE)
6
Typical Application Inference
Overture’s acquisition by Yahoo
Yahoo bought Overture
Question: Who bought Overture?
Expected answer form: X bought Overture
• Similar for IE: X buy Y
• Similar for “semantic” IR: t: Overture was bought …
• Summarization (multi-document) – identify redundant info
• MT evaluation (and recent proposals for MT?)
text ⇒ hypothesized answer
7
KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS
(IJCAI-05)
CFP:
– Reasoning aspects:
  * information fusion
  * search criteria expansion models
  * summarization and intensional answers
  * reasoning under uncertainty or with incomplete knowledge
– Knowledge representation and integration:
  * levels of knowledge involved (e.g. ontologies, domain knowledge)
  * knowledge extraction models and techniques to optimize response accuracy
  * coherence and integration
8
Inference for Textual Question Answering Workshop (AAAI-05)
CFP:
• abductions, default reasoning, inference with epistemic logic or description logic
• inference methods for QA need to be robust, cover all ambiguities of language
• available knowledge sources that can be used for inference

… but similar needs for other applications – can we address a uniform empirical task?
9
Applied Textual Entailment: Abstract Semantic Variability Inference
• QA: “Where was John Wayne born?” – Answer: Iowa
Text (t): The birthplace of John Wayne is in Iowa
Hypothesis (h): John Wayne was born in Iowa
inference
10
The Generic Entailment Task
Text (t): The birthplace of John Wayne is in Iowa
Hypothesis (h): John Wayne was born in Iowa
inference
Given the text t, can we infer that h is (most likely) true?
11
Classical Entailment Definition
• Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
• Strict entailment – doesn't account for some uncertainty allowed in applications
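The possible-worlds definition above can be made concrete with a tiny propositional sketch (the atom names and the encoding here are illustrative, not from the talk): enumerate every truth assignment and check that no world makes t true while h is false.

```python
from itertools import product

def entails(t, h, atoms):
    """Classical entailment: h is true in every possible world in which t is true."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if t(world) and not h(world):
            return False
    return True

# Hypothetical encoding of a birthplace example: the text fixes both facts,
# so every world satisfying t also satisfies h (but not vice versa).
atoms = ["born_paris", "born_france"]
t = lambda w: w["born_paris"] and w["born_france"]
h = lambda w: w["born_france"]

print(entails(t, h, atoms))  # True: every t-world is an h-world
print(entails(h, t, atoms))  # False: born in France, not necessarily in Paris
```

This brute-force check is exponential in the number of atoms, which is exactly why applied textual entailment works over texts rather than fully interpreted logical forms.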
12
“Almost certain” Entailments
t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting.
h: Ivan Getting invented the GPS.
t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands.
h: 13,670 islands make up Indonesia.
13
Textual Entailment ≈ Human Reading Comprehension
• From a children’s English learning book (Sela and Greenberg):
• Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …”
• Hypothesis (True/False?): The Bermuda Triangle is near the United States
???
14
Reading Comprehension QA
By Canadian Broadcasting Corporation
T: The school has turned its one-time metal shop – lost to budget cuts almost two years ago - into a money-making professional fitness club.
Q: When did the metal shop close?
A: Almost two years ago
15
Recognizing Textual Entailment (RTE) Challenge
PASCAL NOE Challenge, 2004-5
Ido Dagan, Oren Glickman – Bar-Ilan University, Israel
Bernardo Magnini – ITC-irst, Trento, Italy
16
Generic Dataset by Application Use
• QA
• IE
• Similar for “semantic” IR: Overture was acquired by Yahoo
• Comparable documents (summarization)
• MT evaluation
• Reading comprehension
• Paraphrase acquisition
17
Some Examples
1. TEXT: iTunes software has seen lower sales in Europe.
   HYPOTHESIS: Strong sales for iTunes in Europe.
   TASK: IR   ENTAILMENT: False

2. TEXT: Cavern Club sessions paid the Beatles £15 evenings and £5 lunchtime.
   HYPOTHESIS: The Beatles perform at Cavern Club at lunchtime.
   TASK: IR   ENTAILMENT: True

3. TEXT: …a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   HYPOTHESIS: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   TASK: QA   ENTAILMENT: True
• 567 development examples, 800 test examples
18
Dataset Characteristics
• Examples selected and annotated manually
  – Using automatic systems where available
• Balanced True/False split
• True – certain or highly probable entailment
  – Filtering controversial examples
• Example distribution?
• Mode – explorative rather than competitive
19
Arthur Bernstein Competition
“… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn’t threaten to overtake the center stage”
Haaretz newspaper, Culture Section, April 1st, 2005
20
Submissions
• 17 participating groups
  – 26 system submissions
  – Microsoft Research: manual analysis of dataset at lexical-syntactic matching level
21
Broad Range of System Types
• Knowledge sources and inferences
  – Direct t-h matching:
    • Word overlap / syntactic tree matching
  – Lexical relations:
    • WordNet & statistical (corpus based)
  – Theorem provers / logical inference
    • Adding a fuzzy scoring mechanism
• Supervised / unsupervised learning methods
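The simplest of the system types listed above, direct word-overlap matching, can be sketched in a few lines (the tokenization and the 0.75 threshold here are arbitrary illustration choices, not taken from any submitted system):

```python
def word_overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    t_tokens = set(text.lower().split())
    h_tokens = set(hypothesis.lower().split())
    return len(h_tokens & t_tokens) / len(h_tokens) if h_tokens else 0.0

def predicts_entailment(text, hypothesis, threshold=0.75):
    """Decide entailment when overlap reaches a tuned threshold (0.75 is arbitrary here)."""
    return word_overlap_score(text, hypothesis) >= threshold

score = word_overlap_score("Crude oil prices soared to record levels",
                           "Crude oil prices rise")
print(score)  # 0.75: "crude", "oil", "prices" match; "rise" does not
```

Such a baseline captures surface similarity only; the deck's later coyote/girl example shows exactly where it breaks down.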
22
23
[Bar chart: per-submission accuracy on the RTE-1 test set, y-axis ranging roughly 0.4–0.6, with 0.01 and 0.05 significance levels marked; systems include MITRE, Bar-Ilan U., UNED, Dublin, Edinburgh-Dublin, Stanford, UIUC, IRST, Edinburgh-Amsterdam, LCC and Amsterdam]
24
Where are we?
25
What’s next – RTE-2
• Organizers:
  – Bar-Ilan, CELCT (Trento), MITRE, MS-Research
• Main dataset: utilizing real systems' outputs
  – QA, IE, IR, summarization
• Human performance dataset
  – Reading comprehension, human QA (planned)
• Schedule (RTE website):
  – October – development set
  – February – results submission (test set January)
  – April 10 – PASCAL workshop in Venice!
    • right after EACL
26
Other Evaluation Modes
• Entailment subtask evaluations
  – Lexical, lexical-syntactic, alignment…
• “Seek” mode:
  – Input: h and corpus
  – Output: all entailing t’s in corpus
  – Captures nicely information-seeking needs, but requires post-run annotation (like TREC)
• Contribution to specific applications
27
Empirical Modeling of Meaning Equivalence and Entailment
ACL-05 Workshop
Roy Bar-Haim, Idan Szpektor, Oren Glickman
Bar-Ilan University

Decomposition of Entailment Levels
28
Why?
• Entailment modeling is complex!!
  – Was apparent at RTE-1
• How can we decompose it, for:
  – Better analysis and sub-task modeling
  – Piecewise evaluation
• Avoid “this is the performance of my complex system…” methodology
29
Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
H: The population of the United Arab Emirates is 2 million.
T ⇒ H?
30
Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
  ↓ Co-reference
The oddest thing about the UAE is that only 500,000 of the 2 million people living in the UAE are UAE citizens.
  ↓ Syntactic transformation
2 million people live in UAE.
  ↓ Paraphrasing
The population of the UAE is 2 million.
  ↓ Lexical world knowledge
H: The population of the United Arab Emirates is 2 million.

Diverse inference types, at different levels of representation.
31
Defining Intermediate Models
• Lexical
• Lexical-syntactic
32
Lexical Model
• T and H are represented as bags of terms
• T →L H if:
  – for each term u ∈ H there exists a term v ∈ T such that v →L u
• v →L u if:
  – they share the same lemma and POS, OR
  – they are connected by a chain of lexical transformations
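The lexical model above is essentially a covering check, which can be sketched directly (terms as (lemma, POS) pairs; the rule set is a hand-coded stand-in for WordNet-style transformations and is assumed transitively closed for simplicity):

```python
def lexically_entails(t_terms, h_terms, rules):
    """T ->L H: each (lemma, POS) term of H is covered by some term of T,
    either identically or via a known lexical transformation rule."""
    def covers(v, u):
        return v == u or (v, u) in rules
    return all(any(covers(v, u) for v in t_terms) for u in h_terms)

# The deck's crude-oil example, with one hand-coded synonym rule:
t = {("crude", "ADJ"), ("oil", "N"), ("price", "N"), ("soar", "V"),
     ("record", "N"), ("level", "N")}
h = {("crude", "ADJ"), ("oil", "N"), ("price", "N"), ("rise", "V")}
rules = {(("soar", "V"), ("rise", "V"))}

print(lexically_entails(t, h, rules))  # True
print(lexically_entails(t, h, set()))  # False: "rise" is left uncovered
```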
33
Lexical Transformations
• We assume perfect word sense disambiguation
• Morphological derivations: acquisition ↔ acquire; terrorist ↔ terror
• Ontological relations: synonyms (buy ↔ acquire), hypernyms (produce → make), meronyms (executive → company)
• Lexical world knowledge: Bill Gates → Microsoft’s founder; kill → die
34
Lexical Entailment – Example
• #1361 from RTE-1 (T ⇒ H)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
T →L H? Yes: “soared” →L “rise” (synonym), and the remaining terms of H match T directly.
38
Lexical Entailment – Example
• #2127 from RTE-1 (T ⇏ H)
T: A coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
T →L H? Yes – every term of H is covered at the lexical level, even though T does not actually entail H.
40
Lexical-Syntactic Model
• T and H are represented by syntactic dependency relations
• T →LS H if the relations within H can be matched by the relations in T
• The coverage can be obtained through a sequence of lexical-syntactic transformations
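A minimal sketch of this relation-matching check, assuming dependency triples of the form (head lemma, relation, dependent lemma) and a hand-coded lexical rule set (both are illustrative simplifications of the model):

```python
def ls_entails(t_rels, h_rels, lex_rules):
    """T ->LS H: every dependency triple (head, relation, dependent) of H
    is matched by some triple of T; lemmas match directly or via a rule."""
    def lex(a, b):
        return a == b or (a, b) in lex_rules
    def match(tr, hr):
        return tr[1] == hr[1] and lex(tr[0], hr[0]) and lex(tr[2], hr[2])
    return all(any(match(tr, hr) for tr in t_rels) for hr in h_rels)

lex_rules = {("soar", "rise")}

# Crude-oil example: H's subj relation is matched via soar -> rise
print(ls_entails({("soar", "subj", "price")},
                 {("rise", "subj", "price")}, lex_rules))   # True

# Coyote example: in T the subject of "shoot" is "coyote", not "girl"
print(ls_entails({("shoot", "subj", "coyote")},
                 {("shoot", "subj", "girl")}, lex_rules))   # False
```

Checking the relation structure, not just the words, is what lets this level reject the coyote example that the bag-of-terms model wrongly accepts.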
41
Lexical-Syntactic Transformations
• We assume perfect disambiguation and reference resolution
• Lexical: synonyms, hypernyms, etc. (as before)
• Syntactic (do not change lexical elements): active/passive, apposition
• Lexical-syntactic entailment paraphrases (change both lexical elements and structure):
  – X take in Y ↔ Y join X
  – X is Y man by birth ↔ X was born in Y
• Co-reference: the country ↔ UAE
42
Lexical-Syntactic Entailment – Example
• #1361 from RTE-1 (T ⇒ H)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
The subj relation prices → soared in T matches prices → rise in H via the synonym rule, so T →LS H.
43
Lexical-Syntactic Entailment – Example
• #2127 from RTE-1 (T ⇏ H)
T: A coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
The subj relation of “shot” holds for “coyote” in T but for “girl” in H, so H’s relations are not matched: T does not LS-entail H.
44
Beyond Lexical-Syntactic Models
T: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%
H: The SPD was defeated by the opposition parties.
• Future work…
45
Empirical Analysis
46
Annotation
• 240 T-H pairs of the RTE-1 dataset
• Annotated for T →L H and T →LS H
• High annotator agreement (authors)

Entailment Model     Agreement   Kappa
Lexical              89.6%       0.78
Lexical-Syntactic    88.8%       0.73

• Kappa: “substantial agreement”
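For reference, the kappa statistic reported above corrects raw agreement for chance; a minimal implementation over two annotators' label lists (the toy labels below are made up, not the paper's data):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labeling the same items."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    p_expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                     for c in cats)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: 75% observed agreement shrinks to kappa = 0.5 after
# discounting the agreement expected by chance.
a = [1, 1, 0, 0]
b = [1, 0, 0, 0]
print(cohens_kappa(a, b))  # 0.5
```

This is why the slide quotes kappa alongside raw agreement: on a balanced binary task, 50% agreement is achievable by guessing, so kappa is the more honest figure.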
47
Model Evaluation Results

Model               Recall   Precision   F1
Lexical             44%      59%         0.50
Lexical-Syntactic   50%      86%         0.63

• Low precision for the Lexical model ⇒ lexical match fails to predict entailment
• High precision for the Lexical-Syntactic model ⇒ checking syntactic relations is crucial
• Medium recall for both levels ⇒ higher levels of inference are missing
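The F1 column above is just the harmonic mean of precision and recall, which we can verify directly:

```python
def f1(recall, precision):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproducing the table's F1 column from its Recall/Precision columns:
print(round(f1(0.44, 0.59), 2))  # 0.5  (Lexical)
print(round(f1(0.50, 0.86), 2))  # 0.63 (Lexical-Syntactic)
```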
48
Contribution of Individual Components – RTE-1 positive examples

Level               Inference type               f    R%    %
Lexical             Synonym                     19   14%   16%
Lexical             Morphological               16   10%   14%
Lexical             Lexical world knowledge     12    8%   10%
Lexical             Hypernym                     7    4%    6%
Lexical             Meronym                      1    1%    1%
Lexical-Syntactic   Entailment paraphrases      37   26%   31%
Lexical-Syntactic   Syntactic transformations   22   17%   19%
Lexical-Syntactic   Co-reference                10    5%    8%
49
Summary (1)
• Annotating and analysing entailment components
• Guide research on entailment
• Opens new research problems and redirects old ones
50
Summary (2)
• Allows better evaluation of systems
  – Performance of individual components
• Future work – expand analysis to additional levels of representation and inference
  – Identify the exciting semantic phenomena …
51
A Different Perspective on Semantic Inference
52
Text Mapping vs. Interpretation
• Focus on the entailment relation as a (directed) mapping between language expressions
  – Identify the contextual constraints for mappings
• Vs. interpreting language into meaning representations (explicitly stipulated senses, logical form, etc.)
  – Can still be a means, rather than the goal
• How far (and how fast) can we get?
  – Cf. MT – direct, transfer, interlingua
53
Making sense of (implicit) senses
• What is the RIGHT set of senses?
  – Any concrete set is problematic/subjective
  – … but WSD forces you to choose one
• A lexical entailment perspective:
  – Instead of identifying an explicitly stipulated sense of a word occurrence …
  – … identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
54
That’s what applications need
• Lexical matching: recognize sense equivalence
  T1: IKEA announced a new comfort chair
  Q:  announcement of new models of chairs
  T2: MIT announced a new CS chair position
• Lexical expansion: recognize sense entailment
  T1: IKEA announced a new comfort chair
  Q:  announcement of new models of furniture
  T2: MIT announced a new CS chair position
55
Bottom Line
• Address semantic inference as text mapping, rather than interpretation
• From an applications perspective, interpretation may be a means, not the goal
  – we shouldn’t create artificial problems, which might be harder than those we need to solve
56
Probabilistic Framework forTextual Entailment
Oren Glickman, Ido Dagan,Moshe Koppel and Jacob Goldberger
Bar Ilan UniversityACL-05 Workshop, AAAI-05
57
Motivation
• Approach entailment uncertainty with principled probabilistic models
  – Following the success of statistical MT, parsing, language modeling, etc.
  – Integrating inferences and knowledge sources
  – Vs. ad-hoc scoring
• Need to define a concrete probability space
  – Generative model
58
Notation
• t – a text (t ∈ T)
• h – a hypothesis (h ∈ H)
  – propositional statements which can be assigned a truth value
• w: H → {true, false} – a possible world
  – a truth assignment for every hypothesis
59
A Generative Model
We assume a probabilistic generative model:
– a generation event of <t, w>: a text along with a (hidden) possible world
– based on a joint probability distribution

Example:
t: John was born in France
w (hidden possible world):
  John speaks French          1
  John was born in Paris      1
  John likes foie gras        0
  John is married to Alice    1
  …
60
Probabilities
• For a given text t and hypothesis h, we consider the following probabilities:
  – P(Tr_h = 1)
    • the probability that h is assigned a truth value of 1 in a generated <t, w> pair
  – P(Tr_h = 1 | t)
    • the probability that h is assigned a truth value of 1 given that the corresponding text is t
61
Probabilistic Textual Entailment
Definition:
• t probabilistically entails h if:
  – P(Tr_h = 1 | t) > P(Tr_h = 1)
    • t increases the likelihood of h being true
    • positive PMI – t provides information on h’s truth
• P(Tr_h = 1 | t): entailment confidence
  – the relevant entailment score for applications
  – in practice: high confidence required
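Under the generative model, both quantities in this definition can be estimated by simple counting over sampled <text, world> pairs. A sketch with entirely made-up samples (the texts and world assignments below are illustrative only):

```python
def entailment_probabilities(samples, t, h):
    """Estimate P(Tr_h = 1) and P(Tr_h = 1 | t) from sampled <text, world>
    pairs, where each world maps hypotheses to 0/1 truth values."""
    prior = sum(w[h] for _, w in samples) / len(samples)
    t_worlds = [w for text, w in samples if text == t]
    conditional = sum(w[h] for w in t_worlds) / len(t_worlds)
    return prior, conditional

# Toy generated pairs:
samples = [
    ("John was born in France", {"John speaks French": 1}),
    ("John was born in France", {"John speaks French": 1}),
    ("Dow gains 255 points",    {"John speaks French": 0}),
    ("Dow gains 255 points",    {"John speaks French": 1}),
]
prior, cond = entailment_probabilities(samples, "John was born in France",
                                       "John speaks French")
print(prior, cond)   # 0.75 1.0
print(cond > prior)  # True: t probabilistically entails h
```

Applications would then threshold the conditional probability (the entailment confidence) rather than merely checking that it exceeds the prior.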
62
Setting Properties (1)
• Logical vs. textual entailment
  – Logical entailment: proposition → proposition
  – Textual entailment: text → text
• Conditioning on the generation of texts rather than on propositional truth values
  – David’s father was born in Italy → David was born in Italy
• Possible ambiguities of the texts are taken into account
  – play baseball with a bat → play baseball with an animal
63
Setting Properties (2)
• We do not distinguish between inferences that are based on:
  – language semantics: e.g. murdering → killing
  – vs. domain or world knowledge: e.g. live in Paris → live in France
• Setting accounts for all causes of uncertainty
64
Setting Properties (3)
• For a given text t:
  – Σ_h P(Tr_h = 1 | t) ≠ 1 (summing over all hypotheses h)
• But rather:
  – P(Tr_h = 1 | t) + P(Tr_h = 0 | t) = 1
• Vs. generative language models (cf. speech, MT, LM for IR)
65
Having a probability space
• we can now define concrete probabilistic models for various entailment phenomena
66
Initial Lexical Models
• Alignment-based (ACL-05 Workshop)
  – the probability that a term in h is entailed by a particular term in t
• Bayesian classification (AAAI-05)
  – the probability that a term in h is entailed by (fits in) the entire text of t
  – an unsupervised text categorization setting (with EM) – each term is a category
• Demonstrate directions for probabilistic modeling and unsupervised estimation
67
Additional Work:Acquiring Entailment Relations
• Lexical (Geffet and Dagan, 2004/2005)
  – a clear goal for distributional similarity
  – obtain characteristic features via bootstrapping
  – test characteristic feature inclusion (vs. overlap)
• Lexical-syntactic – TEASE (Szpektor et al. 2004)
  – deduce entailment from joint anchor sets
  – initial prospects for unsupervised IE
• Next: obtain probabilities for these entailment “rules”
68
Conclusions: Textual entailment…
• Provides a framework for semantic inference
  – application-independent abstraction
  – text mapping rather than interpretation
• Raises interesting problems to work on
• Amenable to empirical evaluation and decomposition
• May be modeled in principled probabilistic terms
Thank you!
69
Textual Entailment References

Workshops:
· PASCAL Challenges Workshop on Recognizing Textual Entailment, 2005. http://www.cs.biu.ac.il/~glikmao/rte05/index.html (Note: see the 2nd RTE Challenge at http://www.cs.biu.ac.il/~barhair/RTE2/)
· ACL 2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment, 2005. http://acl.ldc.upenn.edu/W/W05/#W05-1200

Papers from recent conferences and workshops:
· J. Bos & K. Markert. 2005. Recognising Textual Entailment with Logical Inference. Proceedings of EMNLP 2005.
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. An Inference Model for Semantic Entailment in Natural Language. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. Knowledge Representation for Semantic Entailment and Question-Answering. IJCAI-05 Workshop on Knowledge and Reasoning for Answering Questions.
· C. Corley, A. Csomai and R. Mihalcea. 2005. Text Semantic Similarity, with Applications. RANLP-05.
· I. Dagan and O. Glickman. 2004. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble.
70
Textual Entailment References (2)
· M. Geffet and I. Dagan. 2004. Feature Vector Quality and Distributional Similarity. Proceedings of the 20th International Conference on Computational Linguistics (COLING).
· M. Geffet and I. Dagan. 2005. The Distributional Inclusion Hypotheses and Lexical Entailment. ACL 2005, Michigan, USA.
· O. Glickman, I. Dagan and M. Koppel. 2005. A Probabilistic Classification Approach for Lexical Textual Entailment. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· A. Haghighi, A. Y. Ng, and C. D. Manning. 2005. Robust Textual Inference via Graph Matching. HLT-EMNLP 2005.
· M. Kouylekov and B. Magnini. 2005. Tree Edit Distance for Textual Entailment. RANLP 2005.
· R. Raina, A. Y. Ng, and C. Manning. 2005. Robust Textual Inference via Learning and Abductive Reasoning. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· V. Rus, A. Graesser and K. Desai. 2005. Lexico-Syntactic Subsumption for Textual Entailment. RANLP 2005.
· M. Tatu and D. Moldovan. 2005. A Semantic Approach to Recognizing Textual Entailment. HLT-EMNLP 2005.

We would be glad to receive more references on textual entailment. Please send them to