TRANSCRIPT
1
Textual Entailment: A Perspective on Applied Text Understanding
Ido Dagan Bar-Ilan University, Israel
Joint work with: Oren Glickman, Idan Szpektor, Roy Bar-Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
2
Talk Focus: A Framework for “Applied Semantics”
• The textual entailment task – what and why?
• Empirical evaluation – PASCAL RTE Challenge
• Problem scope, decomposition and analysis
• Different perspective on semantic inference
• Probabilistic framework
• Cf. syntax, MT – clear task, methodology and community
3
Natural Language and Meaning
Meaning
Language
Ambiguity
Variability
4
Variability of Semantic Expression
Dow ends up
Dow climbs 255
The Dow Jones Industrial Average closed up 255
Stock market hits a record high
Dow gains 255 points
All major stock markets surged
5
Variability Recognition – Major Inference in Applications
Information Retrieval (IR)
Question Answering (QA)
Multi Document Summarization (MDS)
Information Extraction (IE)
6
Typical Application Inference
Overture’s acquisition by Yahoo
Yahoo bought Overture
Question: Who bought Overture?
Expected answer form: X bought Overture
• Similar for IE: X buy Y
• Similar for “semantic” IR: t: Overture was bought …
• Summarization (multi-document) – identify redundant info
• MT evaluation (and recent proposals for MT?)
text ⇒ hypothesized answer
7
KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS
(IJCAI-05)
CFP:
– Reasoning aspects:
  * information fusion
  * search criteria expansion models
  * summarization and intensional answers
  * reasoning under uncertainty or with incomplete knowledge
– Knowledge representation and integration:
  * levels of knowledge involved (e.g. ontologies, domain knowledge)
  * knowledge extraction models and techniques to optimize response accuracy
  * coherence and integration
8
Inference for Textual Question Answering Workshop (AAAI-05)
CFP:
• abductions, default reasoning, inference with epistemic logic or description logic
• inference methods for QA need to be robust, cover all ambiguities of language
• available knowledge sources that can be used for inference

… but similar needs for other applications – can we address a uniform empirical task?
9
Applied Textual Entailment: Abstract Semantic Variability Inference
• QA: “Where was John Wayne born?” – Answer: Iowa
Text (t): The birthplace of John Wayne is in Iowa
Hypothesis (h): John Wayne was born in Iowa
inference
10
The Generic Entailment Task
Text (t): The birthplace of John Wayne is in Iowa
Hypothesis (h): John Wayne was born in Iowa
inference
Given the text t, can we infer that h is (most likely) true?
11
Classical Entailment Definition
• Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
• Strict entailment – doesn't account for some uncertainty allowed in applications
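The possible-worlds definition above can be made concrete with a tiny propositional sketch (the atom names and the encoding here are illustrative, not from the talk): enumerate every truth assignment and check that no world makes t true while h is false.

```python
from itertools import product

def entails(t, h, atoms):
    """Classical entailment: h is true in every possible world in which t is true."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if t(world) and not h(world):
            return False
    return True

# Hypothetical encoding of a birthplace example: the text fixes both facts,
# so every world satisfying t also satisfies h (but not vice versa).
atoms = ["born_paris", "born_france"]
t = lambda w: w["born_paris"] and w["born_france"]
h = lambda w: w["born_france"]

print(entails(t, h, atoms))  # True: every t-world is an h-world
print(entails(h, t, atoms))  # False: born in France, not necessarily in Paris
```

This brute-force check is exponential in the number of atoms, which is exactly why applied textual entailment works over texts rather than fully interpreted logical forms.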
12
“Almost certain” Entailments
t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting.
h: Ivan Getting invented the GPS.
t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands.
h: 13,670 islands make up Indonesia.
13
Textual Entailment ≈ Human Reading Comprehension
• From a children’s English learning book (Sela and Greenberg):
• Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …”
• Hypothesis (True/False?): The Bermuda Triangle is near the United States
???
14
Reading Comprehension QA
By Canadian Broadcasting Corporation
T: The school has turned its one-time metal shop – lost to budget cuts almost two years ago - into a money-making professional fitness club.
Q: When did the metal shop close?
A: Almost two years ago
15
Recognizing Textual Entailment (RTE) Challenge
PASCAL NOE Challenge, 2004-5
Ido Dagan, Oren Glickman – Bar-Ilan University, Israel
Bernardo Magnini – ITC-irst, Trento, Italy
16
Generic Dataset by Application Use
• QA
• IE
• Similar for “semantic” IR: Overture was acquired by Yahoo
• Comparable documents (summarization)
• MT evaluation
• Reading comprehension
• Paraphrase acquisition
17
Some Examples
1. TEXT: iTunes software has seen lower sales in Europe.
   HYPOTHESIS: Strong sales for iTunes in Europe.
   TASK: IR   ENTAILMENT: False

2. TEXT: Cavern Club sessions paid the Beatles £15 evenings and £5 lunchtime.
   HYPOTHESIS: The Beatles perform at Cavern Club at lunchtime.
   TASK: IR   ENTAILMENT: True

3. TEXT: …a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   HYPOTHESIS: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   TASK: QA   ENTAILMENT: True
• 567 development examples, 800 test examples
18
Dataset Characteristics
• Examples selected and annotated manually
  – Using automatic systems where available
• Balanced True/False split
• True – certain or highly probable entailment
  – Filtering controversial examples
• Example distribution?
• Mode – explorative rather than competitive
19
Arthur Bernstein Competition
“… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn’t threaten to overtake the center stage”
Haaretz newspaper, Culture Section, April 1st, 2005
20
Submissions
• 17 participating groups
  – 26 system submissions
  – Microsoft Research: manual analysis of dataset at lexical-syntactic matching level
21
Broad Range of System Types
• Knowledge sources and inferences
  – Direct t-h matching:
    • Word overlap / syntactic tree matching
  – Lexical relations:
    • WordNet & statistical (corpus based)
  – Theorem provers / logical inference
    • Adding a fuzzy scoring mechanism
• Supervised / unsupervised learning methods
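The simplest of the system types listed above, direct word-overlap matching, can be sketched in a few lines (the tokenization and the 0.75 threshold here are arbitrary illustration choices, not taken from any submitted system):

```python
def word_overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    t_tokens = set(text.lower().split())
    h_tokens = set(hypothesis.lower().split())
    return len(h_tokens & t_tokens) / len(h_tokens) if h_tokens else 0.0

def predicts_entailment(text, hypothesis, threshold=0.75):
    """Decide entailment when overlap reaches a tuned threshold (0.75 is arbitrary here)."""
    return word_overlap_score(text, hypothesis) >= threshold

score = word_overlap_score("Crude oil prices soared to record levels",
                           "Crude oil prices rise")
print(score)  # 0.75: "crude", "oil", "prices" match; "rise" does not
```

Such a baseline captures surface similarity only; the deck's later coyote/girl example shows exactly where it breaks down.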
22
23
[Bar chart: per-submission accuracy on the RTE-1 test set, y-axis ranging roughly 0.4–0.6, with 0.01 and 0.05 significance levels marked; systems include MITRE, Bar-Ilan U., UNED, Dublin, Edinburgh-Dublin, Stanford, UIUC, IRST, Edinburgh-Amsterdam, LCC and Amsterdam]
24
Where are we?
25
What’s next – RTE-2
• Organizers:
  – Bar-Ilan, CELCT (Trento), MITRE, MS-Research
• Main dataset: utilizing real systems' outputs
  – QA, IE, IR, summarization
• Human performance dataset
  – Reading comprehension, human QA (planned)
• Schedule (RTE website):
  – October – development set
  – February – results submission (test set January)
  – April 10 – PASCAL workshop in Venice!
    • right after EACL
26
Other Evaluation Modes
• Entailment subtask evaluations
  – Lexical, lexical-syntactic, alignment…
• “Seek” mode:
  – Input: h and corpus
  – Output: all entailing t’s in corpus
  – Captures nicely information-seeking needs, but requires post-run annotation (like TREC)
• Contribution to specific applications
27
Empirical Modeling of Meaning Equivalence and Entailment
ACL-05 Workshop
Roy Bar-Haim, Idan Szpektor, Oren Glickman
Bar-Ilan University

Decomposition of Entailment Levels
28
Why?
• Entailment modeling is complex!!
  – Was apparent at RTE-1
• How can we decompose it, for:
  – Better analysis and sub-task modeling
  – Piecewise evaluation
• Avoid “this is the performance of my complex system…” methodology
29
Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
H: The population of the United Arab Emirates is 2 million.
T ⇒ H?
30
Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
  ↓ Co-reference
The oddest thing about the UAE is that only 500,000 of the 2 million people living in the UAE are UAE citizens.
  ↓ Syntactic transformation
2 million people live in UAE.
  ↓ Paraphrasing
The population of the UAE is 2 million.
  ↓ Lexical world knowledge
H: The population of the United Arab Emirates is 2 million.

Diverse inference types, at different levels of representation.
31
Defining Intermediate Models
• Lexical
• Lexical-syntactic
32
Lexical Model
• T and H are represented as bags of terms
• T →L H if:
  – for each term u ∈ H there exists a term v ∈ T such that v →L u
• v →L u if:
  – they share the same lemma and POS, OR
  – they are connected by a chain of lexical transformations
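The lexical model above is essentially a covering check, which can be sketched directly (terms as (lemma, POS) pairs; the rule set is a hand-coded stand-in for WordNet-style transformations and is assumed transitively closed for simplicity):

```python
def lexically_entails(t_terms, h_terms, rules):
    """T ->L H: each (lemma, POS) term of H is covered by some term of T,
    either identically or via a known lexical transformation rule."""
    def covers(v, u):
        return v == u or (v, u) in rules
    return all(any(covers(v, u) for v in t_terms) for u in h_terms)

# The deck's crude-oil example, with one hand-coded synonym rule:
t = {("crude", "ADJ"), ("oil", "N"), ("price", "N"), ("soar", "V"),
     ("record", "N"), ("level", "N")}
h = {("crude", "ADJ"), ("oil", "N"), ("price", "N"), ("rise", "V")}
rules = {(("soar", "V"), ("rise", "V"))}

print(lexically_entails(t, h, rules))  # True
print(lexically_entails(t, h, set()))  # False: "rise" is left uncovered
```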
33
Lexical Transformations
• We assume perfect word sense disambiguation
• Morphological derivations: acquisition ↔ acquire; terrorist ↔ terror
• Ontological relations: synonyms (buy ↔ acquire), hypernyms (produce → make), meronyms (executive → company)
• Lexical world knowledge: Bill Gates → Microsoft’s founder; kill → die
34
Lexical Entailment – Example
• #1361 from RTE-1 (T ⇒ H)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
T →L H? Yes: “soared” →L “rise” (synonym), and the remaining terms of H match T directly.
38
Lexical Entailment – Example
• #2127 from RTE-1 (T ⇏ H)
T: A coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
T →L H? Yes – every term of H is covered at the lexical level, even though T does not actually entail H.
40
Lexical-Syntactic Model
• T and H are represented by syntactic dependency relations
• T →LS H if the relations within H can be matched by the relations in T
• The coverage can be obtained through a sequence of lexical-syntactic transformations
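A minimal sketch of this relation-matching check, assuming dependency triples of the form (head lemma, relation, dependent lemma) and a hand-coded lexical rule set (both are illustrative simplifications of the model):

```python
def ls_entails(t_rels, h_rels, lex_rules):
    """T ->LS H: every dependency triple (head, relation, dependent) of H
    is matched by some triple of T; lemmas match directly or via a rule."""
    def lex(a, b):
        return a == b or (a, b) in lex_rules
    def match(tr, hr):
        return tr[1] == hr[1] and lex(tr[0], hr[0]) and lex(tr[2], hr[2])
    return all(any(match(tr, hr) for tr in t_rels) for hr in h_rels)

lex_rules = {("soar", "rise")}

# Crude-oil example: H's subj relation is matched via soar -> rise
print(ls_entails({("soar", "subj", "price")},
                 {("rise", "subj", "price")}, lex_rules))   # True

# Coyote example: in T the subject of "shoot" is "coyote", not "girl"
print(ls_entails({("shoot", "subj", "coyote")},
                 {("shoot", "subj", "girl")}, lex_rules))   # False
```

Checking the relation structure, not just the words, is what lets this level reject the coyote example that the bag-of-terms model wrongly accepts.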
41
Lexical-Syntactic Transformations
• We assume perfect disambiguation and reference resolution
• Lexical: synonyms, hypernyms, etc. (as before)
• Syntactic (do not change lexical elements): active/passive, apposition
• Lexical-syntactic entailment paraphrases (change both lexical elements and structure):
  – X take in Y ↔ Y join X
  – X is Y man by birth ↔ X was born in Y
• Co-reference: the country ↔ UAE
42
Lexical-Syntactic Entailment – Example
• #1361 from RTE-1 (T ⇒ H)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
The subj relation prices → soared in T matches prices → rise in H via the synonym rule, so T →LS H.
43
Lexical-Syntactic Entailment – Example
• #2127 from RTE-1 (T ⇏ H)
T: A coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
The subj relation of “shot” holds for “coyote” in T but for “girl” in H, so H’s relations are not matched: T does not LS-entail H.
44
Beyond Lexical-Syntactic Models
T: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%
H: The SPD was defeated by the opposition parties.
• Future work…
45
Empirical Analysis
46
Annotation
• 240 T-H pairs of the RTE-1 dataset
• Annotated for T →L H and T →LS H
• High annotator agreement (authors)

Entailment Model     Agreement   Kappa
Lexical              89.6%       0.78
Lexical-Syntactic    88.8%       0.73

• Kappa: “substantial agreement”
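For reference, the kappa statistic reported above corrects raw agreement for chance; a minimal implementation over two annotators' label lists (the toy labels below are made up, not the paper's data):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labeling the same items."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    p_expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                     for c in cats)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: 75% observed agreement shrinks to kappa = 0.5 after
# discounting the agreement expected by chance.
a = [1, 1, 0, 0]
b = [1, 0, 0, 0]
print(cohens_kappa(a, b))  # 0.5
```

This is why the slide quotes kappa alongside raw agreement: on a balanced binary task, 50% agreement is achievable by guessing, so kappa is the more honest figure.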
47
Model Evaluation Results

Model               Recall   Precision   F1
Lexical             44%      59%         0.50
Lexical-Syntactic   50%      86%         0.63

• Low precision for the Lexical model ⇒ lexical match fails to predict entailment
• High precision for the Lexical-Syntactic model ⇒ checking syntactic relations is crucial
• Medium recall for both levels ⇒ higher levels of inference are missing
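The F1 column above is just the harmonic mean of precision and recall, which we can verify directly:

```python
def f1(recall, precision):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproducing the table's F1 column from its Recall/Precision columns:
print(round(f1(0.44, 0.59), 2))  # 0.5  (Lexical)
print(round(f1(0.50, 0.86), 2))  # 0.63 (Lexical-Syntactic)
```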
48
Contribution of Individual Components – RTE-1 positive examples

Level               Inference type               f    R%    %
Lexical             Synonym                     19   14%   16%
Lexical             Morphological               16   10%   14%
Lexical             Lexical world knowledge     12    8%   10%
Lexical             Hypernym                     7    4%    6%
Lexical             Meronym                      1    1%    1%
Lexical-Syntactic   Entailment paraphrases      37   26%   31%
Lexical-Syntactic   Syntactic transformations   22   17%   19%
Lexical-Syntactic   Co-reference                10    5%    8%
49
Summary (1)
• Annotating and analysing entailment components
• Guide research on entailment
• Opens new research problems and redirects old ones
50
Summary (2)
• Allows better evaluation of systems
  – Performance of individual components
• Future work – expand analysis to additional levels of representation and inference
  – Identify the exciting semantic phenomena …
51
A Different Perspective on Semantic Inference
52
Text Mapping vs. Interpretation
• Focus on the entailment relation as a (directed) mapping between language expressions
  – Identify the contextual constraints for mappings
• Vs. interpreting language into meaning representations (explicitly stipulated senses, logical form, etc.)
  – Can still be a means, rather than the goal
• How far (and how fast) can we get?
  – Cf. MT – direct, transfer, interlingua
53
Making sense of (implicit) senses
• What is the RIGHT set of senses?
  – Any concrete set is problematic/subjective
  – … but WSD forces you to choose one
• A lexical entailment perspective:
  – Instead of identifying an explicitly stipulated sense of a word occurrence …
  – … identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
54
That’s what applications need
• Lexical matching: recognize sense equivalence
  T1: IKEA announced a new comfort chair
  Q:  announcement of new models of chairs
  T2: MIT announced a new CS chair position
• Lexical expansion: recognize sense entailment
  T1: IKEA announced a new comfort chair
  Q:  announcement of new models of furniture
  T2: MIT announced a new CS chair position
55
Bottom Line
• Address semantic inference as text mapping, rather than interpretation
• From an applications perspective, interpretation may be a means, not the goal
  – we shouldn’t create artificial problems, which might be harder than those we need to solve
56
Probabilistic Framework forTextual Entailment
Oren Glickman, Ido Dagan,Moshe Koppel and Jacob Goldberger
Bar Ilan UniversityACL-05 Workshop, AAAI-05
57
Motivation
• Approach entailment uncertainty with principled probabilistic models
  – Following the success of statistical MT, parsing, language modeling, etc.
  – Integrating inferences and knowledge sources
  – Vs. ad-hoc scoring
• Need to define a concrete probability space
  – Generative model
58
Notation
• t – a text (t ∈ T)
• h – a hypothesis (h ∈ H)
  – propositional statements which can be assigned a truth value
• w: H → {true, false} – a possible world
  – a truth assignment for every hypothesis
59
A Generative Model
We assume a probabilistic generative model:
– a generation event of <t, w>: a text along with a (hidden) possible world
– based on a joint probability distribution

Example:
t: John was born in France
w (hidden possible world):
  John speaks French          1
  John was born in Paris      1
  John likes foie gras        0
  John is married to Alice    1
  …
60
Probabilities
• For a given text t and hypothesis h, we consider the following probabilities:
  – P(Tr_h = 1)
    • the probability that h is assigned a truth value of 1 in a generated <t, w> pair
  – P(Tr_h = 1 | t)
    • the probability that h is assigned a truth value of 1 given that the corresponding text is t
61
Probabilistic Textual Entailment
Definition:
• t probabilistically entails h if:
  – P(Tr_h = 1 | t) > P(Tr_h = 1)
    • t increases the likelihood of h being true
    • positive PMI – t provides information on h’s truth
• P(Tr_h = 1 | t): entailment confidence
  – the relevant entailment score for applications
  – in practice: high confidence required
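Under the generative model, both quantities in this definition can be estimated by simple counting over sampled <text, world> pairs. A sketch with entirely made-up samples (the texts and world assignments below are illustrative only):

```python
def entailment_probabilities(samples, t, h):
    """Estimate P(Tr_h = 1) and P(Tr_h = 1 | t) from sampled <text, world>
    pairs, where each world maps hypotheses to 0/1 truth values."""
    prior = sum(w[h] for _, w in samples) / len(samples)
    t_worlds = [w for text, w in samples if text == t]
    conditional = sum(w[h] for w in t_worlds) / len(t_worlds)
    return prior, conditional

# Toy generated pairs:
samples = [
    ("John was born in France", {"John speaks French": 1}),
    ("John was born in France", {"John speaks French": 1}),
    ("Dow gains 255 points",    {"John speaks French": 0}),
    ("Dow gains 255 points",    {"John speaks French": 1}),
]
prior, cond = entailment_probabilities(samples, "John was born in France",
                                       "John speaks French")
print(prior, cond)   # 0.75 1.0
print(cond > prior)  # True: t probabilistically entails h
```

Applications would then threshold the conditional probability (the entailment confidence) rather than merely checking that it exceeds the prior.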
62
Setting Properties (1)
• Logical vs. textual entailment
  – Logical entailment: proposition → proposition
  – Textual entailment: text → text
• Conditioning on the generation of texts rather than on propositional truth values
  – David’s father was born in Italy → David was born in Italy
• Possible ambiguities of the texts are taken into account
  – play baseball with a bat → play baseball with an animal
63
Setting Properties (2)
• We do not distinguish between inferences that are based on:
  – language semantics: e.g. murdering → killing
  – vs. domain or world knowledge: e.g. live in Paris → live in France
• Setting accounts for all causes of uncertainty
64
Setting Properties (3)
• For a given text t:
  – Σ_h P(Tr_h = 1 | t) ≠ 1 (summing over all hypotheses h)
• But rather:
  – P(Tr_h = 1 | t) + P(Tr_h = 0 | t) = 1
• Vs. generative language models (cf. speech, MT, LM for IR)
65
Having a probability space
• we can now define concrete probabilistic models for various entailment phenomena
66
Initial Lexical Models
• Alignment-based (ACL-05 Workshop)
  – the probability that a term in h is entailed by a particular term in t
• Bayesian classification (AAAI-05)
  – the probability that a term in h is entailed by (fits in) the entire text of t
  – an unsupervised text categorization setting (with EM) – each term is a category
• Demonstrate directions for probabilistic modeling and unsupervised estimation
67
Additional Work:Acquiring Entailment Relations
• Lexical (Geffet and Dagan, 2004/2005)
  – a clear goal for distributional similarity
  – obtain characteristic features via bootstrapping
  – test characteristic feature inclusion (vs. overlap)
• Lexical-syntactic – TEASE (Szpektor et al. 2004)
  – deduce entailment from joint anchor sets
  – initial prospects for unsupervised IE
• Next: obtain probabilities for these entailment “rules”
68
Conclusions: Textual entailment…
• Provides a framework for semantic inference
  – application-independent abstraction
  – text mapping rather than interpretation
• Raises interesting problems to work on
• Amenable to empirical evaluation and decomposition
• May be modeled in principled probabilistic terms
Thank you!
69
Textual Entailment References

Workshops:
· PASCAL Challenges Workshop on Recognizing Textual Entailment, 2005. http://www.cs.biu.ac.il/~glikmao/rte05/index.html (Note: see the 2nd RTE Challenge at http://www.cs.biu.ac.il/~barhair/RTE2/)
· ACL 2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment, 2005. http://acl.ldc.upenn.edu/W/W05/#W05-1200

Papers from recent conferences and workshops:
· J. Bos & K. Markert. 2005. Recognising Textual Entailment with Logical Inference. Proceedings of EMNLP 2005.
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. An Inference Model for Semantic Entailment in Natural Language. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. Knowledge Representation for Semantic Entailment and Question-Answering. IJCAI-05 Workshop on Knowledge and Reasoning for Answering Questions.
· C. Corley, A. Csomai and R. Mihalcea. 2005. Text Semantic Similarity, with Applications. RANLP-05.
· I. Dagan and O. Glickman. 2004. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble.
70
Textual Entailment References (2)
· M. Geffet and I. Dagan. 2004. Feature Vector Quality and Distributional Similarity. Proceedings of the 20th International Conference on Computational Linguistics (COLING).
· M. Geffet and I. Dagan. 2005. The Distributional Inclusion Hypotheses and Lexical Entailment. ACL 2005, Michigan, USA.
· O. Glickman, I. Dagan and M. Koppel. 2005. A Probabilistic Classification Approach for Lexical Textual Entailment. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· A. Haghighi, A. Y. Ng, and C. D. Manning. 2005. Robust Textual Inference via Graph Matching. HLT-EMNLP 2005.
· M. Kouylekov and B. Magnini. 2005. Tree Edit Distance for Textual Entailment. RANLP 2005.
· R. Raina, A. Y. Ng, and C. Manning. 2005. Robust Textual Inference via Learning and Abductive Reasoning. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· V. Rus, A. Graesser and K. Desai. 2005. Lexico-Syntactic Subsumption for Textual Entailment. RANLP 2005.
· M. Tatu and D. Moldovan. 2005. A Semantic Approach to Recognizing Textual Entailment. HLT-EMNLP 2005.

We would be glad to receive more references on textual entailment. Please send them to