
Natural Logic for Textual Inference

Bill MacCartney and Christopher D. Manning

NLP Group

Stanford University

29 June 2007

2

Inferences involving monotonicity

Premise: Few states completely forbid casino gambling.

OK (follows from the premise):
Few western states completely forbid casino gambling.
Few or no states completely forbid casino gambling.
Few states completely forbid gambling.

No (does not follow):
Few states completely forbid casino gambling for kids.
Few states or cities completely forbid casino gambling.
Few states restrict gambling.

What kind of textual inference system could predict this?


3

Textual inference: a spectrum of approaches

From robust, but shallow, to deep, but brittle:
• lexical/semantic overlap (Jijkoun & de Rijke 2005)
• patterned relation extraction (Romano et al. 2006)
• pred-arg structure matching (Hickl et al. 2006)
• natural logic
• FOL & theorem proving (Bos & Markert 2006)


4

What is natural logic?

• A logic whose vehicle of inference is natural language
  • No formal notation
  • Just words & phrases: All men are mortal…

• Focus on a ubiquitous category of inference: monotonicity
  • I.e., reasoning about the consequences of broadening or narrowing the concepts or constraints in a proposition

• Precise, yet sidesteps the difficulties of translating to FOL: idioms, intensionality and propositional attitudes, modalities, indexicals, reciprocals, scope ambiguities, quantifiers such as most, anaphoric adjectives, temporal and causal relations, aspect, unselective quantifiers, adverbs of quantification, donkey sentences, generic determiners, …

• Aristotle, Lakoff, van Benthem, Sánchez Valencia 1991


5

Outline

• Introduction

• Foundations of Natural Logic

• The NatLog System

• Experiments with FraCaS

• Experiments with RTE

• Conclusion


6

The entailment relation: ⊑

In natural logic, entailment is defined as an ordering relation ⊑ over expressions of all semantic types (not just sentences):

category                        semantic type    example(s)
common nouns                    e→t              penguin ⊑ bird
adjectives                      e→t              tiny ⊑ small
intransitive verbs              e→t              hover ⊑ fly
transitive verbs                e→e→t            kick ⊑ strike
temporal & locative modifiers   (e→t)→(e→t)      this morning ⊑ today; in Beijing ⊑ in China
connectives                     t→t→t            and ⊑ or
quantifiers                     (e→t)→t          everyone ⊑ someone
                                (e→t)→(e→t)→t    all ⊑ most ⊑ some


7

Monotonicity of semantic functions

In compositional semantics, meanings are seen as functions, and can have various monotonicity properties:

Upward-monotone (↑M)
• The default: “bigger” inputs yield “bigger” outputs
• Example: broken. Since chair ⊑ furniture, broken chair ⊑ broken furniture
• Heuristic: in a ↑M context, broadening edits preserve truth

Downward-monotone (↓M)
• Negatives, restrictives, etc.: “bigger” inputs yield “smaller” outputs
• Example: doesn’t. While hover ⊑ fly, doesn’t fly ⊑ doesn’t hover
• Heuristic: in a ↓M context, narrowing edits preserve truth

Non-monotone (#M)
• Superlatives, some quantifiers (most, exactly n): neither ↑M nor ↓M
• Example: most. While penguin ⊑ bird, most penguins # most birds
• Heuristic: in a #M context, no edits preserve truth
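The three heuristics can be made concrete for single-word substitutions. The sketch below is a minimal illustration, not NatLog's code: it assumes a hypothetical toy lexicon `HYPERNYM` of narrower-than links (standing in for a real resource such as WordNet) and context labels "up", "down" and "non" for ↑M, ↓M and #M. All names are illustrative.

```python
# Hypothetical toy lexicon: each word maps to a broader word (x ⊑ HYPERNYM[x]).
HYPERNYM = {
    "chair": "furniture",
    "hover": "fly",
    "penguin": "bird",
    "mallard": "duck",
}


def is_narrower(x: str, y: str) -> bool:
    """True if x ⊑ y under the toy lexicon (reflexive and transitive)."""
    while x != y:
        if x not in HYPERNYM:
            return False
        x = HYPERNYM[x]
    return True


def substitution_preserves_truth(old: str, new: str, context: str) -> bool:
    """Apply the monotonicity heuristics to replacing `old` with `new`.

    `context` is "up" (↑M), "down" (↓M) or "non" (#M).
    """
    if context == "up":    # broadening edits preserve truth
        return is_narrower(old, new)
    if context == "down":  # narrowing edits preserve truth
        return is_narrower(new, old)
    return old == new      # #M: only the trivial (identity) edit is safe


print(substitution_preserves_truth("chair", "furniture", "up"))  # True
print(substitution_preserves_truth("fly", "hover", "down"))      # True
print(substitution_preserves_truth("hover", "fly", "down"))      # False
print(substitution_preserves_truth("penguin", "bird", "non"))    # False
```

The same substitution can be safe or unsafe depending only on the context label, which is why the system must first work out the monotonicity of every position in the sentence.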


8

Downward monotonicity

• restrictive quantifiers: no, few, at most n
  few athletes ⊑ few sprinters
• negative & restrictive verbs: lack, fail, prohibit, deny
  prohibit weapons ⊑ prohibit guns
• prepositions & adverbs: without, except, only
  without clothes ⊑ without pants
• negative & restrictive nouns: ban, absence [of], refusal
  drug ban ⊑ heroin ban
• the antecedent of a conditional
  If stocks rise, we’ll get real paid ⊑ If stocks soar, we’ll get real paid
• explicit negation: no, n’t
  didn’t dance ⊑ didn’t tango

Downward-monotone constructions are widespread!


9

Monotonicity of binary functions

• Some quantifiers are best viewed as binary functions
• Different arguments can have different monotonicities

all (↓M, ↑M): All ducks fly ⊑ All mallards fly; All ducks fly ⊑ All ducks move
some (↑M, ↑M): Some mammals fly ⊑ Some animals fly; Some mammals fly ⊑ Some mammals move
no (↓M, ↓M): No dogs fly ⊑ No poodles fly; No dogs fly ⊑ No dogs hover
not every (↑M, ↓M): Not every bird flies ⊑ Not every animal flies; Not every bird flies ⊑ Not every bird hovers
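The monotonicity of each argument can be checked mechanically by treating a quantifier as a relation between a restrictor set A and a body set B and testing every pair of subsets of a small universe. This is a minimal sketch under stated assumptions: the four-element `UNIVERSE`, the set-theoretic definitions in the hypothetical `QUANTIFIERS` table, and the function names are illustrative choices, not part of NatLog.

```python
from itertools import chain, combinations

# Toy universe; small enough to enumerate every pair of subsets (an assumption).
UNIVERSE = range(4)


def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


# Generalized quantifiers as relations between restrictor A and body B.
QUANTIFIERS = {
    "all": lambda a, b: a <= b,
    "some": lambda a, b: bool(a & b),
    "no": lambda a, b: not (a & b),
    "not every": lambda a, b: not (a <= b),
    "most": lambda a, b: len(a & b) > len(a - b),
}


def monotonicity(q, position: int) -> str:
    """Classify argument `position` (0 = restrictor, 1 = body) of quantifier
    `q` as "up", "down", "non" or "constant" by exhaustive checking."""
    sets = subsets(UNIVERSE)
    up = down = True
    for other in sets:            # the argument held fixed
        for x in sets:
            for y in sets:
                if not x <= y:    # only compare x ⊆ y
                    continue
                if position == 0:
                    tx, ty = q(x, other), q(y, other)
                else:
                    tx, ty = q(other, x), q(other, y)
                if tx and not ty:
                    up = False    # enlarging the argument lost truth
                if ty and not tx:
                    down = False  # shrinking the argument lost truth
    if up and down:
        return "constant"
    return "up" if up else "down" if down else "non"


for name, q in QUANTIFIERS.items():
    print(f"{name:10} {monotonicity(q, 0):5} {monotonicity(q, 1)}")
```

Run as is, it classifies all as (down, up), some as (up, up), no as (down, down) and not every as (up, down), matching the examples above; most comes out non-monotone in its first argument and upward in its second.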


10

Composition of monotonicity

• Composition of functions ⇒ composition of monotonicity
• Sánchez Valencia: a precise monotonicity calculus for categorial grammar (CG)

Example: polarity of each word in Few states completely forbid casino gambling (+ = upward, – = downward):
Few (+) states (–) completely (–) forbid (–) casino (+) gambling (+)

Composition table (row ∘ column):
 ∘    ↑M   ↓M   #M
↑M    ↑M   ↓M   #M
↓M    ↓M   ↑M   #M
#M    #M   #M   #M
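The composition table and the polarity marks can be sketched in a few lines: a token's polarity is the composition of the monotonicities of every argument slot on the path from the root down to that token. This is a minimal sketch; the hypothetical `PATHS` table hard-codes those paths for the example sentence, assuming "few" is ↓M in both arguments, "forbid" is ↓M in its object, and the modifiers "completely" and "casino" are ↑M. A real system would derive the paths from a parse.

```python
# Monotonicity labels: "up" (↑M), "down" (↓M), "non" (#M).
def compose(outer: str, inner: str) -> str:
    """Monotonicity of the composed function outer ∘ inner."""
    if "non" in (outer, inner):
        return "non"
    return "up" if outer == inner else "down"


def polarity(path: list[str]) -> str:
    """Fold the monotonicities of the argument slots on the path from the
    root down to a token; the root context itself is upward."""
    result = "up"
    for mono in path:
        result = compose(result, mono)
    return result


# Hypothetical paths for the example sentence (assumptions, see lead-in).
PATHS = {
    "Few": [],
    "states": ["down"],                  # restrictor of few
    "completely": ["down"],              # inside the body of few
    "forbid": ["down"],                  # inside the body of few
    "casino": ["down", "down"],          # body of few, object of forbid
    "gambling": ["down", "down", "up"],  # ... and argument of casino
}

for word, path in PATHS.items():
    sign = {"up": "+", "down": "–", "non": "#"}[polarity(path)]
    print(f"{word} ({sign})", end=" ")
print()
```

Two downward functions cancel out, which is why casino and gambling come out upward (+) while states, completely and forbid are downward (–).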


11

The NatLog System

textual inference problem
→ 1. linguistic pre-processing
→ 2. alignment
→ 3. entailment classification
→ prediction


12

Step 1: Linguistic Pre-processing

• Tokenize & parse input sentences (future: & NER & coref & …)

• Identify & project monotonicity operators
  • Problem: PTB-style parse tree ≠ semantic structure!

Few/JJ states/NNS completely/RB forbid/VBD casino/NN gambling/NN
phrases: NP (Few states), ADVP (completely), NP (casino gambling), VP, S
polarity: Few (+) states (–) completely (–) forbid (–) casino (+) gambling (+)

• Solution: specify projections in PTB trees using Tregex

few
  pattern: JJ < /^[Ff]ew$/
  arg1: ↓M on dominating NP
    __ >+(NP) (NP=proj !> NP)
  arg2: ↓M on dominating S
    __ >+(/.*/) (S=proj !> S)


13

Step 2: Alignment

• Alignment = a sequence of atomic edits [cf. Harmeling 07]

• Atomic edits over token spans: DEL (delete), INS (insert), SUB (substitute), ADV (advance: token unchanged)

• Limitations:
  • no easy way to represent movement
  • no alignments to non-contiguous sets of tokens

• Benefits:
  • well-defined sequence of intermediate forms
  • can use an adaptation of the Levenshtein string-edit DP (dynamic programming) algorithm

• We haven’t (yet) invested much effort here

Few states completely forbid casino gambling
Few states have completely prohibited gambling
Edits: ADV (Few), ADV (states), INS (have), ADV (completely), SUB (forbid → prohibited), DEL (casino), ADV (gambling)
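An adaptation of the Levenshtein DP that emits the four atomic edit types can be sketched as follows. This is a minimal sketch, not NatLog's aligner: the costs and the tiny `SIMILAR` table are hypothetical stand-ins for real lexical similarity features. With plain unit costs the example pair has several tied optimal alignments, so near-synonym substitutions are made cheap and other substitutions somewhat dearer.

```python
# Hypothetical costs: near-synonyms substitute cheaply, other substitutions
# cost more than a single INS or DEL but less than both together.
INS_COST = DEL_COST = 1.0
SIMILAR = {("forbid", "prohibited")}  # toy stand-in for lexical similarity


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0  # ADV: token carried over unchanged
    return 0.5 if (a, b) in SIMILAR else 1.5


def align(premise: list[str], hypothesis: list[str]) -> list[tuple[str, str]]:
    """Minimum-cost sequence of ADV/SUB/DEL/INS edits (Levenshtein DP)."""
    n, m = len(premise), len(hypothesis)
    # cost[i][j]: cheapest way to turn premise[:i] into hypothesis[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * DEL_COST
    for j in range(1, m + 1):
        cost[0][j] = j * INS_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(premise[i - 1], hypothesis[j - 1]),
                cost[i - 1][j] + DEL_COST,
                cost[i][j - 1] + INS_COST,
            )
    # Trace back from the final cell to recover one optimal edit sequence.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            p, h = premise[i - 1], hypothesis[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + sub_cost(p, h):
                op = "ADV" if p == h else "SUB"
                edits.append((op, p if p == h else f"{p} → {h}"))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + DEL_COST:
            edits.append(("DEL", premise[i - 1]))
            i -= 1
        else:
            edits.append(("INS", hypothesis[j - 1]))
            j -= 1
    return edits[::-1]


p = "Few states completely forbid casino gambling".split()
h = "Few states have completely prohibited gambling".split()
for op, text in align(p, h):
    print(op, text)
```

Under these costs the DP recovers ADV, ADV, INS (have), ADV, SUB (forbid → prohibited), DEL (casino), ADV. Because it only walks forward through both token lists, it also inherits the limitations listed above: a moved phrase shows up as a DEL plus an INS.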


14

Step 3: Entailment Classification

• Atomic edits ⇒ atomic entailment problems

• Feature representation
  • Basic features: edit type, monotonicity, “light edit” feature
  • Lexical features for SUB edits: lemma similarity, WordNet (WN) features

• Decision tree classifier
  • Trained on a small data set designed to exercise the feature space
  • Outputs an elementary entailment relation: =, ⊏, ⊐, #, |

• Composition of atomic entailment predictions
  • Fairly intuitive: ⊏ ∘ ⊏ ⇒ ⊏, ⊏ ∘ ⊐ ⇒ #, = ∘ = ⇒ =, etc.
  • Composition yields a global entailment prediction for the problem
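Composing the per-edit predictions amounts to folding a join operation over the sequence of relations. The sketch below uses a deliberately simplified join table that is an assumption, not NatLog's exact table: = acts as the identity, ⊏ and ⊐ are transitive, exclusion survives narrowing on either side, and anything else falls back to #. The mapping in `answer` from the global relation to yes/no/unk is likewise hypothetical.

```python
from functools import reduce

EQ, FWD, REV, IND, EXC = "=", "⊏", "⊐", "#", "|"


def join(r1: str, r2: str) -> str:
    """Relation between x and z, given x r1 y and y r2 z (simplified table)."""
    if r1 == EQ:
        return r2                 # = is the identity
    if r2 == EQ:
        return r1
    if r1 == r2 and r1 in (FWD, REV):
        return r1                 # ⊏ and ⊐ are transitive
    if (r1, r2) in {(FWD, EXC), (EXC, REV)}:
        return EXC                # a subset of an excluded set is excluded too
    return IND                    # no determinate relation: fall back to #


def compose_all(relations: list[str]) -> str:
    """Fold the atomic relations of an edit sequence, left to right."""
    return reduce(join, relations, EQ)


def answer(relation: str) -> str:
    """Hypothetical mapping from the global relation to a 3-way answer."""
    if relation in (EQ, FWD):
        return "yes"
    if relation == EXC:
        return "no"
    return "unk"


print(compose_all([EQ, EQ, FWD]))          # ⊏
print(answer(compose_all([EQ, EQ, FWD])))  # yes
print(compose_all([FWD, REV]))             # #
```

Falling back to # whenever the table has no entry keeps the composition conservative: one uncertain edit is enough to block a yes answer.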


15

Entailment model example

Premise: Few states completely forbid casino gambling .
Hypothesis: Few states have completely prohibited gambling .

Each edit is featurized, then a relation is predicted for it:
• INS (have): type INS, mono down, isLight true ⇒ = (equivalent)
• SUB (forbid → prohibited): type SUB, mono down, isLight false, lemSim 0.375, wnSyn 1.0, wnAnto 0.0, wnHypo 0.0 ⇒ = (equivalent)
• DEL (casino): type DEL, mono up, isLight false ⇒ ⊏ (forward)

Compose: = ∘ = ∘ ⊏ ⇒ ⊏ (forward)


16

The FraCaS test suite

• FraCaS: mid-90s project in computational semantics

• 346 “textbook” examples of textual inference problems
  • Premise: No delegate finished the report.
    Hypothesis: Some delegate finished the report on time.
    Answer: no
  • Premise: Smith believed that ITEL had won the contract in 1992.
    Hypothesis: ITEL won the contract in 1992.
    Answer: unk

• 9 sections: quantifiers, plurals, anaphora, ellipsis, …

• 3 possible answers: yes, no, unknown (not balanced!)

• 55% single-premise, 45% multi-premise (multi-premise problems excluded)


17

Results on FraCaS

By section:
§  Category                 #    Acc. (%)
1  Quantifiers             44    84.09
2  Plurals                 24    41.67
3  Anaphora                 6    50.00
4  Ellipsis                25    28.00
5  Adjectives              15    60.00
6  Comparatives            16    68.75
7  Temporal                36    61.11
8  Verbs                    8    62.50
9  Attitudes                9    55.56
   “Applicable”: 1, 5, 6   75    76.00
   All sections           183    59.56

Confusion matrix (rows: gold answer; columns: NatLog’s guess):
gold \ guess   yes   unk    no   total
yes             62    40     —     102
unk             15    45     —      60
no               6    13     2      21
total           90    91     2     183
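The summary accuracies follow from the counts in the two tables: the diagonal of the confusion matrix gives the number of correct answers overall, and multiplying each section’s accuracy by its problem count (rounding to whole problems) gives the correct answers in the “applicable” sections 1, 5 and 6.

```latex
\begin{aligned}
\text{acc}_{\text{all}} &= \frac{62 + 45 + 2}{183} = \frac{109}{183} \approx 59.56\,\% \\
\text{acc}_{\text{applicable}} &= \frac{37 + 9 + 11}{44 + 15 + 16} = \frac{57}{75} = 76.00\,\%
\end{aligned}
```

Here 37 ≈ 0.8409 × 44, 9 = 0.60 × 15 and 11 = 0.6875 × 16 are the correct answers in §1, §5 and §6.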


18

The RTE3 test suite

• RTE: more “natural” textual inference problems

• Much longer premises: average 35 words (vs. 11 in FraCaS)

• Binary classification: yes and no

• RTE problems not ideal for NatLog
  • Many kinds of inference not addressed by NatLog
  • Big edit distance ⇒ propagation of errors from atomic model

• Maybe we can achieve high precision on a subset?

• Strategy: hybridize with broad-coverage RTE system
  • As in Bos & Markert 2006


19

A hybrid RTE system using NatLog

NatLog: pre-processing → alignment → classification → {yes, no}
Stanford: pre-processing → alignment → classification → score in [–, +]
Hybrid: NatLog’s {yes, no} combined (x) with Stanford’s score → threshold (balanced) or threshold (optimized) → {yes, no}
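Both hybrid variants end by thresholding a real-valued score. The slide does not spell out how NatLog’s answer is combined with the Stanford score, so the sketch below covers only the two threshold choices, assuming a problem is labelled yes when its score is at or above the threshold: “balanced” takes the median so that about half the problems are labelled yes, and “optimized” takes the candidate threshold with the best accuracy on labelled (e.g. development) data. Function names and the toy data are hypothetical.

```python
import math


def balanced_threshold(scores: list[float]) -> float:
    """Median score: about half of the problems land at or above it."""
    s = sorted(scores)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def accuracy(scores: list[float], gold: list[bool], t: float) -> float:
    """Fraction of problems where (score >= t) matches the gold yes/no."""
    return sum((s >= t) == g for s, g in zip(scores, gold)) / len(gold)


def optimized_threshold(scores: list[float], gold: list[bool]) -> float:
    """Candidate threshold with the highest accuracy on labelled data."""
    candidates = sorted(set(scores)) + [math.inf]  # inf: answer no everywhere
    return max(candidates, key=lambda t: accuracy(scores, gold, t))


# Toy scores and gold labels (illustrative only).
scores = [0.1, 0.4, 0.35, 0.8]
gold = [False, True, False, True]
print(balanced_threshold(scores))         # 0.375
print(optimized_threshold(scores, gold))  # 0.4
```

A balanced threshold fixes the share of yes answers near 50%, consistent with the 50.00% yes of the “Hybrid, balanced” rows below, while an optimized threshold is free to move that share wherever accuracy is highest.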


20

Results on RTE3

RTE3 Development Set (800 problems)
System              % yes   precision   recall   accuracy
Stanford            50.25   68.66       66.99    67.25
NatLog              18.00   76.39       26.70    58.00
Hybrid, balanced    50.00   69.75       67.72    68.25
Hybrid, optimized   55.13   69.16       74.03    69.63

RTE3 Test Set (800 problems)
System              % yes   precision   recall   accuracy
Stanford            50.00   61.75       60.24    60.50
NatLog              23.88   68.06       31.71    57.38
Hybrid, balanced    50.00   64.50       62.93    63.25
Hybrid, optimized   54.13   63.74       67.32    63.62

Hybrid, optimized vs. Stanford on the test set: 25 extra problems correct (significant, p < 0.01)


21

Conclusion

Natural logic enables precise reasoning about monotonicity, while sidestepping the difficulties of translating to FOL.

The NatLog system successfully handles a broad range of such inferences, as demonstrated on the FraCaS test suite (76.00% accuracy on the applicable sections 1, 5 and 6).

Future work:
• Add proof search, to handle multiple-premise inference problems
• Consider using CCG parses to facilitate monotonicity projection
• Explore the use of more sophisticated alignment models
• Bring factive & implicative inferences into the NatLog framework

:-) Thanks! Questions?

