AQUAINT
BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002
BBN’s Approach to QA
• Theme: Use document retrieval, entity recognition, & proposition recognition
• Analyze the question
– Reduce question to propositions and a bag of words
– Predict the type of the answer
• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
• Other knowledge sources (e.g. the Web) are optionally used to rerank answers
• Re-rank candidates based on propositions
• Estimate confidence for answers
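A minimal end-to-end sketch of the flow listed above; the helper implementations, the toy corpus, and the word-overlap scoring are illustrative assumptions, not BBN's components (Web re-ranking, proposition re-ranking, and confidence estimation are sketched with later slides).

```python
# Toy orchestration of the approach above; every helper here is a placeholder.
def analyze_question(question):
    """Reduce the question to a bag of words and a coarse predicted answer type."""
    words = [w.lower().strip("?") for w in question.split()]
    answer_type = "LOCATION" if words and words[0] == "where" else "OTHER"
    return words, answer_type

def retrieve_passages(bag_of_words, corpus, k=3):
    """Rank passages from the primary corpus by word overlap with the question."""
    scored = [(len(set(bag_of_words) & set(p.lower().split())), p) for p in corpus]
    return [p for score, p in sorted(scored, reverse=True)[:k] if score > 0]

def answer_question(question, corpus):
    bag_of_words, answer_type = analyze_question(question)
    passages = retrieve_passages(bag_of_words, corpus)
    # Re-ranking with Web evidence and propositions, and confidence estimation,
    # would follow here; this toy version just returns the best passage.
    return passages[0] if passages else None

corpus = ["The Taj Mahal is in Agra , India .", "Dell sold the most PCs in 2001 ."]
print(answer_question("Where is the Taj Mahal?", corpus))
```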
System Diagram
[System diagram. Input: Question; output: Answer & Confidence Score. Components: Question Classification, Parsing, Name Extraction, NP Labeling, Description Classification, Proposition Finding, Document Retrieval, Passage Retrieval, Web Search, Regularization, Confidence Estimation. Annotated resources: Treebank, Name Annotation, Proposition Bank.]
Question Classification
• A hybrid approach based on rules, statistical parsing, and question templates (see the sketch after this list)
  – Match question templates against statistical parses
  – Back off to statistical bag-of-words classification
• Example features used for classification
  – The type of WHNP starting the question (e.g. “Who”, “What”, “When” …)
  – The headword of the core NP
  – WordNet definition
  – Bag of words
  – Main verb of the question
• Performance
  – TREC 8 & 9 questions for training
  – ~85% when testing on TREC 10
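A minimal sketch of the hybrid idea, assuming surface regular-expression templates and a toy word-statistics table for the backoff; the actual system matches templates against statistical parses and trains the backoff classifier on TREC 8 & 9 questions.

```python
import re
from collections import Counter

# Hypothetical question templates; the real system matches templates against
# statistical parses, not surface regular expressions.
TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION_OR_GPE"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^which (pianist|composer|singer)\b", re.I), "PERSON"),
]

# Toy bag-of-words statistics (word -> answer-type counts); in practice these
# would be estimated from TREC 8 & 9 training questions.
WORD_TYPE_COUNTS = {
    "cost": Counter({"MONEY": 5}),
    "year": Counter({"DATE": 7}),
    "city": Counter({"GPE": 6}),
}

def classify_question(question):
    for pattern, answer_type in TEMPLATES:
        if pattern.search(question):
            return answer_type
    # Back off to bag-of-words classification when no template matches.
    votes = Counter()
    for word in question.lower().strip("?").split():
        votes.update(WORD_TYPE_COUNTS.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else "OTHER"

print(classify_question("Where is the Taj Mahal?"))                  # LOCATION_OR_GPE
print(classify_question("Which pianist won the last competition?"))  # PERSON
print(classify_question("What year did the war end?"))               # DATE (backoff)
```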
Examples of Question Analysis
• Where is the Taj Mahal?
– WHNP=where
– Answer type: Location or GPE
• Which pianist won the last International Tchaikovsky Competition?
– Headword of core NP=pianist
– WordNet definition=person
– Answer type: Person
Question-Answer Types
Type            Subtype
ORGANIZATION    CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION        CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC             AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT         DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY     NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE
FAC_DESC        AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC        CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC        CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO    ADDRESS, OTHER, PHONE
WORK_OF_ART     BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.
Question-Answer Types (cont’d)
PRODUCT_DESC    OTHER, VEHICLE, WEAPON
PERSON
EVENT           HURRICANE, OTHER, WAR
SUBSTANCE       CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT         OTHER
ORDINAL
ANIMAL
QUANTITY        1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE             CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE            AGE, DATE, DURATION, OTHER
Frequency of Q Types

[Bar chart: number of questions in TREC 8, 9, and 10 (y-axis, 0–250) by question type: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, Contact-Info, Animal, Plant, Bio, Cause-Effect-Influence, Law.]
Interpretation

IdentiFinder™ Status
• Current IdentiFinder performance on types
• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese
[Bar chart: recall, precision, and F scores at the category and subcategory level; values shown: 88 / 89 / 88.4 and 87 / 88 / 87.3.]
Proposition Indexing
• A shallow semantic representation
– Deeper than bags of words
– But broad enough to cover all the text
• Characterizes documents by
– The entities they contain
– Propositions involving those entities
• Resolves all references to entities
– Whether named, described, or pronominal
• Represents all propositions that are directly stated in the text
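A toy illustration of such an index, assuming a simple layout of per-document entity tables plus role-labelled proposition tuples; the class and method names are hypothetical, not BBN's actual representation.

```python
from collections import defaultdict

class PropositionIndex:
    """Toy index: documents are characterized by their entities and by
    propositions (a predicate plus role-labelled entity arguments)."""

    def __init__(self):
        self.entities = defaultdict(dict)      # doc_id -> {entity_id: canonical string}
        self.propositions = defaultdict(list)  # doc_id -> [(predicate, {role: entity_id})]

    def add_entity(self, doc_id, entity_id, mention):
        # All coreferent mentions (named, described, or pronominal) share one entity_id.
        self.entities[doc_id][entity_id] = mention

    def add_proposition(self, doc_id, predicate, roles):
        self.propositions[doc_id].append((predicate, roles))

    def find(self, predicate, role, mention):
        """Return documents containing the predicate with the given argument."""
        hits = []
        for doc_id, props in self.propositions.items():
            for pred, roles in props:
                if pred == predicate and self.entities[doc_id].get(roles.get(role)) == mention:
                    hits.append(doc_id)
        return hits

index = PropositionIndex()
index.add_entity("d1", "e1", "Dell")
index.add_entity("d1", "e3", "the most PCs")
index.add_proposition("d1", "sold", {"subj": "e1", "obj": "e3"})
print(index.find("sold", "subj", "Dell"))   # ['d1']
```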
Proposition Finding Example
• Question: Which company sold the most PCs in 2001?
• Text: Dell, beating Compaq, sold the most PCs in 2001.
• Propositions
  – (e1: “Dell”)
  – (e2: “Compaq”)
  – (e3: “the most PCs”)
  – (e4: “2001”)
  – (sold subj:e1, obj:e3, in:e4)
  – (beating subj:e1, obj:e2)
• Passage retrieval would select the wrong answer; matching the question against the proposition (sold subj:e1, obj:e3, in:e4) identifies e1 (“Dell”) as the answer (a sketch of this matching follows).
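A small sketch of the matching step this example implies: the question proposition leaves its subject slot open, and the text proposition that agrees on the predicate and the remaining arguments fills it. The tuple encoding and helper names are assumptions for illustration.

```python
# Text propositions from the slide: an entity table plus predicate tuples.
ENTITIES = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
TEXT_PROPS = [
    ("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
    ("beating", {"subj": "e1", "obj": "e2"}),
]

# Question proposition for "Which company sold the most PCs in 2001?"
# The subject slot (None) is the unknown being asked for.
QUESTION_PROP = ("sold", {"subj": None, "obj": "the most PCs", "in": "2001"})

def match(question_prop, text_props, entities):
    q_pred, q_roles = question_prop
    for pred, roles in text_props:
        if pred != q_pred:
            continue
        # Every specified question argument must be satisfied by the text proposition.
        if all(entities.get(roles.get(r)) == v for r, v in q_roles.items() if v is not None):
            open_slot = next(r for r, v in q_roles.items() if v is None)
            return entities[roles[open_slot]]
    return None

print(match(QUESTION_PROP, TEXT_PROPS, ENTITIES))   # Dell, not Compaq
```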
Proposition Recognition Strategy
• Start with a lexicalized, probabilistic (LPCFG) parsing model
• Distinguish names by replacing NP labels with NPP
• Currently, rules normalize the parse tree to produce propositions
• At a later date, extend the statistical model to
– Predict argument labels for clauses
– Resolve references to entities
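A toy illustration of rule-based normalization from a parse tree to a proposition, assuming trees encoded as nested tuples and a single clause rule; the real LPCFG model and normalization rules are considerably richer.

```python
# Parse trees as nested tuples: (label, child, child, ...); leaves are words.
# NPP marks a name, following the slide's NP-relabelling convention.
TREE = ("S",
        ("NPP", "Dell"),
        ("VP", ("VBD", "sold"),
               ("NP", "the", "most", "PCs"),
               ("PP", ("IN", "in"), ("NP", "2001"))))

def words(node):
    """Concatenate the leaves under a node."""
    return node if isinstance(node, str) else " ".join(words(c) for c in node[1:])

def clause_to_proposition(tree):
    """One toy normalization rule: S -> subject NP + VP(verb, object NP, PP adjuncts)."""
    label, subject, vp = tree
    assert label == "S"
    verb = words(vp[1])
    roles = {"subj": words(subject)}
    for child in vp[2:]:
        if child[0] == "NP":
            roles["obj"] = words(child)
        elif child[0] == "PP":
            roles[words(child[1])] = words(child[2])   # the preposition becomes the role label
    return verb, roles

print(clause_to_proposition(TREE))
# ('sold', {'subj': 'Dell', 'obj': 'the most PCs', 'in': '2001'})
```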
Confidence Estimation
• Compute the probability P(correct|Q,A) from the following features:
  P(correct|Q,A) ≈ P(correct|type(Q), <m,n>, PropSat)
  – type(Q): question type
  – m: question length
  – n: number of matched question words in the answer context
  – PropSat: whether the answer satisfies the propositions in the question
• Confidence for answers found on the Web:
  P(correct|Q,A) ≈ P(correct|Freq, InTrec)
  – Freq = number of Web hits, using Google
  – InTrec = whether the answer was also a top answer from the AQUAINT corpus
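A minimal sketch of estimating such conditional probabilities as smoothed relative frequencies over a held-out set of scored question/answer pairs; the feature tuple, the add-one smoothing, and the toy data are assumptions, not the system's actual estimator.

```python
from collections import defaultdict

def estimate_confidence_table(held_out):
    """held_out: iterable of (features, correct) pairs, where features is a
    hashable tuple such as (question_type, question_length, n_matched, prop_sat).
    Returns smoothed P(correct | features)."""
    counts = defaultdict(lambda: [0, 0])     # features -> [n_correct, n_total]
    for features, correct in held_out:
        counts[features][0] += int(correct)
        counts[features][1] += 1
    # Add-one smoothing so rare feature combinations stay conservative.
    return {f: (c + 1) / (n + 2) for f, (c, n) in counts.items()}

held_out = [
    (("PERSON", 5, 4, True), True),
    (("PERSON", 5, 4, True), True),
    (("PERSON", 5, 2, False), False),
    (("LOCATION", 3, 1, False), False),
]
table = estimate_confidence_table(held_out)
print(table[("PERSON", 5, 4, True)])    # 0.75
print(table[("PERSON", 5, 2, False)])   # ~0.33
```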
Dependence of Answer Correctness on Question Type
[Bar chart: P(correct|Type) by question type; y-axis 0–0.5.]
Dependence on Proposition Satisfaction
[Bar chart: P(correct|PropSat) for PropSat=True vs. PropSat=False; y-axis 0–0.6.]
Dependence on Number of Matched Words
[Chart: p(correct) vs. number of matched question words (0–6), with separate curves for question length = 3, 4, and 5; y-axis 0–0.5.]
Dependence of Answer Correctness on Web Frequency
[Chart: P(correct|Freq, InTrec) vs. frequency of the answer in Google summaries (0–150), with separate curves for InTrec=true and InTrec=false; y-axis 0–1.]
Official Results of TREC 2002 QA
RunTag      Unranked Average Precision    Ranked Average Precision    Upper-bound
BBN2002A    0.186                         0.257                       0.498
BBN2002B    0.288                         0.468                       0.646
BBN2002C    0.284                         0.499                       0.641
• BBN2002A did not use the Web
• BBN2002B and BBN2002C used the Web
• Unranked average precision = percentage of questions for which the first answer is correct
• Ranked average precision = confidence-weighted score, the official metric for TREC 2002
• Upper-bound = confidence-weighted score given perfect confidence estimation
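A small sketch of the two scores as defined above, computing the confidence-weighted score over questions sorted by decreasing system confidence (the usual TREC 2002 definition); the flags and confidence values below are made up for illustration.

```python
def unranked_average_precision(correct_flags):
    """Fraction of questions whose first answer is correct."""
    return sum(correct_flags) / len(correct_flags)

def confidence_weighted_score(correct_flags, confidences):
    """Sort questions by decreasing confidence, then average
    (number correct in the first i) / i over i = 1..N."""
    order = sorted(range(len(correct_flags)), key=lambda i: -confidences[i])
    running_correct, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        running_correct += int(correct_flags[i])
        total += running_correct / rank
    return total / len(correct_flags)

# Illustrative (made-up) correctness flags and confidences for five questions.
flags = [True, False, True, True, False]
confs = [0.9, 0.2, 0.8, 0.4, 0.7]
print(unranked_average_precision(flags))       # 0.6
print(confidence_weighted_score(flags, confs)) # ~0.80
```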
Recent Progress
• In the last six months, we have:
– Retrained our name tagger (IdentiFinder™) for roughly 29 question types
– Distributed the re-trained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system AQUA against streaming news