AQUAINT
BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002
BBN’s Approach to QA
• Theme: Use document retrieval, entity recognition, & proposition recognition
• Analyze the question
– Reduce question to propositions and a bag of words
– Predict the type of the answer
• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
• Other knowledge sources (e.g. the Web) are optionally used to rerank answers
• Re-rank candidates based on propositions
• Estimate confidence for answers
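A minimal end-to-end sketch of the flow listed above; the helper implementations, the toy corpus, and the word-overlap scoring are illustrative assumptions, not BBN's components (Web re-ranking, proposition re-ranking, and confidence estimation are sketched with later slides).

```python
# Toy orchestration of the approach above; every helper here is a placeholder.
def analyze_question(question):
    """Reduce the question to a bag of words and a coarse predicted answer type."""
    words = [w.lower().strip("?") for w in question.split()]
    answer_type = "LOCATION" if words and words[0] == "where" else "OTHER"
    return words, answer_type

def retrieve_passages(bag_of_words, corpus, k=3):
    """Rank passages from the primary corpus by word overlap with the question."""
    scored = [(len(set(bag_of_words) & set(p.lower().split())), p) for p in corpus]
    return [p for score, p in sorted(scored, reverse=True)[:k] if score > 0]

def answer_question(question, corpus):
    bag_of_words, answer_type = analyze_question(question)
    passages = retrieve_passages(bag_of_words, corpus)
    # Re-ranking with Web evidence and propositions, and confidence estimation,
    # would follow here; this toy version just returns the best passage.
    return passages[0] if passages else None

corpus = ["The Taj Mahal is in Agra , India .", "Dell sold the most PCs in 2001 ."]
print(answer_question("Where is the Taj Mahal?", corpus))
```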
System Diagram
[System diagram. Input: Question; output: Answer & Confidence Score. Components: Question Classification, Parsing, Name Extraction, NP Labeling, Description Classification, Proposition Finding, Document Retrieval, Passage Retrieval, Web Search, Regularization, Confidence Estimation. Annotated resources: Treebank, Name Annotation, Proposition Bank.]
Question Classification
• A hybrid approach based on rules, statistical parsing, and question templates (see the sketch after this list)
  – Match question templates against statistical parses
  – Back off to statistical bag-of-words classification
• Example features used for classification
  – The type of WHNP starting the question (e.g. “Who”, “What”, “When” …)
  – The headword of the core NP
  – WordNet definition
  – Bag of words
  – Main verb of the question
• Performance
  – TREC 8 & 9 questions for training
  – ~85% when testing on TREC 10
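A minimal sketch of the hybrid idea, assuming surface regular-expression templates and a toy word-statistics table for the backoff; the actual system matches templates against statistical parses and trains the backoff classifier on TREC 8 & 9 questions.

```python
import re
from collections import Counter

# Hypothetical question templates; the real system matches templates against
# statistical parses, not surface regular expressions.
TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION_OR_GPE"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^which (pianist|composer|singer)\b", re.I), "PERSON"),
]

# Toy bag-of-words statistics (word -> answer-type counts); in practice these
# would be estimated from TREC 8 & 9 training questions.
WORD_TYPE_COUNTS = {
    "cost": Counter({"MONEY": 5}),
    "year": Counter({"DATE": 7}),
    "city": Counter({"GPE": 6}),
}

def classify_question(question):
    for pattern, answer_type in TEMPLATES:
        if pattern.search(question):
            return answer_type
    # Back off to bag-of-words classification when no template matches.
    votes = Counter()
    for word in question.lower().strip("?").split():
        votes.update(WORD_TYPE_COUNTS.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else "OTHER"

print(classify_question("Where is the Taj Mahal?"))                  # LOCATION_OR_GPE
print(classify_question("Which pianist won the last competition?"))  # PERSON
print(classify_question("What year did the war end?"))               # DATE (backoff)
```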
Examples of Question Analysis
• Where is the Taj Mahal?
– WHNP=where
– Answer type: Location or GPE
• Which pianist won the last International Tchaikovsky Competition?
– Headword of core NP=pianist
– WordNet definition=person
– Answer type: Person
Question-Answer Types
Type            Subtype
ORGANIZATION    CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION        CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC             AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT         DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY     NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE
FAC_DESC        AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC        CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC        CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO    ADDRESS, OTHER, PHONE
WORK_OF_ART     BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.
Question-Answer Types (cont’d)
PRODUCT_DESC    OTHER, VEHICLE, WEAPON
PERSON
EVENT           HURRICANE, OTHER, WAR
SUBSTANCE       CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT         OTHER
ORDINAL
ANIMAL
QUANTITY        1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE             CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE            AGE, DATE, DURATION, OTHER
Frequency of Q Types

[Bar chart: number of questions in TREC 8, 9, and 10 (y-axis, 0–250) by question type: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, Contact-Info, Animal, Plant, Bio, Cause-Effect-Influence, Law.]
Interpretation

IdentiFinder™ Status
• Current IdentiFinder performance on types
• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese
[Bar chart: recall, precision, and F scores at the category and subcategory level; values shown: 88 / 89 / 88.4 and 87 / 88 / 87.3.]
Proposition Indexing
• A shallow semantic representation
– Deeper than bags of words
– But broad enough to cover all the text
• Characterizes documents by
– The entities they contain
– Propositions involving those entities
• Resolves all references to entities
– Whether named, described, or pronominal
• Represents all propositions that are directly stated in the text
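A toy illustration of such an index, assuming a simple layout of per-document entity tables plus role-labelled proposition tuples; the class and method names are hypothetical, not BBN's actual representation.

```python
from collections import defaultdict

class PropositionIndex:
    """Toy index: documents are characterized by their entities and by
    propositions (a predicate plus role-labelled entity arguments)."""

    def __init__(self):
        self.entities = defaultdict(dict)      # doc_id -> {entity_id: canonical string}
        self.propositions = defaultdict(list)  # doc_id -> [(predicate, {role: entity_id})]

    def add_entity(self, doc_id, entity_id, mention):
        # All coreferent mentions (named, described, or pronominal) share one entity_id.
        self.entities[doc_id][entity_id] = mention

    def add_proposition(self, doc_id, predicate, roles):
        self.propositions[doc_id].append((predicate, roles))

    def find(self, predicate, role, mention):
        """Return documents containing the predicate with the given argument."""
        hits = []
        for doc_id, props in self.propositions.items():
            for pred, roles in props:
                if pred == predicate and self.entities[doc_id].get(roles.get(role)) == mention:
                    hits.append(doc_id)
        return hits

index = PropositionIndex()
index.add_entity("d1", "e1", "Dell")
index.add_entity("d1", "e3", "the most PCs")
index.add_proposition("d1", "sold", {"subj": "e1", "obj": "e3"})
print(index.find("sold", "subj", "Dell"))   # ['d1']
```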
Proposition Finding Example
• Question: Which company sold the most PCs in 2001?
• Text: Dell, beating Compaq, sold the most PCs in 2001.
• Propositions
  – (e1: “Dell”)
  – (e2: “Compaq”)
  – (e3: “the most PCs”)
  – (e4: “2001”)
  – (sold subj:e1, obj:e3, in:e4)
  – (beating subj:e1, obj:e2)
• Passage retrieval would select the wrong answer; matching the question against the proposition (sold subj:e1, obj:e3, in:e4) identifies e1 (“Dell”) as the answer (a sketch of this matching follows).
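A small sketch of the matching step this example implies: the question proposition leaves its subject slot open, and the text proposition that agrees on the predicate and the remaining arguments fills it. The tuple encoding and helper names are assumptions for illustration.

```python
# Text propositions from the slide: an entity table plus predicate tuples.
ENTITIES = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
TEXT_PROPS = [
    ("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
    ("beating", {"subj": "e1", "obj": "e2"}),
]

# Question proposition for "Which company sold the most PCs in 2001?"
# The subject slot (None) is the unknown being asked for.
QUESTION_PROP = ("sold", {"subj": None, "obj": "the most PCs", "in": "2001"})

def match(question_prop, text_props, entities):
    q_pred, q_roles = question_prop
    for pred, roles in text_props:
        if pred != q_pred:
            continue
        # Every specified question argument must be satisfied by the text proposition.
        if all(entities.get(roles.get(r)) == v for r, v in q_roles.items() if v is not None):
            open_slot = next(r for r, v in q_roles.items() if v is None)
            return entities[roles[open_slot]]
    return None

print(match(QUESTION_PROP, TEXT_PROPS, ENTITIES))   # Dell, not Compaq
```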
Proposition Recognition Strategy
• Start with a lexicalized, probabilistic (LPCFG) parsing model
• Distinguish names by replacing NP labels with NPP
• Currently, rules normalize the parse tree to produce propositions
• At a later date, extend the statistical model to
– Predict argument labels for clauses
– Resolve references to entities
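A toy illustration of rule-based normalization from a parse tree to a proposition, assuming trees encoded as nested tuples and a single clause rule; the real LPCFG model and normalization rules are considerably richer.

```python
# Parse trees as nested tuples: (label, child, child, ...); leaves are words.
# NPP marks a name, following the slide's NP-relabelling convention.
TREE = ("S",
        ("NPP", "Dell"),
        ("VP", ("VBD", "sold"),
               ("NP", "the", "most", "PCs"),
               ("PP", ("IN", "in"), ("NP", "2001"))))

def words(node):
    """Concatenate the leaves under a node."""
    return node if isinstance(node, str) else " ".join(words(c) for c in node[1:])

def clause_to_proposition(tree):
    """One toy normalization rule: S -> subject NP + VP(verb, object NP, PP adjuncts)."""
    label, subject, vp = tree
    assert label == "S"
    verb = words(vp[1])
    roles = {"subj": words(subject)}
    for child in vp[2:]:
        if child[0] == "NP":
            roles["obj"] = words(child)
        elif child[0] == "PP":
            roles[words(child[1])] = words(child[2])   # the preposition becomes the role label
    return verb, roles

print(clause_to_proposition(TREE))
# ('sold', {'subj': 'Dell', 'obj': 'the most PCs', 'in': '2001'})
```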
Confidence Estimation
• Compute the probability P(correct|Q,A) from the following features:
  P(correct|Q,A) ≈ P(correct|type(Q), <m,n>, PropSat)
  – type(Q): question type
  – m: question length
  – n: number of matched question words in the answer context
  – PropSat: whether the answer satisfies the propositions in the question
• Confidence for answers found on the Web:
  P(correct|Q,A) ≈ P(correct|Freq, InTrec)
  – Freq = number of Web hits, using Google
  – InTrec = whether the answer was also a top answer from the AQUAINT corpus
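A minimal sketch of estimating such conditional probabilities as smoothed relative frequencies over a held-out set of scored question/answer pairs; the feature tuple, the add-one smoothing, and the toy data are assumptions, not the system's actual estimator.

```python
from collections import defaultdict

def estimate_confidence_table(held_out):
    """held_out: iterable of (features, correct) pairs, where features is a
    hashable tuple such as (question_type, question_length, n_matched, prop_sat).
    Returns smoothed P(correct | features)."""
    counts = defaultdict(lambda: [0, 0])     # features -> [n_correct, n_total]
    for features, correct in held_out:
        counts[features][0] += int(correct)
        counts[features][1] += 1
    # Add-one smoothing so rare feature combinations stay conservative.
    return {f: (c + 1) / (n + 2) for f, (c, n) in counts.items()}

held_out = [
    (("PERSON", 5, 4, True), True),
    (("PERSON", 5, 4, True), True),
    (("PERSON", 5, 2, False), False),
    (("LOCATION", 3, 1, False), False),
]
table = estimate_confidence_table(held_out)
print(table[("PERSON", 5, 4, True)])    # 0.75
print(table[("PERSON", 5, 2, False)])   # ~0.33
```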
Dependence of Answer Correctness on Question Type
[Bar chart: P(correct|Type) by question type; y-axis 0–0.5.]
Dependence on Proposition Satisfaction
[Bar chart: P(correct|PropSat) for PropSat=True vs. PropSat=False; y-axis 0–0.6.]
Dependence on Number of Matched Words
[Chart: p(correct) vs. number of matched question words (0–6), with separate curves for question length = 3, 4, and 5; y-axis 0–0.5.]
Dependence of Answer Correctness on Web Frequency
[Chart: P(correct|Freq, InTrec) vs. frequency of the answer in Google summaries (0–150), with separate curves for InTrec=true and InTrec=false; y-axis 0–1.]
Official Results of TREC 2002 QA
RunTag      Unranked Average Precision    Ranked Average Precision    Upper-bound
BBN2002A    0.186                         0.257                       0.498
BBN2002B    0.288                         0.468                       0.646
BBN2002C    0.284                         0.499                       0.641
• BBN2002A did not use the Web
• BBN2002B and BBN2002C used the Web
• Unranked average precision = percentage of questions for which the first answer is correct
• Ranked average precision = confidence-weighted score, the official metric for TREC 2002
• Upper-bound = confidence-weighted score given perfect confidence estimation
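A small sketch of the two scores as defined above, computing the confidence-weighted score over questions sorted by decreasing system confidence (the usual TREC 2002 definition); the flags and confidence values below are made up for illustration.

```python
def unranked_average_precision(correct_flags):
    """Fraction of questions whose first answer is correct."""
    return sum(correct_flags) / len(correct_flags)

def confidence_weighted_score(correct_flags, confidences):
    """Sort questions by decreasing confidence, then average
    (number correct in the first i) / i over i = 1..N."""
    order = sorted(range(len(correct_flags)), key=lambda i: -confidences[i])
    running_correct, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        running_correct += int(correct_flags[i])
        total += running_correct / rank
    return total / len(correct_flags)

# Illustrative (made-up) correctness flags and confidences for five questions.
flags = [True, False, True, True, False]
confs = [0.9, 0.2, 0.8, 0.4, 0.7]
print(unranked_average_precision(flags))       # 0.6
print(confidence_weighted_score(flags, confs)) # ~0.80
```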
Recent Progress
• In the last six months, we have:
– Retrained our name tagger (IdentiFinder™) for roughly 29 question types
– Distributed the re-trained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system AQUA against streaming news