© Johan Bos, April 2008
Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: PRONTO architecture. The question is parsed (CCG) and converted into a DRS (boxing); answer typing and knowledge sources (WordNet, NomLex) feed query generation; the query is sent to Indri, which retrieves passages from the indexed documents; answer extraction, answer selection, and answer reranking then produce the answer.]
Question Answering

Lecture 3
» Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Query Generation
• Once we have analysed the question, we need to retrieve appropriate documents
• Most QA systems use an off-the-shelf information retrieval system for this task
• Examples:
  – Lemur
  – Lucene
  – Indri (used by PRONTO)
• The input of the IR system is a query; the output is a ranked set of documents
Queries
• Query generation depends on the way documents are indexed
• Based on:
  – Semantic analysis of the question
  – Expected answer type
  – Background knowledge
• Computing a good query is hard – we don’t want too few documents, and we don’t want too many!
Generating Query Terms
• Example 1:
  – Question: Who discovered prions?
  – Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
  – Text B: Prions are a kind of proteins that…
• Query terms?
Generating Query Terms
• Example 2:
  – Question: When did Franz Kafka die?
  – Text A: Kafka died in 1924.
  – Text B: Dr. Franz died in 1971.
• Query terms?
Generating Query Terms
• Example 3:
  – Question: How did actor James Dean die?
  – Text: James Dean was killed in a car accident.
• Query terms?
Useful query terms
• Ranked on importance:
  – Named entities
  – Dates or time expressions
  – Expressions in quotes
  – Nouns
  – Verbs
• Queries can be expanded using the created local knowledge base
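The importance ranking above can be sketched as a simple term filter, assuming tokens arrive already labelled (by a POS tagger, named entity recogniser, and quote detection); the labels and function names here are illustrative.

```python
# Hypothetical labels; the priority order follows the slide:
# named entities first, then time expressions, quoted expressions,
# nouns, and finally verbs. Unlisted labels are dropped.
PRIORITY = {
    "named_entity": 0,
    "timex": 1,
    "quoted": 2,
    "noun": 3,
    "verb": 4,
}

def rank_query_terms(tagged_tokens):
    """Keep only useful term types, ordered by importance."""
    useful = [(tok, tag) for tok, tag in tagged_tokens if tag in PRIORITY]
    return [tok for tok, tag in sorted(useful, key=lambda p: PRIORITY[p[1]])]

terms = rank_query_terms([
    ("did", "aux"), ("Franz_Kafka", "named_entity"),
    ("die", "verb"), ("when", "wh"),
])
# ['Franz_Kafka', 'die']
```

Query expansion (next slide) would then add synonyms or spelling variants of each retained term.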
Query expansion example
TREC 44.6 (Sacajawea): How much is the Sacajawea coin worth?
• Query: sacajawea
  Returns only five documents
• Use synonyms in query expansion
• New query: sacajawea OR sagajawea
  Returns two hundred documents
Question Answering

Lecture 3
• Query Generation
» Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Document Analysis – Why?
• The aim of QA is to output answers, not documents
• We need document analysis to:
  – Find the correct type of answer in the documents
  – Calculate the probability that an answer is correct
• Semantic analysis is important to get valid answers
Document Analysis – When?
• After retrieval:
  – token- or word-based index
  – keyword queries
  – low precision
• Before retrieval:
  – semantic indexing
  – concept queries
  – high precision
  – more NLP required
Document Analysis – How?
• Ideally use the same NLP tools as for question analysis
  – This will make the semantic matching of question and answer easier
  – Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions
  – Questions are often not part of the large annotated corpora on which NLP tools are trained
Documents vs Passages
• Split documents into smaller passages
  – This will make the semantic matching faster and more accurate
  – In PRONTO the passage size is two sentences, implemented by a sliding window
• Passages that are too small risk losing important contextual information
  – Pronouns and referring expressions
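The two-sentence sliding window described above can be sketched as follows; sentence splitting is assumed to have happened already.

```python
# Overlapping windows of consecutive sentences; with size=2 each
# sentence (except the first and last) appears in two passages, so a
# pronoun usually stays together with its antecedent sentence.
def make_passages(sentences, size=2):
    """Return overlapping windows of `size` consecutive sentences."""
    if len(sentences) <= size:
        return [" ".join(sentences)]
    return [" ".join(sentences[i:i + size])
            for i in range(len(sentences) - size + 1)]

doc = ["Kafka lived in Austria.", "He died in 1924.", "Brod edited his work."]
passages = make_passages(doc)
# ['Kafka lived in Austria. He died in 1924.',
#  'He died in 1924. Brod edited his work.']
```

Note that "He died in 1924." never appears without a neighbouring sentence, which is exactly the contextual-information point made above.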
Document Analysis
• Tokenisation
• Part-of-speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
Why semantics is important
• Example:
  – Question: When did Franz Kafka die?
  – Text A: The mother of Franz Kafka died in 1918.
  – Text B: Kafka lived in Austria. He died in 1924.
  – Text C: Both Kafka and Lenin died in 1924.
  – Text D: Max Brod, who knew Kafka, died in 1930.
  – Text E: Someone who knew Kafka died in 1930.
DRS for “The mother of Franz Kafka died in 1918.”

 _____________________
| x3 x4 x2 x1         |
|---------------------|
| mother(x3)          |
| named(x4,kafka,per) |
| named(x4,franz,per) |
| die(x2)             |
| thing(x1)           |
| event(x2)           |
| of(x3,x4)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1918XXXX |
|_____________________|
DRS for “Kafka lived in Austria. He died in 1924.”

 _______________________
| x3 x1 x2              |
|-----------------------|
| male(x3)              |
| named(x3,kafka,per)   |
| live(x1)              |
| agent(x1,x3)          |
| named(x2,austria,loc) |
| event(x1)             |
| in(x1,x2)             |
|_______________________|
           +
 _____________________
| x5 x4               |
|---------------------|
| die(x5)             |
| thing(x4)           |
| event(x5)           |
| agent(x5,x3)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
|_____________________|
DRS for “Both Kafka and Lenin died in 1924.”

 _____________________
| x6 x5 x4 x3 x2 x1   |
|---------------------|
| named(x6,kafka,per) |
| die(x5)             |
| event(x5)           |
| agent(x5,x6)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
| named(x3,lenin,per) |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1924XXXX |
|_____________________|
DRS for “Max Brod, who knew Kafka, died in 1930.”

 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| named(x3,brod,per)  |
| named(x3,max,per)   |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|
DRS for “Someone who knew Kafka died in 1930.”

 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| person(x3)          |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|
Document Analysis
• Tokenisation
• Part-of-speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
» Named entity recognition
• Anaphora resolution
Recall the Answer-Type Taxonomy
• We divided questions according to their expected answer type
• Simple Answer-Type Taxonomy:
  PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY
Named Entity Recognition
• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents:
  PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY
Example Text
Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.
Named Entity Recognition
<ENAMEX TYPE="LOCATION">Italy</ENAMEX>’s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.
NER difficulties
• Several types of entities are too numerous to include in dictionaries
• New names turn up every day
• Ambiguities
  – Paris, Lazio
• Different forms of the same entity in the same text
  – Brian Jones … Mr. Jones
• Capitalisation
NER approaches
• Rule-based approaches
  – Hand-crafted rules
  – Help from databases of known named entities [e.g. locations]
• Statistical approaches
  – Features
  – Machine learning
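A minimal sketch of the rule-based approach, combining one hand-crafted date rule with a tiny gazetteer of known locations; real systems use far larger databases plus statistical models, and all names and patterns here are illustrative.

```python
import re

# Tiny illustrative gazetteer; a real database would hold thousands
# of entries (and still miss new names, as noted on the slide).
LOCATIONS = {"Italy", "Milan", "Austria"}

def tag_entities(text):
    """Return (string, type) pairs found by simple rules."""
    entities = []
    # Hand-crafted rule: a four-digit year counts as a DATE.
    for m in re.finditer(r"\b(1[89]\d\d|20\d\d)\b", text):
        entities.append((m.group(), "DATE"))
    # Gazetteer lookup over capitalised words.
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        if word in LOCATIONS:
            entities.append((word, "LOCATION"))
    return entities

found = tag_entities("Kafka lived in Austria. He died in 1924.")
# [('1924', 'DATE'), ('Austria', 'LOCATION')]
```

The difficulties listed above show immediately why this is fragile: "Paris" would need disambiguation, and "Kafka" is missed because no PERSON rule or dictionary entry covers it.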
Document Analysis
• Tokenisation
• Part-of-speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
» Anaphora resolution
What is anaphora?
• Relation between a pronoun and another element in the same or an earlier sentence
• Anaphoric pronouns:
  – he, him, she, her, it, they, them
• Anaphoric noun phrases:
  – the country
  – these documents
  – his hat, her dress
Anaphora (pronouns)
• Question: What is the biggest sector in Andorra’s economy?
• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora (definite descriptions)
• Question: What is the biggest sector in Andorra’s economy?
• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora Resolution
• Anaphora resolution is the task of finding the antecedents of anaphoric expressions
• Example system:
  – Mitkov, Evans & Orasan (2002)
  – http://clg.wlv.ac.uk/MARS/
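As a toy illustration of the task (not the MARS algorithm), a recency-based baseline resolves a pronoun to the most recent preceding entity of a compatible category; the categories and names here are hypothetical.

```python
# Hypothetical compatibility table: which entity categories a
# pronoun may refer to. Real systems use gender, number, syntactic
# role, and many more features.
COMPATIBLE = {"he": {"male"}, "she": {"female"}, "it": {"thing"}}

def resolve(pronoun, preceding_entities):
    """preceding_entities: (name, category) pairs, earliest first.

    Walk backwards and return the most recent compatible entity."""
    for name, category in reversed(preceding_entities):
        if category in COMPATIBLE.get(pronoun, set()):
            return name
    return None

antecedent = resolve("he", [("Austria", "thing"), ("Kafka", "male")])
# 'Kafka'
```

On the Kafka example, this heuristic correctly links "He died in 1924." back to Kafka rather than Austria, because Austria is not a compatible antecedent for "he".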
“Kafka lived in Austria. He died in 1924.”

 _______________________
| x3 x1 x2              |
|-----------------------|
| male(x3)              |
| named(x3,kafka,per)   |
| live(x1)              |
| agent(x1,x3)          |
| named(x2,austria,loc) |
| event(x1)             |
| in(x1,x2)             |
|_______________________|
           +
 _____________________
| x5 x4               |
|---------------------|
| die(x5)             |
| thing(x4)           |
| event(x5)           |
| agent(x5,x3)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
|_____________________|

Note: the pronoun “He” is resolved to x3, the discourse referent for Kafka (agent(x5,x3)).
Co-reference resolution
• Question: What is the biggest sector in Andorra’s economy?
• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: Tourism
Question Answering

Lecture 3
• Query Generation
• Document Analysis
» Semantic Indexing
• Answer Extraction
• Selection and Ranking
Semantic indexing
• If we index documents on the token level, we cannot search for specific semantic concepts
• If we index documents on semantic concepts, we can formulate more specific queries
• Semantic indexing requires a complete preprocessing of the entire document collection [can be costly]
Semantic indexing example
• Example NL question: When did Franz Kafka die?
• Term-based
  – query: kafka
  – Returns all passages containing the term "kafka"
• Concept-based
  – query: DATE & kafka
  – Returns all passages containing the term "kafka" and a date expression
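A concept-augmented inverted index can be sketched as below, under these assumptions: passages are indexed on lower-cased terms and on upper-cased concept labels such as DATE, so the two label spaces cannot collide; the date detector is a crude stand-in for the full document analysis described earlier.

```python
import re
from collections import defaultdict

def concepts(passage):
    """Stand-in concept detector: a four-digit year counts as DATE."""
    found = set()
    if re.search(r"\b1[89]\d\d\b", passage):
        found.add("DATE")
    return found

def build_index(passages):
    """Map each term and each concept label to a set of passage ids."""
    index = defaultdict(set)
    for i, p in enumerate(passages):
        for term in re.findall(r"\w+", p.lower()):
            index[term].add(i)
        for c in concepts(p):
            index[c].add(i)
    return index

passages = ["Kafka was a writer.", "Kafka died in 1924."]
idx = build_index(passages)
hits = idx["kafka"] & idx["DATE"]   # concept-based query: DATE & kafka
# {1}
```

Note how the concept query drops the first passage, which mentions Kafka but contains no date expression: exactly the precision gain claimed above.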
Question Answering

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
» Answer Extraction
• Selection and Ranking
Answer extraction
• Passage retrieval gives us a set of ranked documents
• Match answer with question:
  – DRS for question
  – DRS for each possible document
  – Score for amount of overlap
• Deep inference or shallow matching
• Use knowledge
Answer extraction: matching
• Given a question Q and an expression with a potential answer A, calculate a matching score S = match(Q,A) that indicates how well Q matches A
• Example:
  – Q: When was Franz Kafka born?
  – A1: Franz Kafka died in 1924.
  – A2: Kafka was born in 1883.
Using logical inference
• Recall that Boxer produces first order representations [DRSs]
• In theory we could use a theorem prover to check whether a retrieved passage entails or is inconsistent with a question
• In practice this is too costly, given the high number of possible answer + question pairs that need to be considered
• Also: theorem provers are precise – they don’t give us information if they almost find a proof, although this would be useful for QA
Semantic matching
• Matching is an efficient approximation to the inference task
• Consider flat semantic representation of the passage and the question
• Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924.
• Passage 2: Kafka was born in 1883.
Semantic Matching [1]

Q:            A1:
answer(X)     franz(x1)
franz(Y)      kafka(x1)
kafka(Y)      die(x3)
born(E)       agent(x3,x1)
patient(E,Y)  in(x3,x2)
temp(E,X)     1924(x2)
Semantic Matching [1]

Q:            A1:
answer(X)     franz(x1)
franz(Y)      kafka(x1)
kafka(Y)      die(x3)
born(E)       agent(x3,x1)
patient(E,Y)  in(x3,x2)
temp(E,X)     1924(x2)

Unify: X=x2
Semantic Matching [1]

Q:            A1:
answer(x2)    franz(x1)
franz(Y)      kafka(x1)
kafka(Y)      die(x3)
born(E)       agent(x3,x1)
patient(E,Y)  in(x3,x2)
temp(E,x2)    1924(x2)
Semantic Matching [1]

Q:            A1:
answer(x2)    franz(x1)
franz(Y)      kafka(x1)
kafka(Y)      die(x3)
born(E)       agent(x3,x1)
patient(E,Y)  in(x3,x2)
temp(E,x2)    1924(x2)

Unify: Y=x1
Semantic Matching [1]

Q:            A1:
answer(x2)    franz(x1)
franz(x1)     kafka(x1)
kafka(x1)     die(x3)
born(E)       agent(x3,x1)
patient(E,x1) in(x3,x2)
temp(E,x2)    1924(x2)
Semantic Matching [1]

Q:            A1:
answer(x2)    franz(x1)
franz(x1)     kafka(x1)
kafka(x1)     die(x3)
born(E)       agent(x3,x1)
patient(E,x1) in(x3,x2)
temp(E,x2)    1924(x2)

Match score = 3/6 = 0.50
Semantic Matching [2]

Q:            A2:
answer(X)     kafka(x1)
franz(Y)      born(x3)
kafka(Y)      patient(x3,x1)
born(E)       in(x3,x2)
patient(E,Y)  1883(x2)
temp(E,X)
Semantic Matching [2]

Q:            A2:
answer(X)     kafka(x1)
franz(Y)      born(x3)
kafka(Y)      patient(x3,x1)
born(E)       in(x3,x2)
patient(E,Y)  1883(x2)
temp(E,X)

Unify: X=x2
Semantic Matching [2]

Q:            A2:
answer(x2)    kafka(x1)
franz(Y)      born(x3)
kafka(Y)      patient(x3,x1)
born(E)       in(x3,x2)
patient(E,Y)  1883(x2)
temp(E,x2)
Semantic Matching [2]

Q:            A2:
answer(x2)    kafka(x1)
franz(Y)      born(x3)
kafka(Y)      patient(x3,x1)
born(E)       in(x3,x2)
patient(E,Y)  1883(x2)
temp(E,x2)

Unify: Y=x1
Semantic Matching [2]

Q:            A2:
answer(x2)    kafka(x1)
franz(x1)     born(x3)
kafka(x1)     patient(x3,x1)
born(E)       in(x3,x2)
patient(E,x1) 1883(x2)
temp(E,x2)
Semantic Matching [2]

Q:            A2:
answer(x2)    kafka(x1)
franz(x1)     born(x3)
kafka(x1)     patient(x3,x1)
born(E)       in(x3,x2)
patient(E,x1) 1883(x2)
temp(E,x2)

Unify: E=x3
Semantic Matching [2]

Q:             A2:
answer(x2)     kafka(x1)
franz(x1)      born(x3)
kafka(x1)      patient(x3,x1)
born(x3)       in(x3,x2)
patient(x3,x1) 1883(x2)
temp(x3,x2)
Semantic Matching [2]

Q:             A2:
answer(x2)     kafka(x1)
franz(x1)      born(x3)
kafka(x1)      patient(x3,x1)
born(x3)       in(x3,x2)
patient(x3,x1) 1883(x2)
temp(x3,x2)

Match score = 4/6 = 0.67
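The matching walkthrough above can be sketched as follows: the question's variables (upper case) are bound to the answer's discourse referents and the matched question conditions are counted, with the answer(...) condition always counted as satisfied, which reproduces the 3/6 and 4/6 scores above. The brute-force search over bindings is an illustration, not PRONTO's implementation.

```python
from itertools import product

def match(q_conds, a_conds):
    """Score = matched question conditions / total question conditions.

    Conditions are (predicate, args) pairs; question variables are
    upper-case strings, answer referents lower-case."""
    q_vars = sorted({v for _, args in q_conds for v in args if v[0].isupper()})
    a_refs = sorted({v for _, args in a_conds for v in args})
    best = 0
    for binding in product(a_refs, repeat=len(q_vars)):
        sub = dict(zip(q_vars, binding))
        ground = [(p, tuple(sub.get(v, v) for v in args)) for p, args in q_conds]
        best = max(best, sum(1 for c in ground
                             if c in a_conds or c[0] == "answer"))
    return best / len(q_conds)

Q = [("answer", ("X",)), ("franz", ("Y",)), ("kafka", ("Y",)),
     ("born", ("E",)), ("patient", ("E", "Y")), ("temp", ("E", "X"))]
A2 = [("kafka", ("x1",)), ("born", ("x3",)), ("patient", ("x3", "x1")),
      ("in", ("x3", "x2")), ("1883", ("x2",))]
print(round(match(Q, A2), 2))
# 0.67
```

The best binding found for A2 is X=x2, Y=x1, E=x3, matching the trace above; for A1 only franz, kafka, and the answer condition can be satisfied, giving 0.50.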
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924. (Match score = 0.50)
• Passage 2: Kafka was born in 1883. (Match score = 0.67)
Matching Techniques
• Weighted matching
  – Higher weight for named entities
  – Estimate weights using machine learning
• Incorporate background knowledge
  – WordNet [hyponyms]
  – NomLex
  – Paraphrases:
    BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
Question Answering

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
» Selection and Ranking
Answer selection
• Rank answers
  – Group duplicates
  – Syntactically or semantically equivalent
  – Sort on frequency
• How specific should an answer be?
  – Semantic relations between answers
  – Hyponyms, synonyms
  – Answer modelling [PhD thesis Dalmas 2007]
• Answer cardinality
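The grouping-and-frequency step can be sketched as below; simple case and whitespace normalisation stands in here for the syntactic or semantic equivalence test.

```python
from collections import Counter

def select(answers):
    """Group (crudely normalised) duplicate answers, most frequent first."""
    counts = Counter(" ".join(a.lower().split()) for a in answers)
    return counts.most_common()

ranked = select(["In Kierling", "in Kierling", "In Austria", "In  Kierling"])
# [('in kierling', 3), ('in austria', 1)]
```

A real system would also need the semantic relations mentioned above: "In Kierling", "Near Vienna", and "In Austria" are not duplicates but stand in a hyponym-like relation, so grouping by string alone under-counts support for the more general answers.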
Answer selection example 1
• Where did Franz Kafka die?
  – In his bed
  – In a sanatorium
  – In Kierling
  – Near Vienna
  – In Austria
  – In Berlin
  – In Germany
Answer selection example 2
• Where is 3M based?
  – In Maplewood
  – In Maplewood, Minn.
  – In Minnesota
  – In the U.S.
  – In Maplewood, Minn., USA
  – In San Francisco
  – In the Netherlands
Reranking
• Most QA systems first produce a list of possible answers…
• This is usually followed by a process called reranking
• Reranking promotes correct answers to a higher rank
Factors in reranking
• Matching score
  – The better the match with the question, the more likely the answer
• Frequency
  – If the same answer occurs many times, it is likely to be correct
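A sketch combining these two factors: each distinct answer scores its best match score plus a frequency bonus. The 0.1 weight is an arbitrary illustration, not a tuned value.

```python
def rerank(candidates):
    """candidates: (answer, match_score) pairs; duplicates allowed.

    Combine best match score with an occurrence-count bonus and
    return the answers in descending combined score."""
    freq = {}
    best = {}
    for ans, score in candidates:
        freq[ans] = freq.get(ans, 0) + 1
        best[ans] = max(best.get(ans, 0.0), score)
    scored = {a: best[a] + 0.1 * freq[a] for a in best}
    return sorted(scored, key=scored.get, reverse=True)

order = rerank([("1883", 0.67), ("1924", 0.50), ("1883", 0.60)])
# ['1883', '1924']
```

In a deployed system these weights would be estimated from training data (as the weighted-matching slide suggests) rather than fixed by hand.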
Answer Validation
• Answer validation
  – Check whether an answer is likely to be correct using an expensive method
• Tie breaking
  – Deciding between two answers with similar probability
• Methods:
  – Inference check
  – Sanity checking
  – Googling
Inference check
• Use first-order logic [FOL] to check whether a potential answer entails the question
• This can be done with the use of a theorem prover:
  – Translate Q into FOL
  – Translate A into FOL
  – Translate background knowledge into FOL
  – If ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer
Sanity Checking
• An answer should be informative, that is, not part of the question
  – Q: Who is Tom Cruise married to?
    A: Tom Cruise
  – Q: Where was Florence Nightingale born?
    A: Florence
Googling
• Given a ranked list of answers, some of these might not make sense at all
• Promote answers that make sense
• How? Use an even larger corpus!
  – “Sloppy” approach
  – “Strict” approach
The World Wide Web
Answer validation (sloppy)
• Given a question Q and a set of answers A1…An
• For each i, generate a query combining Q and Ai
• Count the number of hits for each i
• Choose the Ai with the highest number of hits
• Use existing search engines
  – Google, AltaVista
  – Magnini et al. 2002 (CCP)
Corrected Conditional Probability
• Treat Q and A as bags of words
  – Q = content words of the question
  – A = answer
• CCP(Qsp,Asp) = hits(A NEAR Q) / (hits(A) × hits(Q))
• Accept answers above a certain CCP threshold
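The CCP score can be sketched directly from the formula above; hit counts would come from a search engine, so the numbers in the usage line are made up for illustration.

```python
def ccp(hits_near, hits_a, hits_q):
    """CCP = hits(A NEAR Q) / (hits(A) * hits(Q))."""
    if hits_a == 0 or hits_q == 0:
        return 0.0
    return hits_near / (hits_a * hits_q)

def validate(candidates, threshold):
    """candidates: (answer, hits_near, hits_a, hits_q) tuples.

    Keep the answers whose CCP score reaches the threshold."""
    return [a for a, near, ha, hq in candidates
            if ccp(near, ha, hq) >= threshold]

accepted = validate(
    [("Oklahoma", 50, 1000, 100), ("Britain", 1, 1000, 100)],
    threshold=1e-4,
)
# ['Oklahoma']
```

Dividing by both hits(A) and hits(Q) is the "correction": it keeps very common answer strings from winning merely because they occur everywhere on the web.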
Answer validation (strict)
• Given a question Q and a set of answers A1…An
• Create a declarative sentence with the focus of the question replaced by Ai
• Use the strict search option in Google
  – High precision
  – Low recall
• Any terms of the target not in the sentence are added to the query
Example
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Top-5 Answers:
    1) Britain
  * 2) Okemah, Okla.
    3) Newport
  * 4) Oklahoma
    5) New York
Example: generate queries
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  1) “Guthrie was born in Britain”
  2) “Guthrie was born in Okemah, Okla.”
  3) “Guthrie was born in Newport”
  4) “Guthrie was born in Oklahoma”
  5) “Guthrie was born in New York”
Example: add target words
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  1) “Guthrie was born in Britain” Woody
  2) “Guthrie was born in Okemah, Okla.” Woody
  3) “Guthrie was born in Newport” Woody
  4) “Guthrie was born in Oklahoma” Woody
  5) “Guthrie was born in New York” Woody
Example: morphological variants
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  “Guthrie is OR was OR are OR were born in Britain” Woody
  “Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody
  “Guthrie is OR was OR are OR were born in Newport” Woody
  “Guthrie is OR was OR are OR were born in Oklahoma” Woody
  “Guthrie is OR was OR are OR were born in New York” Woody
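The strict query generation above can be sketched as a template fill, with the leftover target words appended outside the quoted phrase; the helper name is hypothetical.

```python
def validation_queries(template, answers, extra_terms):
    """template contains one {} slot for the candidate answer,
    e.g. 'Guthrie is OR was OR are OR were born in {}'.
    extra_terms are target words missing from the sentence."""
    return ['"{}" {}'.format(template.format(a), " ".join(extra_terms))
            for a in answers]

queries = validation_queries(
    "Guthrie is OR was OR are OR were born in {}",
    ["Britain", "Oklahoma"],
    ["Woody"],
)
# ['"Guthrie is OR was OR are OR were born in Britain" Woody',
#  '"Guthrie is OR was OR are OR were born in Oklahoma" Woody']
```

Each query would then be sent to the search engine with strict (exact-phrase) matching, and the hit counts compared as on the next slide.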
Example: Google hits
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries and hit counts:
  “Guthrie is OR was OR are OR were born in Britain” Woody: 0
  “Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody: 10
  “Guthrie is OR was OR are OR were born in Newport” Woody: 0
  “Guthrie is OR was OR are OR were born in Oklahoma” Woody: 42
  “Guthrie is OR was OR are OR were born in New York” Woody: 2
Example: reranked answers
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Original answers:
    1) Britain
  * 2) Okemah, Okla.
    3) Newport
  * 4) Oklahoma
    5) New York
• Reranked answers:
  * 4) Oklahoma
  * 2) Okemah, Okla.
    5) New York
    1) Britain
    3) Newport
Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Where to go from here
• Producing answers in real time
• Improving accuracy
• Answer explanation
• User modelling
• Speech interfaces
• Dialogue (interactive QA)
• Multi-lingual QA
• Non-sequential architectures