TRANSCRIPT

AQUAINT Workshop, October 2005
AQUAINT Program

JAVELIN Project Briefing
Eric Nyberg, Teruko Mitamura, Jamie Callan, Robert Frederking, Jaime Carbonell,
Matthew Bilotti, Jeongwoo Ko, Frank Lin, Lucian Lita, Vasco Pedro, Andrew Schlaikjer,
Hideki Shima, Luo Si, David Svoboda
Language Technologies Institute, Carnegie Mellon University
Status Update
• Project Start: September 30, 2004 (now in Month 13)
• Last Six Months:
  – Initial CLQA system evaluated in NTCIR (English-Japanese, English-Chinese)
  – Multilingual Distributed IR evaluated in the CLEF competition
  – Initial Phase II English system entered in the TREC relationship track
Multilingual QA
JAVELIN Multilingual QA
• End-to-end systems for English-to-Chinese and English-to-Japanese QA
• Participated in the NTCIR-5 CLQA-1 (E-C, E-J) evaluation
  – http://www.slt.atr.jp/CLQA/
• The NTCIR-5 workshop will be held in Tokyo, Japan, December 6-9, 2005
NTCIR CLQA1 Task Overview
• EC, CC, CE, EJ, JE subtasks
  – Answers are named entities (e.g. person name, organization name, location, artifact, date, money, time)
  – We were the only team that participated in both the EC and EJ subtasks
• Question/answer data set
  – EC: 200 questions for training and the formal run
  – EJ: 300 questions for training and 200 for the formal run
• Corpus
  – EC: United Daily News 2000-2001 (466,564 articles)
  – EJ: Yomiuri Newspaper 2000-2001 (658,719 articles)
CLQA1 Evaluation Criteria
• Only the top answer candidate is judged, along with its supporting document
• Correct answers that were not properly supported by the returned document were judged "unsupported"
• An answer is incorrect even if a substring of it is correct
• Issue: we found that the gold-standard set of supporting documents is not complete
MLQA Architecture
[Architecture diagram: the JAVELIN module pipeline QA (Question Analyzer) → RS (Retrieval Strategist) → IX (Information Extractor) → AG (Answer Generator) together with the EM, with a Keyword Translator inserted after the QA. The RS searches English, Chinese, and Japanese corpora, each with its own index. Original modules/resources are distinguished from the new multilingual (ML) modules/resources.]
Example walkthrough (English question, Japanese corpus):
1. Input question: "How much did the Japan Bank for International Cooperation decide to loan to the Taiwan High-Speed Corporation?"
2. QA output: Answer Type = MONEY; Keywords = Bank for International Cooperation, Taiwan High-Speed Corporation, loan
3. Keyword Translator output: Answer Type = MONEY; Keywords = _____ (the keywords rendered in the target language)
4. RS output, a ranked document list:
   DocID = JY-20010705J1TYMCC1300010, Confidence = 44.01
   DocID = JY-20011116J1TYMCB1300010, Confidence = 42.95
   ...
5. IX output, answer candidates with supporting passages: Answer Candidate = _____, Confidence = 0.0718, Passage = _____
6. AG: cluster and re-rank the answer candidates (see the sketch after this walkthrough)
7. Final output: Answer = _____
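The clustering and re-ranking step (step 6) can be illustrated with a minimal sketch. The heuristics here (merging candidates by normalized surface form and summing their confidences, with dummy candidate data) are assumptions for illustration, not the actual JAVELIN AG algorithm:

```python
from collections import defaultdict

def cluster_and_rerank(candidates):
    """candidates: list of (answer_text, confidence, doc_id) tuples."""
    clusters = defaultdict(lambda: {"text": None, "score": 0.0, "docs": []})
    for text, conf, doc_id in candidates:
        key = " ".join(text.lower().split())  # naive surface-form normalization
        clusters[key]["text"] = text
        clusters[key]["score"] += conf        # confidences act as votes
        clusters[key]["docs"].append(doc_id)
    return sorted(clusters.values(), key=lambda c: c["score"], reverse=True)

# Dummy candidates: two surface variants of the same answer merge and
# outrank a singleton candidate.
ranked = cluster_and_rerank([
    ("candidate A", 0.0718, "doc-1"),
    ("Candidate  A", 0.0510, "doc-2"),
    ("candidate B", 0.0650, "doc-3"),
])
print(ranked[0]["text"], round(ranked[0]["score"], 4))  # candidate A 0.1228
```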
Formal Run Result
Subtask   No. of Participants   No. of Submissions   MAX       MIN     MEDIAN      AVE
EC        4                     8                    25 (33)   6 (8)   14.5 (19)   15.63 (19.75)
EJ        4                     11                   25 (31)   0 (0)   17 (18)     12.73 (14.61)

Only the top answer candidate is judged. Scores are numbers of correct answers; counts including unsupported answers are shown in parentheses.
With Partial Gold Standard Input
[Table: per-module results with partial gold-standard input; only its footnotes are recoverable here.]
a. Average precision of answer-type detection
b. Average precision of keyword translation over the 200 formal-run questions
c. Average precision of document retrieval; counted if a correct document was ranked 1st-15th
d. Average precision of answer extraction; counted if a correct answer was ranked 1st-100th
e. The MRR measure of IX performance, calculated by averaging the reciprocal rank of each answer
f. Overall accuracy of the system
g. Accuracy including unsupported answers
• The QA (Question Analyzer) and RS (Retrieval Strategist) have relatively high accuracy
• Translation accuracy greatly affects overall accuracy
  – Accuracy in the RS increased by 26.5% in EC and 22.5% in EJ
  – If unsupported answers are considered, there is a 10.5% improvement in accuracy for EC and 2.5% for EJ
  – We found correct documents that are not in the gold-standard set
• There is room for improvement in the IX
  – Raise accuracy and reduce noise
  – Average precision of answer extraction is calculated by counting correct answers ranked 1st-100th
  – The MRR measure of IX performance is calculated by averaging the reciprocal rank of each answer (a minimal sketch follows this list)
• The validation function in the AG is crucial
  – Filter out noise in the IX output
  – Boost the rank of the correct answer
  – Only the topmost answer candidate is judged at the end, so there is a big accuracy drop
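For reference, a minimal sketch of the MRR computation mentioned above, under the usual convention that each question contributes the reciprocal rank of its first correct answer, or 0 when no correct answer is returned:

```python
def mean_reciprocal_rank(ranks):
    """ranks: per question, the 1-based rank of the first correct answer,
    or None if no correct answer was returned."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Four questions with correct answers at ranks 1, 3, (none), 2:
print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 = 0.4583...
```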
Next Steps for Multilingual QA
• Improve translation of keywords for E-C and E-J (e.g. named-entity translation)
• Improve extraction using syntactic and semantic information in Chinese and Japanese (e.g. use of CaboCha)
• Improve the validation function in the AG
• Upcoming evaluation(s):
  – NTCIR CLQA-2, if available in 2006
  – AQUAINT E-C definition question pilot, when training/test data is available
• Integrate with Distributed IR (next slides)
Current Multilingual QA Systems
[Diagram: English questions feed three separate systems: English QA over an English corpus, Chinese CLQA over a Chinese corpus (answers in Chinese), and Japanese CLQA over a Japanese corpus (answers in Japanese).]
Three separate systems, no distributed IR.
Future Vision
[Diagram: English questions enter a single integrated system in which English QA, Chinese CLQA, and Japanese CLQA share a Distributed IR layer over the English, Chinese, and Japanese collections, returning answers in Chinese and Japanese.]
A single, integrated system with distributed IR.
Multilingual Distributed Information Retrieval
What Is Distributed IR?
• A method of searching across multiple full-text search engines
  – Also called "federated search" or searching "the hidden Web"
• Important when relevant information is scattered across many search engines
  – Within an organization
  – On the Web
  – Which ones have the information you need?
Many Search Engines Don't Speak English
Multilingual Distributed IR: Recent Progress
Research: extend monolingual algorithms to multilingual environments
• Multilingual query-based sampling
  – Monolingual corpora
• Multilingual result-merging
  – Given retrieval results in N languages, produce a single multilingual ranked list
Evaluation
• CLEF Multi-8 Ad-hoc Retrieval task
  – English (2), Spanish (1), French (2), Italian (2), Swedish (1), German (2), Finnish (1), Dutch (2)
• Why CLEF?
  – More languages than NTCIR (more languages is harder)
  – CLEF is focusing on result-merging this year
    • Models uncooperative environments, where we have no control over individual search engines
CLEF 2005: Two Cross-Lingual Retrieval Tasks
• Usual ad-hoc cross-lingual retrieval
  – Cooperative search engines, under our control
  – English queries, documents in 8 languages, 8 search engines
• Multilingual results-merging task
  – Uncooperative search engines, nothing under our control
    • We get only ranked lists of documents from each engine
  – We treat the task as a multilingual federated search problem
    • Documents in language l are stored in search engine s
    • Minimize the cost of downloading, indexing, and translating documents
CLEF 2005: Usual Ad-hoc Cross-lingual Retrieval
For each query:
1. Four distinct retrieval methods r
   – Translate the English query into the target language, with and without pseudo-relevance feedback
   – Translate all documents into English, with and without pseudo-relevance feedback
   – Lemur search engine
2. Combine all results from each method r into a multilingual result
3. Combine the results from all methods into a final result
Use training data to maximize combination accuracy (a sketch of the combination step follows).
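A minimal sketch of the combination step. The slides do not specify the combination model, so the per-method min-max score normalization and the trained per-method weights below are assumptions for illustration:

```python
def normalize(results):
    """results: dict doc_id -> raw score from one retrieval method."""
    lo, hi = min(results.values()), max(results.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in results.items()}

def combine(method_results, method_weights):
    """Weighted sum of normalized scores, one results dict per method."""
    combined = {}
    for results, w in zip(method_results, method_weights):
        for doc, s in normalize(results).items():
            combined[doc] = combined.get(doc, 0.0) + w * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Two methods with weights tuned on training queries (values made up):
final = combine(
    [{"d1": 12.0, "d2": 7.5}, {"d1": 0.8, "d3": 0.6}],
    [0.6, 0.4],
)
```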
CLEF 2005: Cross-lingual Results Merging Task
For each query:
1. Download a few top-ranked documents from each source
2. Create "comparable scores" for each downloaded document by combining the results of the four methods (previous slide)
3. For each downloaded document we then have the pair <source search-engine score, comparable score>
4. Train language-specific, query-specific logistic models to transform any source-specific score into a comparable score
5. Estimate comparable scores for all ranked documents from each source
6. Merge documents by their comparable scores (a sketch of steps 4-6 follows)
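A minimal sketch of steps 4-6. The exact logistic form and the least-squares fit via scipy's curve_fit are assumptions for illustration; the actual system trains its own language- and query-specific logistic models:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b):
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

def fit_transform(train_pairs):
    """train_pairs: [(source_score, comparable_score), ...] from the few
    downloaded documents of one source engine for one query (step 3)."""
    xs, ys = map(np.array, zip(*train_pairs))
    (a, b), _ = curve_fit(logistic, xs, ys, p0=[1.0, 0.0], maxfev=5000)
    return lambda s: float(logistic(s, a, b))

def merge(sources):
    """sources: list of (train_pairs, ranked) per engine, where ranked is
    the engine's full list of (doc_id, source_score) pairs (steps 5-6)."""
    merged = []
    for train_pairs, ranked in sources:
        to_comparable = fit_transform(train_pairs)  # step 4
        merged += [(doc, to_comparable(s)) for doc, s in ranked]
    return sorted(merged, key=lambda kv: kv[1], reverse=True)
```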
Multilingual Distributed IR: CLEF Results
Mean Average Precision (MAP) across 40 queries:

Task                             Our Best Run   Other Best Run   Median Run
Ad hoc Cross-lingual Retrieval   0.449          0.333            0.261
Result Merging                   0.419          0.329            0.298
Extending JAVELIN with Domain Semantics
[Architecture diagram: off-line indexing runs a Text Annotator (IdentiFinder, ASSERT, MXTerminator) over the corpus, populating an Ontology/Annotations Database and a Semantic Index. At question time the Question Analyzer produces the question's key predicates, the Retrieval Strategist retrieves candidate predicates, the Information Extractor produces a ranked predicate list, and the Answer Generator returns answer passages.]
Annotation example: "John S. gave Mary an orchid for her birthday."
Stages: basic tokens, NE tagger, entity tagger, semantic parser, reference resolver, verb expansion, unified terms, predicate structure formation.
All tags are stand-off annotations stored in a relational data model (a minimal schema sketch follows).
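To make the stand-off idea concrete, here is a minimal relational sketch: annotations live in their own table and point into the untouched text by character offsets, so multiple annotators can layer tags over one source. The schema is hypothetical, not JAVELIN's actual data model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE annotations (
    ann_id    INTEGER PRIMARY KEY,
    doc_id    INTEGER REFERENCES documents(doc_id),
    start_off INTEGER,   -- character offset, inclusive
    end_off   INTEGER,   -- character offset, exclusive
    layer     TEXT,      -- e.g. 'ne', 'predicate', 'role'
    label     TEXT       -- e.g. 'PERSON', 'give', 'ARG0'
);
""")
text = "John S. gave Mary an orchid for her birthday."
conn.execute("INSERT INTO documents VALUES (1, ?)", (text,))
conn.executemany(
    "INSERT INTO annotations (doc_id, start_off, end_off, layer, label)"
    " VALUES (1, ?, ?, ?, ?)",
    [(0, 7, "ne", "PERSON"),        # 'John S.'
     (13, 17, "ne", "PERSON"),      # 'Mary'
     (8, 12, "predicate", "give"),  # 'gave'
     (0, 7, "role", "ARG0"),        # giver
     (13, 17, "role", "ARG2")],     # recipient
)
for start, end, label in conn.execute(
        "SELECT start_off, end_off, label FROM annotations WHERE layer='ne'"):
    print(label, repr(text[start:end]))
```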
Retrieval on Predicate-Argument Structure
[Pipeline diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Worked example:
1. Input question: "Who did Smith meet?"
2. Question Analysis produces a predicate-argument template: meet(ARG0 = Smith, ARG1 = ?x)
3. The IR engine sees the template rather than bare keywords. Some retrieved documents:
   – "Frank met Alice. Smith dislikes Bob."
   – "Smith met Jones."
4. The template is matched against predicate instances stored in an RDBMS:
   – meet(ARG0 = Frank, ARG1 = Alice): no match
   – dislikes(ARG0 = Smith, ARG1 = Bob): no match (wrong predicate)
   – meet(ARG0 = Smith, ARG1 = Jones): matching predicate instance
5. Answer: "Jones" (see the SQL sketch below)
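A minimal sketch of the RDBMS matching step, using a hypothetical predicates table (this schema is an illustration, not JAVELIN's actual one). The template meet(ARG0 = Smith, ARG1 = ?x) becomes a SELECT in which the known arguments are bound and the free variable is the selected column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predicates (doc_id TEXT, target TEXT, arg0 TEXT, arg1 TEXT)")
conn.executemany("INSERT INTO predicates VALUES (?, ?, ?, ?)", [
    ("d1", "meet", "Frank", "Alice"),
    ("d1", "dislikes", "Smith", "Bob"),
    ("d2", "meet", "Smith", "Jones"),
])
# meet(ARG0 = Smith, ARG1 = ?x): bind target and arg0, select arg1.
rows = conn.execute(
    "SELECT arg1, doc_id FROM predicates WHERE target = ? AND arg0 = ?",
    ("meet", "Smith")).fetchall()
print(rows)  # [('Jones', 'd2')] -> answer "Jones"
```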
Preliminary Results: TREC 2005 Relationship QA Track
• Partial system:
  – Semantic indexing not fully integrated
  – Question analysis module incomplete
• Our goal: measure the ability to retrieve relevant nuggets
• Submitted a second run with manual predicate bracketing of the questions
• Results (MRR over relevant nuggets):
  – Run 1: 0.1356
  – Run 2: 0.5303
Example: Question Analysis
Topic: "The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq smuggling oil to other countries, and if so, which countries? In addition, who is behind the Iraqi oil smuggling?"
Extracted predicate-argument structures:
– interested(ARG0 = the analyst, ARG1 = Iraqi oil smuggling)
– smuggling(ARG0 = Iraq, ARG1 = oil, ARG2 = other countries)
– smuggling(ARG0 = Iraq, ARG1 = oil, ARG2 = which countries)
– is behind(ARG0 = who, ARG1 = the Iraqi oil smuggling)
Example: Results
Topic: as above (Iraqi oil smuggling). Sample retrieved nuggets:
1. "The amount of oil smuggled out of Iraq has doubled since August last year, when oil prices began to increase," Gradeck said in a telephone interview Wednesday from Bahrain.
2. U.S.: Russian Tanker Had Iraqi Oil. By ROBERT BURNS, AP Military Writer. WASHINGTON (AP) – Tests of oil samples taken from a Russian tanker suspected of violating the U.N. embargo on Iraq show that it was loaded with petroleum products derived from both Iranian and Iraqi crude, two senior defense officials said.
5. With no American or allied effort to impede the traffic, between 50,000 and 60,000 barrels of Iraqi oil and fuel products a day are now being smuggled along the Turkish route, Clinton administration officials estimate.
(7 of 15 nuggets judged relevant)
Next Steps
• Better question analysis
  – Retrain an ASSERT-style annotator, or incorporate rule-based NLP from HALO (KANTOO)
• Semantic indexing and retrieval
  – Moving to Indri allows exact representation of our predicate structure in the index and in queries
• Ranking retrieved predicate instances
  – Aggregating information across documents
• Extracting answers from predicate-argument structure
Key Predicates Using Event Semantics from a Domain Ontology
• possess is a precondition of operate, export, …
• possess is a postcondition of assemble, buy, …
• More useful passages are matched (a sketch of the expansion follows)
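A minimal sketch of how such precondition/postcondition links might expand a key predicate into related event predicates before retrieval; encoding the ontology as plain dictionaries is an assumption for illustration:

```python
# If a question asks about 'possess', passages about events that require
# or produce possession are also useful, so the key predicate set grows.
PRECONDITION_OF = {"possess": ["operate", "export"]}   # possess enables these
POSTCONDITION_OF = {"possess": ["assemble", "buy"]}    # these yield possession

def expand_key_predicates(predicate):
    related = {predicate}
    related.update(PRECONDITION_OF.get(predicate, []))
    related.update(POSTCONDITION_OF.get(predicate, []))
    return related

print(sorted(expand_key_predicates("possess")))
# ['assemble', 'buy', 'export', 'operate', 'possess']
```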
Improved Results
[Chart comparing results across the predicates assemble, operate, install, develop, export, import, and manufacture.]
Indexing of Predicate Structures Implemented Using Indri (October '05)
(web demo available)
Example query:
  #combine[predicate]( buy.target #any:gpe.arg0 weapon.arg1 )
This requests predicate extents whose target verb is "buy", whose ARG0 is any geopolitical entity (GPE), and whose ARG1 matches "weapon".
Some Recent Papers
• E. Nyberg, R. Frederking, T. Mitamura, M. Bilotti, K. Hannan, L. Hiyakumoto, J. Ko, F. Lin, L. Lita, V. Pedro, and A. Schlaikjer, "JAVELIN I and II Systems at TREC 2005", notebook paper submitted to TREC 2005.
• F. Lin, H. Shima, M. Wang, and T. Mitamura, "CMU JAVELIN System for NTCIR5 CLQA1", to appear in Proceedings of the 5th NTCIR Workshop.
• L. Si and J. Callan, "Modeling Search Engine Effectiveness for Federated Search", Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil.
• L. Si and J. Callan, "CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists", Sixth Workshop of the Cross-Language Evaluation Forum (CLEF 2005), Vienna, Austria.
• E. Nyberg, T. Mitamura, R. Frederking, V. Pedro, M. Bilotti, A. Schlaikjer, and K. Hannan (2005), "Extending the JAVELIN QA System with Domain Semantics", to appear in Proceedings of AAAI 2005 (Workshop on Question Answering in Restricted Domains).
• L. Hiyakumoto, L. V. Lita, and E. Nyberg (2005), "Multi-Strategy Information Extraction for Question Answering", FLAIRS 2005, to appear.
http://www.cs.cmu.edu/~ehn/JAVELIN
Questions?