CLEF 2012, Rome
QA4MRE, Question Answering for Machine Reading Evaluation
Anselmo Peñas (UNED, Spain)Eduard Hovy (USC-ISI, USA)Pamela Forner (CELCT, Italy)Álvaro Rodrigo (UNED, Spain)Richard Sutcliffe (U. Limerick, Ireland)Roser Morante (U. Antwerp, Belgium)Walter Daelemans (U. Antwerp, Belgium)Caroline Sporleder (U. Saarland, Germany)Corina Forascu (UAIC, Romania)Yassine Benajiba (Philips, USA)Petya Osenova (Bulgarian Academy of Sciences)
Question Answering Track at CLEF (2003–2012)

QA Tasks:
• Multiple Language QA Main Task
• ResPubliQA
• QA4MRE

Pilot tasks and exercises:
• Temporal restrictions and lists
• Answer Validation Exercise (AVE)
• GikiCLEF
• Negation and Modality
• Real Time
• QA over Speech Transcriptions (QAST)
• Biomedical
• WiQA
• WSD QA
Portrayal
Over the years we learned that the pipeline architecture is one of the main limitations to improving QA technology.
So we bet on a reformulation:
Question → Question analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer

1.0 × 0.8 × 0.8 = 0.64
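The multiplication above is the point of the slide: module accuracies compound along the pipeline, so even two 80%-accurate stages cap end-to-end accuracy at 0.64. A minimal sketch (the function name is illustrative):

```python
from math import prod

def pipeline_accuracy(module_accuracies):
    """Upper bound on end-to-end accuracy when each module's
    errors compound multiplicatively along the pipeline."""
    return prod(module_accuracies)

# Question analysis (1.0) x passage retrieval (0.8) x answer extraction (0.8)
print(round(pipeline_accuracy([1.0, 0.8, 0.8]), 2))  # 0.64
```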
Hypothesis generation + validation
Question → searching the space of candidate answers with hypothesis generation functions + answer validation functions → Answer
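The reformulated architecture can be sketched as a generate-and-validate loop; `generate_candidates` and the validator functions below are hypothetical stand-ins, not the actual QA4MRE systems:

```python
def answer_question(question, generate_candidates, validators, threshold=0.5):
    """Generate hypotheses for the question, score each with the
    validation functions, and return the best-validated candidate;
    abstain (return None) if no candidate clears the threshold."""
    best, best_score = None, threshold
    for candidate in generate_candidates(question):
        # Combine validator scores; a simple average for illustration.
        score = sum(v(question, candidate) for v in validators) / len(validators)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy usage: one validator that only trusts "Rome".
candidates = lambda q: ["Rome", "Paris", "Madrid"]
trusts_rome = lambda q, c: 1.0 if c == "Rome" else 0.0
print(answer_question("Where was CLEF 2012 held?", candidates, [trusts_rome]))
```

Abstention matters here because the evaluation measure (c@1) rewards leaving a question unanswered over answering it wrongly.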
We focus on validation …
Is the candidate answer correct?
QA4MRE setting:
Multiple Choice Reading Comprehension Tests
Measure progress in two reading abilities:
• Answer questions about a single text
• Capture knowledge from text collections
… and knowledge
Why capture knowledge from text collections?
We need knowledge to understand language: the ability to make inferences about texts is correlated with the amount of knowledge considered.
Texts always omit information we need to recover:
• To build the complete story behind the document
• And to be sure about the answer
Text as source of knowledge
Text Collection (background collection): a set of documents that contextualize the one under reading (20,000–100,000 docs.)
• We can imagine this done on the fly by the machine
• Retrieval
Big and diverse enough to acquire knowledge
Define a scalable strategy: topic by topic, with a reference collection per topic
Background Collections
They must serve to acquire:
• General facts (with categorization and relevant relations)
• Abstractions (such as …)
This is sensitive to occurrence in texts, and thus also to the way we create the collection.
Key: retrieve all relevant documents and only them
• Classical IR
• Interdependence with topic definition: the topic is defined by the set of queries that produce the collection
Example: Biomedical
Alzheimer’s Disease Literature Corpus: search PubMed about Alzheimer
Query: (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])
66,222 abstracts
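Queries like the one above are nested boolean combinations of fielded PubMed terms; a small sketch of how such a clause list can be assembled (helper names are illustrative, and only a few of the ADLC terms are shown):

```python
def clause(term, field):
    """Render one fielded PubMed search clause, e.g. '"Presenilins"[Mesh]'."""
    return f'"{term}"[{field}]'

def any_of(clauses):
    """OR-join clauses inside parentheses, as in the ADLC query."""
    return "(" + " OR ".join(clauses) + ")"

query = any_of([
    clause("Alzheimer Disease", "Mesh"),
    clause("Presenilins", "Mesh"),
    clause("Neurofibrillary Tangles", "Mesh"),
])
print(query)
# ("Alzheimer Disease"[Mesh] OR "Presenilins"[Mesh] OR "Neurofibrillary Tangles"[Mesh])
```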
Questions (Main Task)
Distribution of question types:
27 PURPOSE
30 METHOD
36 CAUSAL
36 FACTOID
31 WHICH-IS-TRUE

Distribution of answer types:
75 REQUIRE NO EXTRA KNOWLEDGE
46 REQUIRE BACKGROUND KNOWLEDGE
21 REQUIRE INFERENCE
20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES
Questions (Biomedical Task)
Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition

Answer types:
• Simple: the answer is found almost verbatim in the paper
• Medium: the answer is rephrased
• Complex: requires combining pieces of evidence and inference

They involve a predefined set of entity types.
Main Task
16 test documents, 160 questions, 800 candidate answers

4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer (new; divulgative sources: blogs, web, news, …)

4 reading tests per topic: document + 10 questions, 5 choices per question

6 languages: English, German, Spanish, Italian, Romanian, Arabic (new)
Biomedical Task
Same setting, but scientific language; focus on one disease: Alzheimer

Alzheimer's Disease Literature Corpus (ADLC):
• 66,222 abstracts from PubMed
• 9,500 full articles

Most of them processed with:
• Dependency parser GDep (Sagae and Tsujii 2007)
• UMLS-based NE tagger (CLiPS)
• ABNER NE tagger (Settles 2005)
Task on Modality and Negation
Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)

Decision tree: Is the event presented as certain?
• Yes → Is it negated? (yes → NEG; no → NONE)
• No → Is it negated? (yes → NEGMOD; no → MOD)
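Since the four labels are fully determined by an event's negation and speculation flags, the scheme reduces to a two-bit lookup (a minimal sketch):

```python
def event_label(negated: bool, speculated: bool) -> str:
    """Map an event's negation/speculation flags to the four
    Modality and Negation labels used in the task."""
    if speculated:
        return "NEGMOD" if negated else "MOD"
    return "NEG" if negated else "NONE"

print(event_label(negated=False, speculated=False))  # NONE
```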
Participation
[Chart: participants and runs, 2011 vs. 2012]

Task                    Registered groups   Participant groups   Submitted runs
Main                    25                  11                   43
Biomedical              23                  7                    43
Modality and Negation   3                   3                    6
Total                   51                  21                   92

~100% increase
Evaluation and results
QA perspective evaluation: c@1 over all questions (random baseline: 0.2)
• Best systems, Main: 0.65, 0.55
• Best systems, Biomedical: 0.40, 0.47

Reading perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5)
• Best systems, Main: 12/16 and 6/16 tests passed
• Best systems, Biomedical: 3/4 tests passed
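c@1 (Peñas and Rodrigo, 2011) extends accuracy by giving partial credit for unanswered questions at the system's observed accuracy rate: c@1 = (nR + nU · nR/n) / n, with nR correct answers, nU unanswered questions, and n questions in total. A minimal sketch, including the reading-perspective pass criterion:

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1: correct answers count fully; unanswered questions earn
    partial credit proportional to the system's accuracy nR/n."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# 8 correct, 2 left unanswered, out of 10 questions:
score = c_at_1(8, 2, 10)
print(score)        # 0.96
print(score > 0.5)  # True -> this reading test is passed
```

Note that a system answering everything gets plain accuracy, while abstaining on hard questions can only help if the rest are answered well.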
More details during the workshop
Monday 17th Sep.
17:00 – 18:00  Poster Session

Tuesday 18th Sep.
10:40 – 12:40  Invited Talk + Overviews
14:10 – 16:10  Reports from participants (Main + Bio)
16:40 – 17:15  Reports from participants (Mod&Neg)
17:15 – 18:10  Breakout session
Thanks!