
Page 1

CLEF 2012, Rome

QA4MRE, Question Answering for Machine Reading Evaluation

Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

Page 2

Question Answering Track at CLEF

[Timeline figure, 2003–2012]

QA tasks over the years:
• Multiple Language QA Main Task
• ResPubliQA
• QA4MRE

Associated exercises:
• Temporal restrictions and lists
• Answer Validation Exercise (AVE)
• GikiCLEF
• Negation and Modality
• Real Time
• QA over Speech Transcriptions (QAST)
• Biomedical
• WiQA
• WSD QA

Page 3

Portrayal

Over the years, we have learnt that the pipeline architecture is one of the main limitations on improving QA technology

So we bet on a reformulation:

Question → Question analysis → Passage retrieval → Answer extraction → Answer ranking → Answer

Accuracies multiply along the pipeline: 1.0 × 0.8 × 0.8 = 0.64
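The compounding effect of per-stage accuracies can be sketched numerically; the stage figures below are the illustrative ones from the slide, not measurements:

```python
from functools import reduce

# Illustrative per-stage accuracies for a pipeline QA system:
# question analysis, passage retrieval, answer extraction/ranking.
stage_accuracies = [1.0, 0.8, 0.8]

# Upper bound on end-to-end accuracy: errors compound multiplicatively,
# since a later stage cannot recover what an earlier one lost.
end_to_end = reduce(lambda acc, s: acc * s, stage_accuracies, 1.0)

print(round(end_to_end, 2))  # 0.64
```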

Page 4

Hypothesis generation + validation

Question → Hypothesis generation functions → search space of candidate answers → Answer validation functions → Answer

Page 5

We focus on validation …

Is the candidate answer correct?

QA4MRE setting: Multiple Choice Reading Comprehension Tests

Measure progress in two reading abilities:
• Answer questions about a single text
• Capture knowledge from text collections

Page 6

… and knowledge

Why capture knowledge from text collections?

We need knowledge to understand language. The ability to make inferences about texts is correlated with the amount of knowledge considered.

Texts always omit information we need to recover:
• To build the complete story behind the document
• And to be sure about the answer

Page 7

Text as source of knowledge

Text Collection (background collection): a set of documents that contextualize the one under reading (20,000–100,000 docs.)
• We can imagine this done on the fly by the machine (retrieval)

It must be big and diverse enough to acquire knowledge.

Define a scalable strategy: topic by topic
• Reference collection per topic

Page 8

Background Collections

They must serve to acquire:
• General facts (with categorization and relevant relations)
• Abstractions (such as

This is sensitive to occurrence in texts, and thus also to the way we create the collection.

Key: retrieve all relevant documents and only them
• Classical IR
• Interdependence with topic definition: the topic is defined by the set of queries that produce the collection

Page 9

Example: Biomedical

Alzheimer’s Disease Literature Corpus: search PubMed about Alzheimer

Query: (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

66,222 abstracts
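A query like this can be issued programmatically against PubMed through NCBI's public E-utilities esearch endpoint; a minimal sketch, where the short query term is a simplified stand-in for the full query above:

```python
from urllib.parse import urlencode

# Simplified stand-in for the full MeSH query shown above.
query = '"Alzheimer Disease"[Mesh] AND hasabstract[text] AND English[lang]'

# NCBI E-utilities esearch endpoint for PubMed.
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {"db": "pubmed", "term": query, "retmax": 100}
url = f"{base}?{urlencode(params)}"

# Fetching `url` (e.g. with urllib.request.urlopen) returns XML whose
# <IdList> holds matching PubMed IDs; abstracts are then retrieved via efetch.
print(url)
```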

Page 10

Questions (Main Task)

Distribution of question types:
27 PURPOSE
30 METHOD
36 CAUSAL
36 FACTOID
31 WHICH-IS-TRUE

Distribution of answer types:
75 REQUIRE NO EXTRA KNOWLEDGE
46 REQUIRE BACKGROUND KNOWLEDGE
21 REQUIRE INFERENCE
20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

Page 11

Questions (Biomedical Task)

Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition

Answer types:
• Simple: the answer is found almost verbatim in the paper
• Medium: the answer is rephrased
• Complex: requires combining pieces of evidence and inference

They involve a predefined set of entity types.

Page 12

Main Task

16 test documents, 160 questions, 800 candidate answers

4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer (new; popular sources: blogs, web, news, …)

4 reading tests per topic: document + 10 questions, 5 choices per question

6 languages: English, German, Spanish, Italian, Romanian, Arabic (new)

Page 13

Biomedical Task

Same setting, but scientific language, focused on one disease: Alzheimer

Alzheimer's Disease Literature Corpus (ADLC):
• 66,222 abstracts from PubMed
• 9,500 full articles

Most of them processed with:
• Dependency parser GDep (Sagae and Tsujii, 2007)
• UMLS-based NE tagger (CLiPS)
• ABNER NE tagger (Settles, 2005)

Page 14

Task on Modality and Negation

Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)

Decision tree: Is the event presented as certain?
• Yes → Did it happen? Yes → NONE; No → NEG
• No → Is it negated? Yes → NEGMOD; No → MOD
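The four-way labelling reduces to two boolean axes, certainty and negation; a minimal sketch (function name and signature are ours):

```python
def modality_negation_label(certain: bool, negated: bool) -> str:
    # Certain events: NONE if asserted as happening, NEG if negated.
    if certain:
        return "NEG" if negated else "NONE"
    # Speculated (uncertain) events: NEGMOD if also negated, else MOD.
    return "NEGMOD" if negated else "MOD"

print(modality_negation_label(True, False))  # NONE
```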

Page 15

Participation

[Bar chart: participants and runs, 2011 vs 2012 — roughly a 100% increase]

Task                    Registered groups   Participant groups   Submitted runs
Main                    25                  11                   43
Biomedical              23                  7                    43
Modality and Negation   3                   3                    6
Total                   51                  21                   92

Page 16

Evaluation and results

QA perspective evaluation: c@1 over all questions (random baseline 0.2)

Reading perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5)

Best systems, QA perspective (c@1): Main 0.65 and 0.40; Biomedical 0.55 and 0.47

Best systems, reading perspective (tests passed): Main 12/16 and 6/16; Biomedical 3/4
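The c@1 measure extends plain accuracy by giving partial credit for leaving questions unanswered rather than answering wrongly; a minimal sketch of the measure (implementation ours):

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    # c@1 = (nR + nU * nR / n) / n: each unanswered question earns the
    # system's observed accuracy instead of counting as an error.
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# Answering 5/10 correctly scores 0.5; leaving the other 5 unanswered
# instead of answering them wrongly raises the score to 0.75.
print(c_at_1(5, 0, 10), c_at_1(5, 5, 10))  # 0.5 0.75
```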

Page 17

More details during the workshop

Monday 17th Sep.
17:00–18:00 Poster Session

Tuesday 18th Sep.
10:40–12:40 Invited Talk + Overviews
14:10–16:10 Reports from participants (Main + Bio)
16:40–17:15 Reports from participants (Mod&Neg)
17:15–18:10 Breakout session

Thanks!