
Page 1

CLEF 2012, Rome

QA4MRE, Question Answering for Machine Reading Evaluation

Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

Page 2

Question Answering Track at CLEF

[Timeline figure, 2003–2012]

QA tasks over the years:
• Multiple Language QA Main Task
• ResPubliQA
• QA4MRE

Associated exercises:
• Temporal restrictions and lists
• Answer Validation Exercise (AVE)
• GikiCLEF
• Negation and Modality
• Real Time
• QA over Speech Transcriptions (QAST)
• Biomedical
• WiQA
• WSD QA

Page 3

Portrayal

Over the years, we have learnt that the pipeline architecture is one of the main limitations on improving QA technology

So we bet on a reformulation:

Question → Question analysis → Passage retrieval → Answer extraction → Answer ranking → Answer

Accuracies multiply along the pipeline: 1.0 × 0.8 × 0.8 = 0.64
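The compounding effect of per-stage accuracies can be sketched numerically; the stage figures below are the illustrative ones from the slide, not measurements:

```python
from functools import reduce

# Illustrative per-stage accuracies for a pipeline QA system:
# question analysis, passage retrieval, answer extraction/ranking.
stage_accuracies = [1.0, 0.8, 0.8]

# Upper bound on end-to-end accuracy: errors compound multiplicatively,
# since a later stage cannot recover what an earlier one lost.
end_to_end = reduce(lambda acc, s: acc * s, stage_accuracies, 1.0)

print(round(end_to_end, 2))  # 0.64
```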

Page 4

Hypothesis generation + validation

Question → Hypothesis generation functions → search space of candidate answers → Answer validation functions → Answer

Page 5

We focus on validation …

Is the candidate answer correct?

QA4MRE setting: Multiple Choice Reading Comprehension Tests

Measure progress in two reading abilities:
• Answer questions about a single text
• Capture knowledge from text collections

Page 6

… and knowledge

Why capture knowledge from text collections?

We need knowledge to understand language. The ability to make inferences about texts is correlated with the amount of knowledge considered.

Texts always omit information we need to recover:
• To build the complete story behind the document
• And to be sure about the answer

Page 7

Text as source of knowledge

Text Collection (background collection): a set of documents that contextualize the one under reading (20,000–100,000 docs.)
• We can imagine this done on the fly by the machine (retrieval)

It must be big and diverse enough to acquire knowledge.

Define a scalable strategy: topic by topic
• Reference collection per topic

Page 8

Background Collections

They must serve to acquire:
• General facts (with categorization and relevant relations)
• Abstractions (such as

This is sensitive to occurrence in texts, and thus also to the way we create the collection.

Key: retrieve all relevant documents and only them
• Classical IR
• Interdependence with topic definition: the topic is defined by the set of queries that produce the collection

Page 9

Example: Biomedical

Alzheimer’s Disease Literature Corpus: search PubMed about Alzheimer

Query: (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

66,222 abstracts
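A query like this can be issued programmatically against PubMed through NCBI's public E-utilities esearch endpoint; a minimal sketch, where the short query term is a simplified stand-in for the full query above:

```python
from urllib.parse import urlencode

# Simplified stand-in for the full MeSH query shown above.
query = '"Alzheimer Disease"[Mesh] AND hasabstract[text] AND English[lang]'

# NCBI E-utilities esearch endpoint for PubMed.
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {"db": "pubmed", "term": query, "retmax": 100}
url = f"{base}?{urlencode(params)}"

# Fetching `url` (e.g. with urllib.request.urlopen) returns XML whose
# <IdList> holds matching PubMed IDs; abstracts are then retrieved via efetch.
print(url)
```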

Page 10

Questions (Main Task)

Distribution of question types:
27 PURPOSE
30 METHOD
36 CAUSAL
36 FACTOID
31 WHICH-IS-TRUE

Distribution of answer types:
75 REQUIRE NO EXTRA KNOWLEDGE
46 REQUIRE BACKGROUND KNOWLEDGE
21 REQUIRE INFERENCE
20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

Page 11

Questions (Biomedical Task)

Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition

Answer types:
• Simple: the answer is found almost verbatim in the paper
• Medium: the answer is rephrased
• Complex: requires combining pieces of evidence and inference

They involve a predefined set of entity types.

Page 12

Main Task

16 test documents, 160 questions, 800 candidate answers

4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer (new; popular sources: blogs, web, news, …)

4 reading tests per topic: document + 10 questions, 5 choices per question

6 languages: English, German, Spanish, Italian, Romanian, Arabic (new)

Page 13

Biomedical Task

Same setting, but scientific language, focused on one disease: Alzheimer

Alzheimer's Disease Literature Corpus (ADLC):
• 66,222 abstracts from PubMed
• 9,500 full articles

Most of them processed with:
• Dependency parser GDep (Sagae and Tsujii, 2007)
• UMLS-based NE tagger (CLiPS)
• ABNER NE tagger (Settles, 2005)

Page 14

Task on Modality and Negation

Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)

Decision tree: Is the event presented as certain?
• Yes → Did it happen? Yes → NONE; No → NEG
• No → Is it negated? Yes → NEGMOD; No → MOD
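The four-way labelling reduces to two boolean axes, certainty and negation; a minimal sketch (function name and signature are ours):

```python
def modality_negation_label(certain: bool, negated: bool) -> str:
    # Certain events: NONE if asserted as happening, NEG if negated.
    if certain:
        return "NEG" if negated else "NONE"
    # Speculated (uncertain) events: NEGMOD if also negated, else MOD.
    return "NEGMOD" if negated else "MOD"

print(modality_negation_label(True, False))  # NONE
```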

Page 15

Participation

[Bar chart: participants and runs, 2011 vs 2012 — roughly a 100% increase]

Task                    Registered groups   Participant groups   Submitted runs
Main                    25                  11                   43
Biomedical              23                  7                    43
Modality and Negation   3                   3                    6
Total                   51                  21                   92

Page 16

Evaluation and results

QA perspective evaluation: c@1 over all questions (random baseline 0.2)

Reading perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5)

Best systems, QA perspective (c@1): Main 0.65 and 0.40; Biomedical 0.55 and 0.47

Best systems, reading perspective (tests passed): Main 12/16 and 6/16; Biomedical 3/4
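The c@1 measure extends plain accuracy by giving partial credit for leaving questions unanswered rather than answering wrongly; a minimal sketch of the measure (implementation ours):

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    # c@1 = (nR + nU * nR / n) / n: each unanswered question earns the
    # system's observed accuracy instead of counting as an error.
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# Answering 5/10 correctly scores 0.5; leaving the other 5 unanswered
# instead of answering them wrongly raises the score to 0.75.
print(c_at_1(5, 0, 10), c_at_1(5, 5, 10))  # 0.5 0.75
```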

Page 17

More details during the workshop

Monday 17th Sep.
17:00–18:00 Poster Session

Tuesday 18th Sep.
10:40–12:40 Invited Talk + Overviews
14:10–16:10 Reports from participants (Main + Bio)
16:40–17:15 Reports from participants (Mod&Neg)
17:15–18:10 Breakout session

Thanks!