AQUAINT Kickoff Meeting – December 2001

Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for

Multimedia Question Answering

Vasileios Hatzivassiloglou, Kathleen R. McKeown

Columbia University

Dan Jurafsky, Wayne H. Ward, James H. Martin

University of Colorado

Current State of the Art

• TREC Q&A track

• Technology enabling focused retrieval

• Good coverage (up to 70%) of facts (who, when, where)

• Issues:
  – System must understand the question type
  – Answers extracted verbatim from the source
  – One source at a time
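
A common way to make "the system must understand the question type" concrete is to map the question word to an expected answer type, matching the who/when/where facts above. The rule table below is a hypothetical sketch (`ANSWER_TYPES` and `expected_answer_type` are illustrative names, not part of any system described here); practical question classifiers use much richer features.

```python
import re

# Hypothetical rule table: question cue -> coarse expected answer type.
ANSWER_TYPES = {
    "who": "PERSON",
    "when": "DATE",
    "where": "LOCATION",
    "how many": "QUANTITY",
}

def expected_answer_type(question: str) -> str:
    """Return the expected answer type for a question, or "OTHER" if no cue matches."""
    q = question.lower().strip()
    # Try longer cues first so that multi-word cues win over shorter prefixes.
    for cue in sorted(ANSWER_TYPES, key=len, reverse=True):
        # \b stops "who" from matching "whose".
        if re.match(rf"{re.escape(cue)}\b", q):
            return ANSWER_TYPES[cue]
    return "OTHER"

print(expected_answer_type("When was Mullah Omar born?"))  # DATE
print(expected_answer_type("Who controls Jalalabad?"))     # PERSON
```

For "Who controls Jalalabad?" the rule answers PERSON, although the true answer may be a group, which hints at why questions whose answers depend on source and time need more than a fixed type table.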

Our focus – Question type

• Facts, but not absolute facts

• Rather, questions with answers that depend on
  – source
  – perspective
  – time

• "When was Mullah Omar born?" vs. "Who controls Jalalabad?"

Our focus – Multiple sources

• Integrate answers from multiple sources

• Use similarities across sources to locate the core part of the answer

• Highlight important differences between sources
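
A minimal sketch of separating the core of an answer from the differences between sources, assuming each source has already produced one short answer string. Matching here is plain case-folded string equality, a stand-in for the paraphrase-aware matching proposed later in this deck; `fuse_answers`, the source names, and the answers are hypothetical.

```python
from collections import defaultdict

def fuse_answers(answers_by_source: dict[str, str]) -> tuple[str, dict[str, list[str]]]:
    """Split per-source answers into a core answer and the dissenting variants."""
    supporters = defaultdict(list)
    for source, answer in answers_by_source.items():
        # Naive normalization (trim + lowercase); real matching needs paraphrase handling.
        supporters[answer.strip().lower()].append(source)
    # Core = answer backed by the most sources; sorting first breaks ties alphabetically.
    core = max(sorted(supporters), key=lambda a: len(supporters[a]))
    differences = {a: sorted(srcs) for a, srcs in supporters.items() if a != core}
    return core, differences

core, differences = fuse_answers({
    "source_a": "faction X",
    "source_b": "Faction X ",
    "source_c": "faction Y",
})
print(core)         # faction x
print(differences)  # {'faction y': ['source_c']}
```

A majority vote is only the simplest choice; the differences dictionary keeps track of which sources dissent so that they can be highlighted rather than discarded.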

Our focus – Answer form

• Answer contains
  – Core part where sources agree
  – Differences in perspective
  – Trends in time

• Text is not copied verbatim

• Text generation allows material from multiple sources to be combined concisely

Our focus – Q&A Environment

• Spoken and written questions

• Specialized language model for accepting questions in realistic, noisy environments
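
To illustrate what a question-specific language model contributes, here is a toy add-one-smoothed bigram model trained only on questions; it prefers question-like word orders over scrambled ones. The class name, training sentences, and smoothing method are assumptions for illustration; a recognizer for noisy speech would need a far larger model and better smoothing.

```python
import math
from collections import Counter

class BigramLM:
    """Toy add-one-smoothed bigram model trained on question text."""

    def __init__(self, sentences: list[str]) -> None:
        self.history = Counter()  # how often each token precedes another token
        self.bigrams = Counter()  # (previous token, token) counts
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Possible next tokens seen in training, plus one slot for an unseen word type.
        self.vocab_size = len({w for _, w in self.bigrams}) + 1

    def logprob(self, sentence: str) -> float:
        """Natural-log probability of a sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, w)] + 1) / (self.history[prev] + self.vocab_size))
            for prev, w in zip(tokens, tokens[1:])
        )

lm = BigramLM(["who controls jalalabad", "when was mullah omar born", "who controls kabul"])
print(lm.logprob("who controls kandahar") > lm.logprob("kandahar controls who"))  # True
```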

• Context management system allows for
  – clarifications
  – follow-up questions

Technology innovations

• Specialized speech recognition and dialog management

• Semantic parsing of questions and source text

• Event recognition

• Information fusion

Architecture

Semantic Parsing

• Semantic information is important in many of the current best Q&A systems (Srihari et al. 2000; Hovy et al. 2001; Harabagiu et al. 2001)

• Semantics often derived from lexical relationships and taxonomies, e.g., WordNet

• Instead, we will use FrameNet, an explicit representation of semantic roles

• Shallow, efficiently computable semantic representation

FrameNet

• FrameNet provides a database of words with their associated semantic roles
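
The kind of information such a database holds can be sketched as two mappings: from lexical units (lemma plus part of speech) to frames, and from frames to their semantic roles. The frame and role names below are simplified, hypothetical examples chosen to fit the Mullah Omar and Jalalabad questions; they are not entries copied from FrameNet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    name: str
    roles: tuple[str, ...]  # semantic roles (frame elements) defined by the frame

# Hypothetical, simplified frames for illustration; not copied from FrameNet.
FRAMES = {
    "Control": Frame("Control", ("Controller", "Controlled_entity", "Time")),
    "Birth": Frame("Birth", ("Child", "Time", "Place")),
}

# Lexical units: (lemma, part of speech) -> name of the frame they evoke.
LEXICON = {
    ("control", "v"): "Control",
    ("bear", "v"): "Birth",  # as in "was born"
}

def roles_for(lemma: str, pos: str) -> tuple[str, ...]:
    """Return the roles of the frame evoked by a lexical unit, or () if it is unknown."""
    frame_name = LEXICON.get((lemma, pos))
    return FRAMES[frame_name].roles if frame_name else ()

print(roles_for("control", "v"))  # ('Controller', 'Controlled_entity', 'Time')
```

Under this view, "Who controls Jalalabad?" fills Controlled_entity and asks for the Controller role, which is the kind of explicit role information the semantic parser is meant to supply.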

Filling in semantic roles

• Initial statistical syntactic parse identifies dependencies

• Statistical classifier assigns roles to phrases

• Issues:
  – Prominent role for named entities
  – Generalization across domains
  – FrameNet coverage and fallback techniques
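
The classification step can be sketched as a relative-frequency model over a few features of each parsed constituent. When the predicate is unseen, the model backs off to coarser features, which is one simple response to the generalization and coverage issues. The feature set (predicate, phrase type, position relative to the predicate), the `RoleClassifier` name, and the training tuples are assumptions; the syntactic parse that would supply the features is taken as given.

```python
from collections import Counter, defaultdict

class RoleClassifier:
    """Relative-frequency semantic role classifier with one level of backoff."""

    def __init__(self) -> None:
        # (predicate, phrase type, position) -> counts of roles seen with those features
        self.full = defaultdict(Counter)
        # (phrase type, position) -> role counts, used when the predicate is unseen
        self.backoff = defaultdict(Counter)

    def train(self, examples) -> None:
        """examples: iterable of (predicate, phrase_type, position, role) tuples."""
        for pred, ptype, pos, role in examples:
            self.full[(pred, ptype, pos)][role] += 1
            self.backoff[(ptype, pos)][role] += 1

    def predict(self, pred: str, ptype: str, pos: str):
        """Return the most frequent role for these features, or None if nothing matches."""
        counts = self.full.get((pred, ptype, pos)) or self.backoff.get((ptype, pos))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

clf = RoleClassifier()
clf.train([
    ("control", "NP", "before", "Controller"),
    ("control", "NP", "after", "Controlled_entity"),
    ("seize", "NP", "after", "Controlled_entity"),
])
print(clf.predict("control", "NP", "before"))  # Controller
print(clf.predict("hold", "NP", "after"))      # Controlled_entity (via backoff)
```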

Event recognition

• Events vs. topics

• Events as a basis for segmenting documents and classifying document fragments as matching a question

• Event algebra will allow
  – grouping sub-events
  – linking related events
  – detecting updates

Detecting an event

• Hypothesis: events can be detected based on
  – participants (named entities, semantic roles)
  – time
  – location
  – limited constraints on verbs
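
Under this hypothesis, deciding whether two document fragments report the same event can be sketched as a heuristic match on participants, time, and location, followed by grouping. The thresholds, the `Fragment` fields, and the greedy grouping strategy are assumptions; verb constraints are omitted, and participants are assumed to come from upstream named-entity and semantic-role extraction.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fragment:
    text: str
    participants: frozenset[str]  # entities / role fillers, assumed extracted upstream
    when: date
    where: str

def same_event(a: Fragment, b: Fragment, max_days: int = 2, min_overlap: float = 0.5) -> bool:
    """Heuristic match on participants, time and location (thresholds are assumptions)."""
    union = a.participants | b.participants
    overlap = len(a.participants & b.participants) / len(union) if union else 0.0
    return (overlap >= min_overlap
            and abs((a.when - b.when).days) <= max_days
            and a.where == b.where)

def group_events(fragments: list[Fragment]) -> list[list[Fragment]]:
    """Greedy grouping: each fragment joins the first event whose first fragment it matches."""
    events: list[list[Fragment]] = []
    for frag in fragments:
        for event in events:
            if same_event(event[0], frag):
                event.append(frag)
                break
        else:
            events.append([frag])  # no match: start a new event
    return events

frags = [
    Fragment("fragment 1", frozenset({"group A", "group B"}), date(2001, 11, 14), "Jalalabad"),
    Fragment("fragment 2", frozenset({"group A", "group B", "group C"}), date(2001, 11, 15), "Jalalabad"),
    Fragment("fragment 3", frozenset({"group D"}), date(2001, 11, 14), "Kabul"),
]
print(len(group_events(frags)))  # 2
```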

Information fusion

• Combining answer fragments in a concise response

• Summarization for question answering
  – Clustering of related fragments at the phrase level
  – Identification of syntactic dependencies and semantic roles
  – Fusion of matching entities, including paraphrases
  – Regeneration of the answer
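
The first step, clustering related fragments, can be sketched with word-overlap (Jaccard) similarity and single-link merging through a small union-find. This is a deliberately simple stand-in: the threshold and function names are assumptions, and the later steps (dependency and role analysis, entity fusion, regeneration) are not shown.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_fragments(fragments: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Single-link clustering of fragments by word-overlap similarity."""
    bags = [set(f.lower().split()) for f in fragments]
    parent = list(range(len(fragments)))  # union-find over fragment indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            if jaccard(bags[i], bags[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[str]] = {}
    for i, frag in enumerate(fragments):
        clusters.setdefault(find(i), []).append(frag)
    return list(clusters.values())

frags = ["troops entered the city on Tuesday",
         "troops entered the city Tuesday",
         "the council met in the evening"]
print(cluster_fragments(frags))
# [['troops entered the city on Tuesday', 'troops entered the city Tuesday'], ['the council met in the evening']]
```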

Technologies for information fusion and generation

• Learning syntactic and lexical paraphrases (countless – lots of, repulsion – aversion)

• Detecting important differences

• Merging descriptions

• Learning and formulating content plans
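
One simple way to harvest lexical paraphrase candidates from comparable reports of the same event is to compare aligned sentence pairs and keep the short spans that differ inside identical left and right contexts. The sketch assumes the sentence pairs are already aligned; `candidate_paraphrase` and `max_len` are hypothetical, and a real learner would aggregate evidence over many pairs rather than trusting a single one.

```python
def candidate_paraphrase(s1: str, s2: str, max_len: int = 3):
    """Return (span1, span2) if the sentences differ only in one short middle span."""
    a, b = s1.lower().split(), s2.lower().split()
    # Length of the shared left context.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    # Length of the shared right context, never overlapping the left context.
    s = 0
    while s < min(len(a), len(b)) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    mid_a, mid_b = a[p:len(a) - s], b[p:len(b) - s]
    if p + s == 0 or not mid_a or not mid_b:
        return None  # no shared context, or one sentence contains the other
    if len(mid_a) > max_len or len(mid_b) > max_len:
        return None  # differing spans too long to be a lexical paraphrase
    return " ".join(mid_a), " ".join(mid_b)

print(candidate_paraphrase("Officials reported countless casualties",
                           "Officials reported lots of casualties"))
# ('countless', 'lots of')
```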

Speech and Dialog for Q&A

• Adapt speech recognition technology for
  – unknown words (using robust semantic classes)
  – named entities
  – noisy environment

• Dialog system for clarification and follow-up
  – use semantic annotation to generalize across domains
  – maintain focus on important semantic elements
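
Maintaining focus for follow-up questions can be sketched as keeping the last question's frame and role slots, then completing an elliptical follow-up such as "What about Kandahar?" by replacing a single slot. The classes are hypothetical, and deciding which role the new phrase fills (here `Controlled_entity`) is assumed to happen upstream, for example through semantic classes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QuestionFrame:
    frame: str                                           # e.g. "Control"
    asked_role: str                                      # role the user wants filled
    slots: dict[str, str] = field(default_factory=dict)  # roles the user already specified

class DialogContext:
    """Keeps the most recent question in focus so elliptical follow-ups can be completed."""

    def __init__(self) -> None:
        self.focus: Optional[QuestionFrame] = None

    def ask(self, question: QuestionFrame) -> QuestionFrame:
        self.focus = question
        return question

    def follow_up(self, role: str, value: str) -> QuestionFrame:
        """Complete a follow-up such as "What about Kandahar?" by replacing one slot."""
        if self.focus is None:
            raise ValueError("no question in focus to follow up on")
        updated = QuestionFrame(self.focus.frame, self.focus.asked_role,
                                {**self.focus.slots, role: value})
        self.focus = updated  # the follow-up becomes the new focus
        return updated

ctx = DialogContext()
ctx.ask(QuestionFrame("Control", "Controller", {"Controlled_entity": "Jalalabad"}))
q = ctx.follow_up("Controlled_entity", "Kandahar")
print(q.slots)  # {'Controlled_entity': 'Kandahar'}
```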

Data

• Input will come from TREC and TDT-3
  – extended time span
  – multiple sources reporting on same event

• Annotation
  – TREC: focus on absolute facts
  – local: focus on questions with multiple answers
  – AQUAINT-level?

Evaluation

• Component evaluation (several levels)
  – Identifying granularity of evaluation units
  – IR measures such as precision and recall
  – Human judgments for equivalence of output

• End-to-end evaluation
  – Participation in TREC
  – Extension of TREC methodology to questions with multiple answers
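
At the component level, once answer units have been judged equivalent (by string match or by human judges), the IR measures reduce to set arithmetic. A sketch, with answer units represented as hypothetical identifiers:

```python
def precision_recall_f1(system: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and balanced F-measure over matched answer units."""
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1({"u1", "u2", "u3"}, {"u2", "u3", "u4", "u5"})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Here two of the three returned units match a four-unit reference, giving P = 2/3, R = 1/2, and F = 4/7. Questions with multiple answers make the choice of evaluation unit (the granularity issue above) the hard part, not the arithmetic.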

Tools we have (and can share)

• COMMUNICATOR speech recognition system (see our demo)

• FrameNet semantic resources and statistical semantic parser

• Document- and sentence-level clustering tools

• Summarization software (for the last two, see our NEWSBLASTER demo)

Tools we are looking for

• Robust named entity recognizers

• Tools for classifying questions and identifying important elements in questions

• Tools for aiding human annotators

Goals for the first six months

• Initial FrameNet parser (limited coverage)

• Identification of participants, time, location

• Identifying paraphrases from comparable news reports on the same event

• Adapting information fusion from summarization to question answering

• Building a prototype Q&A system