
A Pattern Based Approach to Answering Factoid, List and Definition Questions

Mark A. Greenwood and Horacio Saggion

Natural Language Processing Group

Department of Computer Science

University of Sheffield, UK

April 27th 2004 RIAO 2004

Outline of Talk

• What is Question Answering?
  – Different Question Types
• System Description
  – Factoid and List Questions
    • System Architecture
    • Surface Matching Text Patterns
    • Fallback to Semantic Entities
  – Definition Questions
    • System Architecture
    • Knowledge Acquisition
    • Locating Possible Definitions
• Results and Evaluation
  – Factoid and List Questions
  – Definition Questions
• Conclusions and Future Work


What is Question Answering?

• The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents.

• As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important.

• Answering questions using the web is already enough of a problem for it to appear in fiction (Marshall, 2002):

“I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogotá… I’m the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd.”


Different Question Types

• Clearly there are many different types of questions which a user can ask. The system discussed in this presentation attempts to answer:
  – Factoid questions, which usually require a single fact as an answer, such as “How high is Everest?” or “When was Mozart born?”.
  – List questions, which require multiple facts to be returned in answer to a question, e.g. “Name 22 cities that have a subway system” or “Name companies which manufacture tractors”.
  – Definition questions, such as “What is aspirin?”, which require answers covering essential (e.g. “aspirin is a drug”) as well as non-essential (e.g. “aspirin is a blood thinner”) descriptions of the definiendum (the term being defined).

• The system makes no attempt to answer other question types; for example, speculative questions such as “Is the airline industry in trouble?” are not handled.


Outline of Talk

• What is Question Answering?
  – Different Question Types
• System Description
  – Factoid and List Questions
    • System Architecture
    • Surface Matching Text Patterns
    • Fallback to Semantic Entities
  – Definition Questions
    • System Architecture
    • Knowledge Acquisition
    • Locating Possible Definitions
• Results and Evaluation
  – Factoid and List Questions
  – Definition Questions
• Conclusions and Future Work


System Description

• As the three question types require different techniques to answer them, the system consists of two sub-systems:
  – Factoid: answers both the factoid and list questions. For factoid questions the system returns the best answers, and for list questions it returns all the answers it found.
  – Definition: responsible only for answering the definition questions.

• The rest of this section provides an overview of both systems and of how patterns are used to answer the differing question types.


Factoid System Architecture

[Architecture diagram: questions are typed by the Question Typer; an IR Engine retrieves the top n passages from the Document Collection; the Pattern Set Chooser and Pattern System implement the surface matching text patterns, with the Entity Finder providing the fallback to semantic entities; Answer Ranking then produces the final answers.]


Surface Text Patterns

• Learning patterns which can be used to find answers involves a two-stage process:
  – The first stage is to learn a set of patterns from a set of question-answer pairs.
  – The second stage involves assigning a precision to each pattern and discarding those patterns which are tied to a specific question-answer pair.

• To explain the process we will use questions of the form “When was X born?”, with “When was Mozart born?” as a concrete example, for which the question-answer pair is:
  – Mozart
  – 1756


Surface Text Patterns

• The first stage is to learn a set of patterns from the question-answer pairs for a specific question type (a minimal sketch follows this list):
  – For each example the question and answer terms are submitted to Google and the top ten documents are downloaded.
  – Each document then has the question and answer terms replaced by AnCHoR and AnSWeR respectively.
  – Depending upon the question type other replacements are also made, e.g. any dates may be replaced by a DatE tag.
  – Those sentences which contain both AnCHoR and AnSWeR are retained and joined together to create a single document.
  – This generated document is then used to build a token-level suffix tree, from which repeated strings containing both AnCHoR and AnSWeR, and which do not span a sentence boundary, are extracted as patterns.
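A minimal Python sketch of this first stage, assuming the AnCHoR/AnSWeR substitutions have already been made; plain n-gram counting stands in for the token-level suffix tree the slide describes, and the function name and thresholds are illustrative:

```python
from collections import Counter

def learn_patterns(sentences, min_count=2, max_len=8):
    """Stage 1 sketch: extract repeated token n-grams containing both the
    AnCHoR and AnSWeR placeholders. Working sentence by sentence means no
    extracted pattern spans a sentence boundary. The real system builds a
    token-level suffix tree; n-gram counting is a simpler stand-in."""
    counts = Counter()
    for sent in sentences:  # placeholders already substituted
        tokens = sent.split()
        for i in range(len(tokens)):
            for j in range(i + 2, min(i + max_len, len(tokens)) + 1):
                ngram = tokens[i:j]
                if "AnCHoR" in ngram and "AnSWeR" in ngram:
                    counts[" ".join(ngram)] += 1
    return [p for p, c in counts.items() if c >= min_count]

# "When was Mozart born?" example (Mozart -> AnCHoR, 1756 -> AnSWeR):
sents = ["AnCHoR ( AnSWeR - DatE ) was a prolific composer .",
         "From AnCHoR ( AnSWeR - DatE ) until his death ."]
print(learn_patterns(sents))  # includes 'AnCHoR ( AnSWeR - DatE )'
```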


Surface Text Patterns

• The result of the first stage is a set of patterns. For questions of the form “When was X born?” these may include:
  – AnCHoR ( AnSWeR –
  – From AnCHoR ( AnSWeR – DatE )
  – AnCHoR ( AnSWeR

• Unfortunately some of these patterns may be specific to the question used to generate them.

• So the second stage of the approach is concerned with filtering out these specific patterns to produce a set which can be used to answer unseen questions.


Surface Text Patterns

• The second stage of the approach requires a different set of question-answer pairs to those used in the first stage:
  – Within each of the top ten documents returned by Google, using only the question term, the question term is replaced by AnCHoR, the answer (if it is present) by AnSWeR, and any other replacements made in the first stage are also carried out.
  – Those sentences which contain AnCHoR are retained.
  – All of the patterns from the first stage are converted to regular expressions designed to capture the token which appears in place of AnSWeR.
  – Each regular expression is then matched against each sentence, and along with each pattern two counts are maintained: Ca, the total number of times the pattern matched, and Cc, the number of times AnSWeR was selected as the answer.
  – After a pattern has been matched against every sentence, it is discarded if Cc is less than 5; otherwise its precision is calculated as Cc/Ca and the pattern is retained only if the precision is greater than 0.1 (see the sketch below).
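A sketch of this filtering stage in the same style; the thresholds (Cc ≥ 5, precision > 0.1) come from the slide, while the function shape and the regex conversion are assumptions:

```python
import re

def score_patterns(patterns, sentences, answer, min_cc=5, min_precision=0.1):
    """Stage 2 sketch: convert each pattern into a regex capturing the
    AnSWeR slot, count Ca (total matches) and Cc (matches selecting the
    known answer), discard patterns with Cc < min_cc, and keep the rest
    only if the precision Cc/Ca exceeds min_precision."""
    scored = []
    for pat in patterns:
        regex = re.compile(re.escape(pat).replace("AnSWeR", r"([^ ]+)"))
        ca = cc = 0
        for sent in sentences:           # AnCHoR already substituted
            m = regex.search(sent)
            if m:
                ca += 1
                if m.group(1) == answer:
                    cc += 1
        if cc >= min_cc and cc / ca > min_precision:
            scored.append((cc / ca, pat))
    return sorted(scored, reverse=True)  # highest precision first
```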


Surface Text Patterns

• The result of assigning precision to patterns in this way is a set of precisions and regular expressions such as:
  – 0.967: AnCHoR \( ([^ ]+) - DatE
  – 0.566: AnCHoR \( ([^ ]+)
  – 0.263: AnCHoR ([^ ]+) –

• These patterns can then be used to answer unseen questions (sketched after this list):
  – The question term is submitted to Okapi, and the top 20 returned documents have the question term replaced with AnCHoR; any other necessary replacements are also made.
  – Those sentences which contain AnCHoR are extracted and combined to make a single document.
  – Each pattern is then applied to each sentence to extract possible answers.
  – All the answers found are sorted, firstly on the precision of the pattern which selected them and secondly on the number of times the same answer was found.
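A sketch of this answering step, taking the (precision, pattern) pairs produced by the filtering stage above; the ranking keys mirror the last bullet:

```python
import re
from collections import Counter

def answer_question(scored_patterns, sentences):
    """Rank candidate answers: first by the precision of the best pattern
    that found them, then by how often the same answer was extracted."""
    best_prec, freq = {}, Counter()
    for prec, pat in scored_patterns:
        regex = re.compile(re.escape(pat).replace("AnSWeR", r"([^ ]+)"))
        for sent in sentences:           # AnCHoR already substituted
            m = regex.search(sent)
            if m:
                ans = m.group(1)
                freq[ans] += 1
                best_prec[ans] = max(best_prec.get(ans, 0.0), prec)
    return sorted(freq, key=lambda a: (best_prec[a], freq[a]), reverse=True)
```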


Fallback to Semantic Entities

• Q: How high is Everest?
  – D1: Everest’s 29,035 feet is 5.4 miles above sea level…
  – D2: At 29,035 feet the summit of Everest is the highest…

• If Q contains ‘how’ and ‘high’ then the semantic class, S, is measurement:distance.

• Known entities (with counts) found in the passages returned by Okapi:
  – measurement:distance(‘29,035 feet’)  2
  – measurement:distance(‘5.4 miles’)  1
  – location(‘Everest’)  2

• The most frequent entity of class S, “29,035 feet”, is returned as the answer (a small sketch of this fallback follows).
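A toy sketch of this fallback, using the Everest example above; the hand-written RULES table and the (class, text) entity representation are assumptions standing in for the system’s real question typer and entity finder:

```python
from collections import Counter

# Hand-written cue-word rules standing in for the real question typer.
RULES = [({"how", "high"}, "measurement:distance"),
         ({"when", "born"}, "date")]

def fallback_answer(question, entities):
    """Pick the semantic class implied by the question words, then return
    the most frequent entity of that class found in the retrieved
    passages; entities are (class, text) pairs from the entity finder."""
    words = set(question.lower().rstrip("?").split())
    for cue, sem_class in RULES:
        if cue <= words:  # all cue words present in the question
            counts = Counter(text for cls, text in entities
                             if cls == sem_class)
            if counts:
                return counts.most_common(1)[0][0]
    return None

ents = [("measurement:distance", "29,035 feet"),
        ("measurement:distance", "29,035 feet"),
        ("measurement:distance", "5.4 miles"),
        ("location", "Everest")]
print(fallback_answer("How high is Everest?", ents))  # -> 29,035 feet
```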


Definition System

• Definition questions such as “What is Goth?” contain very little information which can be used to retrieve relevant documents, as they have almost nothing in common with potential answers:
  – “a subculture that started as one component of the punk rock scene”
  – “horror/mystery literature that is dark, eerie, and gloomy”

• Having extra knowledge about the definiendum is important:
  – 217 sentences in AQUAINT contain the term “Goth”.
  – If we know that “Goth” seems to be associated with “subculture” in definition passages then we can narrow the search space.
  – Only 6 sentences in AQUAINT contain both the terms “Goth” and “subculture”:
    • “the Goth subculture”
    • “gloomy subculture known as Goth”


Definition System

• To extract extra information about the definiendum we use a set of linguistic patterns which we instantiate with the definiendum, for example:
  – “X is a”
  – “such as X”
  – “X consists of”

• The patterns match many sentences, some of which are definition bearing and some of which are not:
  – “Goth is a subculture”
  – “Becoming a Goth is a process that demands lots of effort”

• These patterns can be used to find terms which regularly appear along with the definiendum, outside of the target collection (a small sketch follows).
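A small sketch of pattern instantiation and matching; note that, run on the slide’s two example sentences, it keeps both, illustrating that the patterns also match non-definitions:

```python
def instantiate(definiendum,
                templates=("{} is a", "such as {}", "{} consists of")):
    """Fill the linguistic patterns from the slide with the definiendum."""
    return [t.format(definiendum) for t in templates]

def pattern_sentences(definiendum, sentences):
    """Keep sentences containing at least one instantiated pattern; as
    the slide notes, not all of these are actually definitions."""
    pats = [p.lower() for p in instantiate(definiendum)]
    return [s for s in sentences if any(p in s.lower() for p in pats)]

print(pattern_sentences("Goth", [
    "Goth is a subculture",
    "Becoming a Goth is a process that demands lots of effort",
]))  # both kept: the second matches "Goth is a" but is not a definition
```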


Definition System Architecture

[Architecture diagram: the definiendum is extracted from the question and used, via Pattern Generation, to build instantiated patterns; these drive Mining of On-Line Resources (WordNet and the web) and Secondary Term Extraction; the definiendum and the secondary terms then feed Retrieval over the AQUAINT collection; finally, Definition Extraction takes the relevant passages and produces the answers.]


Knowledge Acquisition

• We parse the question in order to extract the definiendum.

• We then use the linguistic patterns (“Goth is a”, “such as Goth”, …) to find definition-bearing passages in:
  – WordNet
  – Britannica
  – the Web

• From these sources we extract words (nouns, adjectives, verbs) and their frequencies from definition-bearing sentences (a counting sketch follows this list). What counts as definition bearing depends on the source:
  – WordNet: the gloss of the definiendum and any associated hypernyms.
  – Britannica: only sentences which contain the definiendum.
  – Web: only sentences which contain one of the linguistic patterns.
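A counting sketch for this extraction step; the real system keeps only nouns, adjectives and verbs via part-of-speech information, for which a small stopword list is an admittedly crude stand-in:

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "that", "as"}

def word_frequencies(sentences):
    """Collect word frequencies from definition-bearing sentences; a tiny
    stopword list crudely stands in for POS-based filtering."""
    counts = Counter()
    for s in sentences:
        for w in re.findall(r"[a-z]+", s.lower()):
            if w not in STOP:
                counts[w] += 1
    return counts

freqs = word_frequencies(
    ["Goth is a subculture that started as one component "
     "of the punk rock scene"])
print(freqs.most_common(3))
```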


Knowledge Acquisition

• We retain all the words extracted from WordNet, plus all those words which occurred more than once. The words are sorted based on their frequency of occurrence.

• A list of n secondary terms to be used for query expansion is formed (a sketch follows the table):
  – all m terms found in WordNet;
  – a maximum of (n – m) / 2 terms from Britannica;
  – the list is then expanded to size n with terms found on the web.

  Definiendum: aspirin
    WordNet:    analgesic; anti-inflammatory; antipyretic; drug; …
    Britannica: inhibit; prostaglandin; ketofren; synthesis; …
    Web:        drug; drugs; blood; ibuprofen; medication; pain; …

  Definiendum: Aum Shinrikyo
    WordNet:    *NOTHING*
    Britannica: *NOTHING*
    Web:        group; groups; cult; religious; japanese; …
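A sketch of this list-building recipe using the aspirin row of the table above; the formula follows the slide, while the value of n and the helper name are illustrative:

```python
def build_term_list(wordnet_terms, britannica_terms, web_terms, n=8):
    """Form the n-term expansion list: all m WordNet terms, at most
    (n - m) // 2 Britannica terms, then pad up to n from the web.
    Input lists are assumed pre-sorted by frequency of occurrence."""
    terms = list(wordnet_terms)                  # all m WordNet terms
    m = len(terms)
    terms += britannica_terms[: max(0, (n - m) // 2)]
    for t in web_terms:                          # expand to size n
        if len(terms) >= n:
            break
        if t not in terms:
            terms.append(t)
    return terms[:n]

print(build_term_list(
    ["analgesic", "anti-inflammatory", "antipyretic", "drug"],  # WordNet
    ["inhibit", "prostaglandin", "ketofren", "synthesis"],      # Britannica
    ["drug", "drugs", "blood", "ibuprofen", "medication"]))     # Web
```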


Locating Possible Definitions

• An IR query consisting of all the words in the question, as well as the acquired secondary terms, is submitted to Okapi and the 20 most relevant passages are retrieved.

• Sentences which pass one of the following tests are then extracted as definition candidates (both tests are sketched below):
  – The sentence matches one of the linguistic patterns.
  – The sentence contains the definiendum and at least 3 secondary terms.

• To avoid the inclusion of unnecessary information we discard the sentence prefix which contains neither the definiendum nor any secondary term.
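A sketch of both the candidate test and the prefix-trimming step; the tokenisation and punctuation handling are simplifications:

```python
def is_candidate(sentence, definiendum, patterns, secondary_terms, k=3):
    """Keep a sentence if it matches an instantiated linguistic pattern,
    or if it contains the definiendum plus at least k secondary terms."""
    s = sentence.lower()
    if any(p.lower() in s for p in patterns):
        return True
    hits = sum(1 for t in secondary_terms if t.lower() in s)
    return definiendum.lower() in s and hits >= k

def trim_prefix(sentence, definiendum, secondary_terms):
    """Discard the sentence prefix that contains neither the definiendum
    nor any secondary term (last bullet above)."""
    keys = {definiendum.lower()} | {t.lower() for t in secondary_terms}
    words = sentence.split()
    for i, w in enumerate(words):
        if w.lower().strip('.,;:"') in keys:
            return " ".join(words[i:])
    return sentence
```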


Locating Possible Definitions

• Equivalent definitions are identified via the vector space model using the cosine similarity measure, and only one definition is retained (a sketch follows).

• For example, the following two definitions are similar and only one would be retained by the system:
  – “the Goth subculture”
  – “gloomy subculture known as Goth”
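A sketch of this redundancy filter over bag-of-words vectors; the 0.5 similarity threshold is an assumption, as the slide does not give the value used:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words sentence vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def dedupe(definitions, threshold=0.5):
    """Keep a definition only if it is not too similar to one already
    kept; the threshold value is an assumption."""
    kept = []
    for d in definitions:
        if all(cosine(d, k) < threshold for k in kept):
            kept.append(d)
    return kept

print(dedupe(["the Goth subculture", "gloomy subculture known as Goth"]))
# -> ['the Goth subculture']
```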


Outline of Talk

• What is Question Answering?
  – Different Question Types
• System Description
  – Factoid and List Questions
    • System Architecture
    • Surface Matching Text Patterns
    • Fallback to Semantic Entities
  – Definition Questions
    • System Architecture
    • Knowledge Acquisition
    • Locating Possible Definitions
• Results and Evaluation
  – Factoid and List Questions
  – Definition Questions
• Conclusions and Future Work


Results and Evaluation

• The system was independently evaluated as part of the TREC 2003 question answering evaluation. This consisted of answering 413 factoid questions, 37 list questions and 50 definition questions.

• For further details on the evaluation metrics used by NIST see (Voorhees, 2003).


Results & Evaluation: Factoid

• Unfortunately only 12 of the 413 factoid questions were suitable to be answered by the pattern sets. Even worse, none of the patterns were able to select any answers, correct or otherwise.

• The fallback system correctly identified the answer type for 241 of the 413 questions:
  – 53 were given an incorrect type;
  – 119 were outside the scope of the system.

• Okapi only located relevant documents for 131 of the questions the system could answer, giving:
  – a maximum attainable score of 0.317 (131/413);
  – an official score of 0.138 (57/413), which included 15 correct NIL responses, so…
  – excluding the NIL responses, the system answered 42 questions, giving a score of 0.102, 32% of the maximum score.


Results & Evaluation: List

• Similar problems occurred when the system was used to answer list questions:
  – over the 37 questions only 20 distinct correct answers were returned,
  – giving an official F-score of 0.029.

• The ability of the system to locate a reasonable number of correct answers was offset by the large number of answers returned per question:
  – There are seven known answers (in AQUAINT) to the question “What countries have won the men’s World Cup for soccer?”
  – The system returned 32 answers, only two of which were correct.
  – This gives a recall of 0.286 but a precision of only 0.062.


Results & Evaluation: Definition

• Definition systems are evaluated based on their ability to return information nuggets (snippets of text containing information that helps define the definiendum). Some of these nuggets are considered essential, i.e. a full definition must contain them.

• Our system produced answers for 28 of the 50 questions, 23 of which contained at least one essential nugget.

• The official score for the system was 0.236 placing the system 9th out of the 25 participants.

• The knowledge acquisition step provided relevant secondary terms for a number of questions:
  – WordNet helped in 4 cases;
  – Britannica helped in 5 cases;
  – the Web helped in 39 cases.


Outline of Talk

• What is Question Answering?
  – Different Question Types
• System Description
  – Factoid and List Questions
    • System Architecture
    • Surface Matching Text Patterns
    • Fallback to Semantic Entities
  – Definition Questions
    • System Architecture
    • Knowledge Acquisition
    • Locating Possible Definitions
• Results and Evaluation
  – Factoid and List Questions
  – Definition Questions
• Conclusions and Future Work


Conclusions

• When using patterns for answering factoid and list questions, the surface text patterns should probably be acquired from a source with a writing style similar to that of the collection from which answers will be drawn:
  – Here we used the web to acquire the patterns and then used them to find answers in the AQUAINT collection, and the two have differing writing styles.

• Using patterns to answer definition questions, while more successful than the factoid system, still has its problems:
  – The filters used to determine if a passage is definition bearing are too restrictive.

• Despite these failings, the use of patterns for answering factoid, list and definition questions shows promise.


Future Work

• For the factoid and list QA system future work could include:
  – acquiring a wider range of pattern sets to cover more question types;
  – using the full question, not just the question term, for passage retrieval.

• For the definition QA system future research could include:
  – ranking the extracted secondary terms, perhaps using IDF values, to help eliminate inappropriate matches (“aspirin is a great choice for active people”);
  – implementing a syntactic technique that prunes parse trees to extract better definition strings;
  – using coreference information in combination with the extraction patterns.

Any Questions?

Copies of these slides can be found at:

http://www.dcs.shef.ac.uk/~mark/phd/work/


Bibliography

Hamish Cunningham, Diana Maynard, Kalina Bontcheva and Valentin Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.

Mark A. Greenwood and Robert Gaizauskas. Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering. In Proceedings of the Workshop on Natural Language Processing for Question Answering (EACL03), pages 29–34, Budapest, Hungary, April 14, 2003.

Michael Marshall. The Straw Men. HarperCollins Publishers, 2002.

Ellen M. Voorhees. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text REtrieval Conference, 2003.