A semantic approach for question classification using WordNet and Wikipedia


Page 1

Intelligent Database Systems Lab

National Yunlin University of Science and Technology


A semantic approach for question classification using WordNet and Wikipedia

Presenter: Cheng-Hui Chen
Authors: Santosh Kumar Ray, Shailendra Singh, B.P. Joshi

Pattern Recognition Letters (PRL), 2010

Page 2

Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Page 3

Motivation
· The question classification module plays a very important role in a question answering (QA) system.
· Web pages retrieved by search engines do not provide precise information, and even top-ranked results may contain irrelevant information.
· Moldovan et al. (2003) showed that 36.4% of the errors in their QA system were caused by incorrect question classification.

Page 4

Objectives
· Propose a question classification method that exploits the powerful semantic features of WordNet and the vast knowledge repository of Wikipedia to describe informative terms explicitly.
· Provide answers to user queries in succinct form.

Page 5

Methodology
· A question classification algorithm that classifies questions using WordNet and Wikipedia.
· Details:
─ Question database collection
─ Identification of question patterns
─ Question classification algorithm

Page 6

Question database collection
· The question database consists of 5,500 training questions and 500 test questions, collected from the English questions published by USC.
· All questions in the dataset were manually labeled by Li and Roth according to coarse-grained and fine-grained categories.

Coarse class: fine classes
· ABBREVIATION: abbreviation, expression abbreviated
· ENTITY: animal, body, color, creative, currency, diseases and medical, event, food, instrument, lang, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word
· DESCRIPTION: definition, description, manner, reason
· HUMAN: group, ind, title, description
· LOCATION: city, country, mountain, other, state
· NUMERIC: code, count, date, distance, money, order, other, period, percent, speed, temp, size, weight
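As a rough illustration, the taxonomy above can be held in a Python dictionary. This is a minimal sketch: the label strings are the names listed on this slide, not necessarily the exact label format used in the dataset files, and the helper `coarse_of` is hypothetical.

```python
# Coarse -> fine classes as listed on this slide (names, not dataset label codes).
TAXONOMY: dict[str, list[str]] = {
    "ABBREVIATION": ["abbreviation", "expression abbreviated"],
    "ENTITY": ["animal", "body", "color", "creative", "currency",
               "diseases and medical", "event", "food", "instrument", "lang",
               "letter", "other", "plant", "product", "religion", "sport",
               "substance", "symbol", "technique", "term", "vehicle", "word"],
    "DESCRIPTION": ["definition", "description", "manner", "reason"],
    "HUMAN": ["group", "ind", "title", "description"],
    "LOCATION": ["city", "country", "mountain", "other", "state"],
    "NUMERIC": ["code", "count", "date", "distance", "money", "order", "other",
                "period", "percent", "speed", "temp", "size", "weight"],
}


def coarse_of(fine: str) -> list[str]:
    """Hypothetical helper: every coarse class whose list contains `fine`."""
    return [coarse for coarse, fines in TAXONOMY.items() if fine in fines]


print(coarse_of("other"))  # ['ENTITY', 'LOCATION', 'NUMERIC']
```

Because fine names such as "other" and "description" appear under more than one coarse class, `coarse_of` returns a list; a fine label is only unambiguous when it is paired with its coarse class.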

Page 7

Identification of question patterns
· Functional word questions
─ All non-wh questions, excluding "How" questions.
─ Start with non-significant verb phrases.
─ Example: "I don't know the man."
· When questions
─ Start with the keyword "When" and relate to a year, or to a day with a month.
─ General pattern: "When (do|does|did|AUX) NP VP X".
─ Example: "When did you write that book?"
· Where questions
─ Start with the keyword "Where" and relate to a location.
─ Example: "Where is my dog?"
· Which questions
─ General pattern: "Which NP X?"
─ Example: "Which company manufactures video-game hardware?"
· Who/Whose/Whom questions
─ Generally ask about an individual or an organization.
─ Example: "Who is Mary?"

Page 8

Identification of question patterns
· Why questions
─ Ask for reasons or explanations.
─ Example: "Why do heavier objects travel downhill faster?"
· How questions
─ Pattern 1: "How [do|does|did|AUX] NP VP X?"; the answer type is a description of some process. Example: "How do you know?"
─ Pattern 2: "How [big|fast|long|many|much|far|…] X?"; this pattern returns a number as the answer. Example: "How long are you living in?"
· What questions
─ Can ask for virtually anything.
─ Example: "What is your name?"
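The patterns on these two slides can be approximated with regular expressions over the question's leading words. The sketch below is an assumption-laden simplification, not the authors' implementation: the `AUX` word list, the pattern names, and the order in which patterns are tried are hypothetical, NP and VP are not actually parsed, and the quantity words stop where the slide's list is elided.

```python
import re

# Hypothetical auxiliary list standing in for "AUX"; NP/VP are not parsed.
AUX = r"(?:do|does|did|is|are|was|were|can|could|will|would|has|have|had)"

# Tried in order; the quantity form of "How" is checked before the process form.
PATTERNS = [
    ("when", re.compile(rf"^when\s+{AUX}\b", re.I)),
    ("where", re.compile(r"^where\b", re.I)),
    ("which", re.compile(r"^which\s+\w+", re.I)),
    ("who", re.compile(r"^(?:who|whose|whom)\b", re.I)),
    ("why", re.compile(r"^why\b", re.I)),
    # Only the quantity words visible on the slide; the rest of its list is elided.
    ("how-quantity", re.compile(r"^how\s+(?:big|fast|long|many|much|far)\b", re.I)),
    ("how-process", re.compile(rf"^how\s+{AUX}\b", re.I)),
    ("what", re.compile(r"^what\b", re.I)),
]


def question_pattern(question: str) -> str:
    """Return the name of the first matching pattern, else "functional-word"."""
    q = question.strip()
    for name, pattern in PATTERNS:
        if pattern.search(q):
            return name
    return "functional-word"


for q in ["When did you write that book?", "How long are you living in?",
          "I don't know the man."]:
    print(q, "->", question_pattern(q))
```

Treating "functional-word" as the fallback mirrors the slide's description of that type as every non-wh question, but it also means a wh-question that misses its stated pattern (for example "When" with no auxiliary) falls through to that class in this sketch.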

Page 9

Question classification algorithm
· If any of the question patterns matches the given question, its entity type is determined using algorithm QC (question classification).
─ Example: "Where is my dog?" → Location label
─ Example: "I don't know the man." → delete "do" and return "the man"
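The two examples above can be sketched as a small preprocessing step: a "Where" question receives the location label directly, while a functional word question has its leading non-significant verb phrase removed before the remaining string is handed to QC. The prefix list `NON_SIGNIFICANT_PREFIXES` and the function name `preprocess` are hypothetical; the slides do not say which phrases are treated as non-significant.

```python
import re

# Hypothetical non-significant leading phrases (regex fragments); the
# slides do not give the list actually used by the authors.
NON_SIGNIFICANT_PREFIXES = [
    r"i (?:do not|don't) know",
    r"tell me",
    r"name",
]


def preprocess(question: str) -> tuple[str, str]:
    """Return ("label", LABEL) when a pattern fixes the type directly,
    otherwise ("focus", phrase) with the string to pass on to QC."""
    q = question.strip().rstrip("?.").strip()
    lowered = q.lower()
    if lowered.startswith("where "):
        return ("label", "LOCATION")
    for prefix in NON_SIGNIFICANT_PREFIXES:
        match = re.match(prefix + r"\s+(.*)", lowered)
        if match:
            return ("focus", match.group(1))
    return ("focus", lowered)


print(preprocess("I don't know the man."))  # ('focus', 'the man')
```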

Page 10

Question classification algorithm
· Algorithm QC takes a string as input and calls procedure online to determine the expected entity type.
─ Example: "The man" → Human, Vehicle

Page 11

Question classification algorithm
· Procedure online takes the input string and uses the online resources Wikipedia and WordNet to determine the expected entity type.
─ It was observed that a typical Wikipedia article starts like "... X is a Y, Z, ...", where Y, Z, etc. are synonyms, hypernyms, hyponyms, or other semantically related terms of X; these are considered possible entity types.
─ If a sentence in Wikipedia reads "X is Y, Z, ...", procedure online takes Y, Z, ... as possible entity types of X.
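The "X is a Y, Z, ..." heuristic can be sketched with a regular expression that splits a definitional sentence into X and a list of candidate types. This is a simplified sketch under stated assumptions: only "is"/"are" definitions are recognized, leading articles are stripped, the function name `candidate_types` is hypothetical, and the example sentence is invented for illustration rather than taken from Wikipedia. Fetching the article itself is out of scope here.

```python
import re

# Assumption: only "X is ..." / "X are ..." definitions are handled;
# real Wikipedia lead sentences are far more varied.
DEFINITION = re.compile(r"^(?P<x>.+?)\s+(?:is|are)\s+(?P<rest>.+?)\.?$", re.IGNORECASE)
ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def candidate_types(sentence: str) -> tuple[str, list[str]]:
    """Split "X is a Y, Z, ..." into X and the candidate types [Y, Z, ...]."""
    match = DEFINITION.match(sentence.strip())
    if not match:
        return "", []
    parts = re.split(r",\s*|\s+and\s+", match.group("rest"))
    types = [ARTICLE.sub("", part).strip() for part in parts]
    return match.group("x"), [t for t in types if t]


print(candidate_types("The cheetah is a large cat, a carnivore, a mammal."))
# ('The cheetah', ['large cat', 'carnivore', 'mammal'])
```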

Vehicle, Human, Location (TE1)
Human, Individual, Vehicle (TE2)
Human, Vehicle (C)
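The slides do not spell out how C is derived from TE1 and TE2, but the example is consistent with keeping only the types proposed by both lists. A minimal sketch under that assumption follows; reading TE1 and TE2 as the outputs of two different resources (for instance Wikipedia and WordNet) is also an assumption, and `common_types` is a hypothetical name.

```python
def common_types(te1: list[str], te2: list[str]) -> list[str]:
    """Keep the types that appear in both lists (case-insensitive),
    in the order they appear in te2."""
    in_te1 = {t.lower() for t in te1}
    return [t for t in te2 if t.lower() in in_te1]


te1 = ["Vehicle", "Human", "Location"]
te2 = ["Human", "Individual", "Vehicle"]
print(common_types(te1, te2))  # ['Human', 'Vehicle']
```

Following the order of `te2` is an arbitrary choice that happens to reproduce the order shown for C; a set intersection gives the same members.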

Page 12

An application of question classification: answer validation
· Question: "In what year did Arundhati Roy receive a Booker Prize?"
1. Similarity computation
─ Similarity score: the question contains five tokens: "a Number", "Arundhati Roy", "received", "Booker", and "Prize".
─ If a parsed candidate answer sentence contains two of these five tokens, its similarity score is 2/5 = 0.4.
─ The expanded query is "In what year did ("Arundhati Roy" OR Arundhati) (Receive OR Get) Booker (Prize OR Award)?".
─ The passage retrieval phase returned the top 10 answer sentences; five of these 10 sentences reached the required similarity score.
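The similarity score in this example is the fraction of question tokens found in a candidate sentence (2 of 5 gives 0.4), and the expanded query ORs together alternatives for each term. A minimal sketch follows; tokens are compared as exact strings, "a Number" is represented by a placeholder `NUMBER` token, the function names are hypothetical, and the alternative lists are typed in by hand because the slides do not say which resource supplied them.

```python
def similarity_score(question_tokens: list[str], sentence_tokens: set[str]) -> float:
    """Fraction of question tokens that also occur among the sentence's tokens."""
    matched = sum(1 for tok in question_tokens if tok in sentence_tokens)
    return matched / len(question_tokens)


def expand_query(term_alternatives: list[list[str]]) -> str:
    """OR together the alternatives for each term; quote multi-word phrases."""
    groups = []
    for alternatives in term_alternatives:
        quoted = [f'"{a}"' if " " in a else a for a in alternatives]
        if len(quoted) == 1:
            groups.append(quoted[0])
        else:
            groups.append("(" + " OR ".join(quoted) + ")")
    return " ".join(groups)


# "NUMBER" is a placeholder for the expected-answer token "a Number".
question = ["NUMBER", "Arundhati Roy", "received", "Booker", "Prize"]
print(similarity_score(question, {"Arundhati Roy", "Booker", "novel"}))  # 0.4
print(expand_query([["Arundhati Roy", "Arundhati"], ["Receive", "Get"],
                    ["Booker"], ["Prize", "Award"]]))
# ("Arundhati Roy" OR Arundhati) (Receive OR Get) Booker (Prize OR Award)
```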

Page 13

An application of question classification: answer validation
2. Entity type
─ The question classification module computes "date" as the expected entity type for this question.
─ Treating a date as a number (optionally with a month name or the word "year"), the four candidate answer sentences containing a number were sent to the next stage for further processing.
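The entity-type filter can be sketched as a regular expression that accepts any number, optionally preceded by a month name or the word "year". The pattern is an assumption about how "date" might be matched, the function names are hypothetical, and the three candidate sentences are invented for illustration. As on the slide, a sentence mentioning a sum of money also passes this filter because it contains a number.

```python
import re
from typing import Optional

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
# Assumed date shape: a number such as 1997 or 20,000, optionally preceded
# by a month name or the word "year".
DATE_LIKE = re.compile(rf"\b(?:(?:{MONTHS}|year)\s+)?\d[\d,]*\b", re.IGNORECASE)


def date_span(sentence: str) -> Optional[str]:
    """Return the first date-like span in the sentence, or None."""
    match = DATE_LIKE.search(sentence)
    return match.group(0) if match else None


def keep_date_candidates(sentences: list[str]) -> list[str]:
    """Keep only candidate sentences that contain a date-like number."""
    return [s for s in sentences if date_span(s) is not None]


candidates = [
    "She received the prize in October 1997.",
    "The novel was widely praised.",
    "The award carried £ 20,000.",
]
print(keep_date_candidates(candidates))
```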

Page 14

An application of question classification: answer validation
3. World Wide Web validation
─ Four candidates passed the first two tests. Three contained "1997" as the answer, and the fourth contained "£20,000". Only the first answer (1997) was confirmed by the top-ranked documents returned by Google.
─ Hence, the three candidate answer sentences containing "1997" were validated as correct answers.
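The final step can be sketched as counting how many of the top-ranked web documents contain each candidate answer and keeping the candidate sentences whose answer is supported. Querying Google is out of scope, so `top_documents` below is a hand-written stand-in for search results; the `min_support` threshold and the function name `validate` are hypothetical, since the slides do not state the exact acceptance criterion.

```python
def validate(candidates: dict[str, str], top_documents: list[str],
             min_support: int = 1) -> list[str]:
    """Return the ids of candidate sentences whose answer string occurs in at
    least `min_support` of the top-ranked documents."""
    support = {
        answer: sum(answer in doc for doc in top_documents)
        for answer in set(candidates.values())
    }
    return [cid for cid, answer in candidates.items() if support[answer] >= min_support]


# Answers extracted from the four candidate sentences that passed the first two tests.
candidates = {"s1": "1997", "s2": "1997", "s3": "1997", "s4": "£20,000"}
# Hand-written stand-in for the text of the top documents from a web search.
top_documents = [
    "Arundhati Roy won the Booker Prize in 1997 for The God of Small Things.",
    "The 1997 Booker Prize was awarded to Arundhati Roy.",
]
print(validate(candidates, top_documents))  # ['s1', 's2', 's3']
```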

Page 15

Experiments (QC algorithm)

Page 16

Experiments (Answer validation)
· Sources
─ TREC (Text REtrieval Conference)
─ WorldBook (The World Book)
─ Worldfactbook (The CIA World Factbook)
─ Other standard resources

Page 17

Experiments (Answer validation)

Page 18

Conclusions
· The question classification algorithm achieves high accuracy.
· The proposed method appears promising for question classification in open-domain question answering.
· The proposed method combines the World Wide Web with Natural Language Processing (NLP) techniques.

Page 19

Comments
· Advantages
─ The distinctive points of the algorithm lie in its dynamic and extensible properties.
─ The proposed method is promising for question classification.
· Shortcomings
─ It has a few limitations.
· Applications
─ Information retrieval