Using a Trie-based Structure for Question Analysis
Luiz Augusto Sangoi Pizzato
http://www.ics.mq.edu.au/~pizzato
ALTA Workshop’04, Macquarie University, Sydney, 8 December 2004


Pizzato, Luiz Augusto Sangoi. Using a Trie-based Structure for Question Analysis. (2/21) In: ALTA Workshop 2004. Macquarie University, Sydney. 8 December 2004.

Outline

“Using a Trie-based Structure for Question Analysis”

• Question analysis
• Trie structure
• Question trie
• Building and retrieving using the trie
• Evaluation of the technique
• Further work


Question on question

Our question analyser tries to answer two meta-questions:

What kind of answer do I have to provide?
• Define the expected answer type (EAT).

What is the subject of the question?
• Define the question focus.


Some approaches

EAT
• Handcrafted rules
  • Normally by the use of regular expressions (RE)
• WordNet top concepts (Moldovan et al., 2003)
  • High-quality results
• Support Vector Machines (SVM) (Zhang and Lee, 2003)
  • Good results using a large training set

Focus
• Discard the question’s stopwords.
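The two baseline ideas on this slide (regular-expression rules for the EAT, stopword removal for the focus) can be sketched roughly as follows. The rules, the stopword list, and the function names are illustrative assumptions made for this sketch, not the rules of any cited system; the EAT labels are the ones used later in these slides.

```python
import re

# Hypothetical handcrafted rules: (regular expression, EAT label).
# The first rule that matches wins, so more specific rules come first.
EAT_RULES = [
    (re.compile(r"^where\b", re.IGNORECASE), "LOC"),
    (re.compile(r"^how (far|tall)\b", re.IGNORECASE), "NUMBER"),
    (re.compile(r"^who is the\b", re.IGNORECASE), "NAME"),
    (re.compile(r"^who\b", re.IGNORECASE), "DESC"),
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"who", "where", "how", "what", "is", "the", "of", "a", "an"}


def expected_answer_type(question):
    """Return the EAT of the first matching rule, or None if no rule fires."""
    for pattern, eat in EAT_RULES:
        if pattern.search(question):
            return eat
    return None


def focus(question):
    """Approximate the question focus by discarding stopwords."""
    words = re.findall(r"[A-Za-z0-9.]+", question)
    return [w for w in words if w.lower() not in STOPWORDS]


print(expected_answer_type("Who is the dean of ICS?"), focus("Who is the dean of ICS?"))
```

Rule order matters in such a scheme: every new question form needs another hand-written rule, which is the maintenance cost the trie-based approach tries to avoid.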


Trie structure

$$T(S) = \langle T(S/a_1),\, T(S/a_2),\, \ldots,\, T(S/a_r) \rangle$$

$$A = \{a_j\}_{j=1}^{r}$$

[Figure: a trie over the alphabet a|b|c|d|e|f|...|z, one array of letters per node, storing the words “car” and “zebra”.]
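A minimal sketch of such a trie, storing the two words from the figure. It keeps a dictionary of children per node instead of a fixed array with one slot per letter, which is a substitution made here for brevity; the class and method names are assumptions of this sketch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # symbol -> sub-trie of the strings that start with it
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Walk down from the root, creating nodes as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Return True only if the exact word was inserted."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ("car", "zebra"):
    trie.insert(w)

print(trie.contains("car"), trie.contains("ca"))
```

Lookup cost depends on the length of the key rather than on the number of stored words, which is what makes the structure attractive for prefix-sharing data such as question patterns.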


Patterns

Question                   Pattern                         EAT
Where is Chile?            ^ Where is !LOC $               LOC
Who is the dean of ICS?    ^ Who is the !POS of !ORG $     NAME
Who is J. Smith?           ^ Who is !NAME $                DESC
Who is J. Smith of ICS?    ^ Who is !NAME of !ORG $        DESC
How far is Athens?         ^ How far is !LOC $             NUMBER
How tall is Sting?         ^ How tall is !NAME $           NUMBER


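Question Trie

[Figure: question trie; each numbered node holds one pattern token, arrows follow a single path, indentation marks a branch.]

1 ^ (boq)
  2 where → 3 is → 4 !LOC → 5 $ (eoq)
  6 who → 7 is
    8 the → 9 !POS → 10 of → 11 !ORG → 12 $ (eoq)
    13 !NAME
      14 $ (eoq)
      15 of → 16 !ORG → 17 $ (eoq)
  18 how
    19 far → 20 is → 21 !LOC → 22 $ (eoq)
    23 tall → 24 is → 25 !NAME → 26 $ (eoq)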


Question Trie

Nodes    Information (EAT, Frequency)
1        (LOC,1), (NAME,1), (DESC,2), (NUMBER,2)
2-5      (LOC,1)
6-7      (NAME,1), (DESC,2)
8-12     (NAME,1)
13       (DESC,2)
14-17    (DESC,1)
18       (NUMBER,2)
19-26    (NUMBER,1)

[Figure: the same 26-node question trie as on the previous slide.]
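The node information in the table can be reproduced by inserting each pattern token by token and incrementing an EAT counter on every node along the path. The sketch below assumes lowercased tokens and lets the root stand for “^” (boq). The `predict_eat` helper is an assumption about how retrieval might work (follow literal tokens as far as possible, then take the most frequent EAT), not a description of the author's exact procedure.

```python
from collections import Counter


class QNode:
    def __init__(self):
        self.children = {}         # token -> QNode
        self.eat_freq = Counter()  # EAT label -> number of patterns through this node


def build_question_trie(patterns):
    root = QNode()  # node 1, standing for "^" (boq)
    for tokens, eat in patterns:
        node = root
        node.eat_freq[eat] += 1
        for tok in tokens:
            node = node.children.setdefault(tok, QNode())
            node.eat_freq[eat] += 1
    return root


def predict_eat(root, tokens):
    """Hypothetical retrieval: deepest node reachable by exact token match."""
    node = root
    for tok in tokens:
        if tok not in node.children:
            break
        node = node.children[tok]
    return node.eat_freq.most_common(1)[0][0]


# The six patterns from the "Patterns" slide, without the leading "^".
PATTERNS = [
    ("where is !LOC $".split(), "LOC"),
    ("who is the !POS of !ORG $".split(), "NAME"),
    ("who is !NAME $".split(), "DESC"),
    ("who is !NAME of !ORG $".split(), "DESC"),
    ("how far is !LOC $".split(), "NUMBER"),
    ("how tall is !NAME $".split(), "NUMBER"),
]
root = build_question_trie(PATTERNS)

print(dict(root.children["who"].eat_freq))
```

Because the counters are stored on every node, a question that leaves the trie early still yields a frequency-based guess from the last node it reached.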


Look-ahead process

1 ^ (boq)
  6 who → 7 is → 13 !NAME
    14 $ (eoq)
    15 of → 16 !ORG → 17 $ (eoq)

^ who is John Smith of Macquarie University $
         ?             ?

^ who is Madonna $
         ?
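One way to read the look-ahead idea: when the walk reaches an entity node such as !NAME, the number of question words it covers is unknown, so the matcher extends the entity until the next word is one that the trie expects after it (“of” or “$” here). The sketch below is an interpretation under that assumption, using a nested-dict trie whose root stands for “^”. The function names are hypothetical, and two entities placed back to back are not handled.

```python
def build(patterns):
    """Build a nested-dict trie; the root stands for '^' (boq)."""
    root = {}
    for pattern in patterns:
        node = root
        for tok in pattern.split():
            node = node.setdefault(tok, {})
    return root


def match(node, tokens):
    """Return [(entity_label, words), ...] if tokens reach a leaf, else None."""
    if not tokens:
        return [] if not node else None
    head = tokens[0]
    # 1. Try an exact match on the next token.
    if head in node:
        rest = match(node[head], tokens[1:])
        if rest is not None:
            return rest
    # 2. Try each entity child, growing the span one word at a time.
    for label, child in node.items():
        if not label.startswith("!"):
            continue
        for end in range(1, len(tokens)):
            # Look-ahead: stop the entity where a token expected after it appears.
            if tokens[end] in child:
                rest = match(child, tokens[end:])
                if rest is not None:
                    return [(label, tokens[:end])] + rest
    return None


trie = build(["who is !NAME $", "who is !NAME of !ORG $"])
print(match(trie, "who is John Smith of Macquarie University $".split()))
print(match(trie, "who is Madonna $".split()))
```

If an early stopping point leads to a dead end, the loop simply tries a longer span, so a name that itself contains “of” can still be matched by a later attempt.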


MQ Questions

JustAsk logs: 4.8% natural-language (NL) questions
• 60,732 of 1,275,116 were NL questions
• 47,844 unique NL questions
• 23% with some language problems:
  • Why this search not word?
• Unusual language:
  • Do u offer any scholarships 4 physiotherapy?
• Speculative questions:
  • Will I get a job in Australia after finishing my MBA?
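As a quick check on the counts reported on this slide (plain arithmetic on the stated figures, not additional data), the NL share of the logs and the share of NL questions that are unique work out to:

```latex
\frac{60{,}732}{1{,}275{,}116} \approx 0.0476 \approx 4.8\%
\qquad
\frac{47{,}844}{60{,}732} \approx 0.788 \approx 78.8\%
```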


Training Set

JustAsk questions were randomly selected and semi-automatically tagged according to an XML-like structure:
• <Q AT='DESC'>Who is <ENAMEX type="NAME">Luiz Pizzato</ENAMEX>?</Q>

Total number of questions: 1385
• 233 – Who
• 212 – What
• 208 – Where
• 203 – How
• 529 – Other types:
  • Am I, Are there, Can I, Do you, Is there, I want, I need, Which, Does, Tell me, Why, Have you, Could you, May I, Will I, Was I, Would you, Whom
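A tagged question of this form can be turned into a trie pattern by replacing each ENAMEX span with a placeholder and reading the EAT from the AT attribute. The sketch below uses regular expressions rather than an XML parser, and assumes lowercased words and one `<Q>` element per string; the function and variable names are hypothetical.

```python
import re

# Both quote styles are accepted for attribute values.
Q_RE = re.compile(r"<Q AT=['\"](?P<eat>\w+)['\"]>(?P<body>.*)</Q>", re.DOTALL)
ENAMEX_RE = re.compile(r"<ENAMEX type=['\"](?P<type>\w+)['\"]>.*?</ENAMEX>")


def to_pattern(tagged):
    """Return (pattern tokens, EAT) for one tagged question."""
    m = Q_RE.fullmatch(tagged.strip())
    if m is None:
        raise ValueError("not a tagged question: %r" % tagged)
    # Replace every marked-up entity with a placeholder such as !NAME.
    body = ENAMEX_RE.sub(lambda e: " !" + e.group("type") + " ", m.group("body"))
    # Keep placeholders and words; punctuation such as "?" is dropped.
    words = re.findall(r"![A-Z]+|[\w.']+", body)
    tokens = ["^"] + [w if w.startswith("!") else w.lower() for w in words] + ["$"]
    return tokens, m.group("eat")


example = "<Q AT='DESC'>Who is <ENAMEX type=\"NAME\">Luiz Pizzato</ENAMEX>?</Q>"
print(to_pattern(example))
```

For the slide's example this yields the pattern `^ who is !NAME $` with EAT `DESC`, matching the third row of the pattern table up to letter case.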


Evaluation – EAT

[Figure: precision (%) against training set size (0 to 1400); legend: EAT Average Trendline.]


Evaluation – Focus

[Figure: percentage (0 to 100) against training set size (0 to 1400); series: Recall, Precision.]
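The slides do not state how focus precision and recall were computed. One plausible reading, assumed here purely for illustration, compares the set of words predicted as focus with a hand-marked gold set for each question:

```python
def precision_recall(predicted, gold):
    """Token-level precision and recall of a predicted focus against a gold focus."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: one extra word predicted, nothing missed.
p, r = precision_recall(["the", "dean", "ICS"], ["dean", "ICS"])
print(round(p, 2), round(r, 2))
```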


Question Trie without Entities

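[Figure: question trie built from the raw question words, without entity placeholders; arrows follow a single path, indentation marks a branch.]

1 ^ (boq)
  2 where → 3 is → 4 Chile → 5 $ (eoq)
  6 who → 7 is
    8 the → 9 dean → 10 of → 11 ICS → 12 $ (eoq)
    13 J. → 14 Smith
      15 $ (eoq)
      16 of → 17 ICS → 18 $ (eoq)
  19 how
    20 far → 21 is → 22 Athens → 23 $ (eoq)
    24 tall → 25 is → 26 Sting → 27 $ (eoq)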


Evaluation – TREC-2003

[Figure: precision (%) against training set size (0 to 400); legend: EAT Average Trendline.]


Comparison with SVM (Zhang and Lee, 2003)

[Figure: precision (%), axis from 60% to 90%, against size of the training set (0 to 6000); series: SVM - fine grained, Trie - fine grained, SVM - coarse grained, Trie - coarse grained.]


Concluding remarks

The developed technique offers reasonable results using no linguistic resources.

Future developments
• Define guidelines for the EAT markup and review the markup of the MQ questions.
• Adding POS and semantic information from WordNet may replace entity markup.


Further Work

Combine lexical and POS information

Example: Who is John Smith?

Token         POS    EAT (freq)
^             ^      NAME 1, DESC 1
Who           WP     NAME 1, DESC 1
is            VBZ    NAME 1, DESC 1
John          NNP    NAME 1
Smith         NNP    NAME 1
John Smith    NNP    NAME 1
$             $      NAME 1


References

Dan Moldovan, Marius Paşca, Sanda Harabagiu, and Mihai Surdeanu. 2003. Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst., 21(2):133–154.

Dell Zhang and Wee Sun Lee. 2003. Question classification using support vector machines. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR-03), pages 26–32. ACM Press.


Acknowledgments

My supervisors: Dr. Diego Mollá-Aliod, Dr. Rolf Schwitter, Dr. Cecile Paris
