ALTA Workshop’04, Macquarie University, Sydney, 8 December 2004
Luiz Augusto Sangoi [email protected]
http://www.ics.mq.edu.au/~pizzato
Using a Trie-based Structure for Question Analysis
Pizzato, Luiz Augusto Sangoi. Using a Trie-based Structure for Question Analysis. (2/21) In: ALTA Workshop 2004. Macquarie University, Sydney. 8 December 2004.
Outline
“Using a Trie-based Structure for Question Analysis”
• Question analysis
• Trie structure
• Question trie
• Building and retrieving using the trie
• Evaluation of the technique
• Further work
Question on question
Our question analyser tries to answer two meta-questions:
What is the kind of answer I have to provide?
• Define the expected answer type (EAT).
What is the subject of the question?
• Define the question focus.
Some approaches
EAT
Handcrafted rules
• Normally by the use of regular expressions (RE)
WordNet top concepts (Moldovan et al., 2003)
• High quality results
Support Vector Machines (SVM) (Zhang and Lee, 2003)
• Good results using a large training set
Focus
Discard the question's stopwords.
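As an illustration of the two simplest approaches listed here, the sketch below pairs a few regular-expression rules with an EAT label and finds the focus by dropping stopwords. The rules, the stopword list, and the function names are hypothetical and were written only for this example; they are not taken from any of the cited systems.

```python
import re

# Hypothetical handcrafted rules: (regular expression, expected answer type).
# The first matching rule wins, so more specific rules come first.
EAT_RULES = [
    (re.compile(r"^where\b", re.IGNORECASE), "LOC"),
    (re.compile(r"^how (far|tall|many|much)\b", re.IGNORECASE), "NUMBER"),
    (re.compile(r"^who is the\b", re.IGNORECASE), "NAME"),
    (re.compile(r"^who\b", re.IGNORECASE), "DESC"),
]

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"who", "where", "how", "what", "is", "are", "the", "of", "a", "an"}


def expected_answer_type(question):
    """Return the EAT of the first rule that matches, or None."""
    for pattern, eat in EAT_RULES:
        if pattern.search(question):
            return eat
    return None


def question_focus(question):
    """Return the question tokens that are not stopwords."""
    tokens = re.findall(r"[A-Za-z0-9.]+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]


print(expected_answer_type("Who is the dean of ICS?"))  # NAME
print(question_focus("Who is the dean of ICS?"))        # ['dean', 'ICS']
```

Rules of this kind are easy to read but have to be written and ordered by hand.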
Trie structure
$$T(S) = \langle T(S/a_1), T(S/a_2), \ldots, T(S/a_r) \rangle$$
$$A = \{a_j\}_{j=1}^{r}$$
[Figure: a trie over the alphabet a to z, drawn as one a|b|c|...|z branching row per level, storing the words "car" and "zebra".]
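A minimal sketch of a character trie may help here: each node branches on one letter, so words that share a prefix share a path. The class and function names are assumptions made for this example, and the two stored words are the ones shown on the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


def insert(root, word):
    """Walk down the trie one letter at a time, creating nodes as needed."""
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.is_word = True


def contains(root, word):
    """Return True only if the exact word was inserted."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for w in ("car", "zebra"):
    insert(root, w)

print(contains(root, "car"))  # True
print(contains(root, "ca"))   # False, "ca" is only a prefix
```

A dictionary per node is used instead of a fixed array of 26 slots, which keeps sparse nodes small.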
Patterns
Question                   Pattern                        EAT
Where is Chile?            ^ Where is !LOC $              LOC
Who is the dean of ICS?    ^ Who is the !POS of !ORG $    NAME
Who is J. Smith?           ^ Who is !NAME $               DESC
Who is J. Smith of ICS?    ^ Who is !NAME of !ORG $       DESC
How far is Athens?         ^ How far is !LOC $            NO
How tall is Sting?         ^ How tall is !NAME $          NO
Question Trie
1 ^ (boq)
  2 where
    3 is
      4 !LOC
        5 $ (eoq)
  6 who
    7 is
      8 the
        9 !POS
          10 of
            11 !ORG
              12 $ (eoq)
      13 !NAME
        14 $ (eoq)
        15 of
          16 !ORG
            17 $ (eoq)
  18 how
    19 far
      20 is
        21 !LOC
          22 $ (eoq)
    23 tall
      24 is
        25 !NAME
          26 $ (eoq)
Question Trie
Nodes    Information (EAT, Frequency)
1        (LOC,1), (NAME,1), (DESC,2), (NUMBER,2)
2-5      (LOC,1)
6-7      (NAME,1), (DESC,2)
8-12     (NAME,1)
13       (DESC,2)
14-17    (DESC,1)
18       (NUMBER,2)
19-26    (NUMBER,1)
[Figure: the question trie with nodes numbered 1 to 26, repeated from the previous slide.]
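The node table can be reproduced with a small word-level trie in which every node on a pattern's path counts the EAT of that pattern. This is a sketch under assumptions: the names `QNode`, `add_pattern` and `count_nodes` are invented for the example, and the EAT of the two "How" patterns is written as NUMBER, which the Patterns slide abbreviates as NO.

```python
from collections import Counter


class QNode:
    def __init__(self):
        self.children = {}    # token -> QNode
        self.eat = Counter()  # EAT label -> frequency


def add_pattern(root, pattern, eat):
    """Add one pattern; the root stands for '^' (beginning of question)."""
    node = root
    node.eat[eat] += 1
    for token in pattern.split()[1:]:  # skip the leading "^"
        # Entity tags such as !LOC keep their case; words are lowercased.
        key = token if token.startswith("!") else token.lower()
        node = node.children.setdefault(key, QNode())
        node.eat[eat] += 1


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


PATTERNS = [
    ("^ Where is !LOC $", "LOC"),
    ("^ Who is the !POS of !ORG $", "NAME"),
    ("^ Who is !NAME $", "DESC"),
    ("^ Who is !NAME of !ORG $", "DESC"),
    ("^ How far is !LOC $", "NUMBER"),
    ("^ How tall is !NAME $", "NUMBER"),
]

root = QNode()
for pattern, eat in PATTERNS:
    add_pattern(root, pattern, eat)

print(count_nodes(root))                # 26
print(dict(root.children["who"].eat))   # {'NAME': 1, 'DESC': 2}
```

Keeping the counts on every node, not only on the leaves, means a question that stops matching part-way down still has an EAT distribution to fall back on.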
Look-ahead process
1 ^ (boq)
  6 who
    7 is
      13 !NAME
        14 $ (eoq)
        15 of
          16 !ORG
            17 $ (eoq)
^ who is John Smith of Macquarie University $
? ?
^ who is Madonna $
?
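The slide shows only the diagram, so the following is one plausible reading of the look-ahead, not the presented implementation. When a token has no literal child, the sketch takes an entity child (a label starting with "!") and scans ahead until it finds a token that this entity node can continue with; the skipped tokens become the entity text. All names, and the greedy first-match strategy, are assumptions.

```python
from collections import Counter


def new_node():
    return {"children": {}, "eat": Counter()}


def add_pattern(root, pattern, eat):
    node = root
    node["eat"][eat] += 1
    for token in pattern.split()[1:]:  # skip "^", which is the root itself
        key = token if token.startswith("!") else token.lower()
        node = node["children"].setdefault(key, new_node())
        node["eat"][eat] += 1


def analyse(root, question):
    """Return (EAT, entities), or (None, entities) if the trie has no path."""
    tokens = question.rstrip("?").split() + ["$"]
    node, i, entities = root, 0, []
    while i < len(tokens):
        word = tokens[i].lower()
        if word in node["children"]:  # literal match, no look-ahead needed
            node = node["children"][word]
            i += 1
            continue
        for tag, child in node["children"].items():
            if not tag.startswith("!"):
                continue
            # Look ahead for the first token the entity node can continue with.
            j = i + 1
            while j < len(tokens) and tokens[j].lower() not in child["children"]:
                j += 1
            if j < len(tokens):
                entities.append((tag, " ".join(tokens[i:j])))
                node, i = child, j
                break
        else:
            return None, entities  # no literal or entity child fits
    return node["eat"].most_common(1)[0][0], entities


root = new_node()
for pattern, eat in [
    ("^ Where is !LOC $", "LOC"),
    ("^ Who is the !POS of !ORG $", "NAME"),
    ("^ Who is !NAME $", "DESC"),
    ("^ Who is !NAME of !ORG $", "DESC"),
]:
    add_pattern(root, pattern, eat)

print(analyse(root, "who is John Smith of Macquarie University"))
# ('DESC', [('!NAME', 'John Smith'), ('!ORG', 'Macquarie University')])
print(analyse(root, "who is Madonna"))
# ('DESC', [('!NAME', 'Madonna')])
```

In this reading the entity spans fall out of the traversal, so untagged input questions need no separate entity recogniser; a greedy scan can still pick the wrong boundary when an entity itself contains a word such as "of".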
MQ Questions
JustAsk logs: 4.8% natural language (NL) questions
• 60,732 of 1,275,116 were NL questions
• 47,844 unique NL questions
• 23% with some language problems:
  • Why this search not word?
• Unusual language:
  • Do u offer any scholarships 4 physiotherapy?
• Speculative questions:
  • Will I get a job in Australia after finishing my MBA?
Training Set
JustAsk questions were randomly selected and semi-automatically tagged according to an XML-like structure:
• <Q AT='DESC'>Who is <ENAMEX type="NAME">Luiz Pizzato</ENAMEX>?</Q>
Total number of questions: 1385
• 233 – Who
• 212 – What
• 208 – Where
• 203 – How
• 529 – Other types:
  • Am I, Are there, Can I, Do you, Is there, I want, I need, Which, Does, Tell me, Why, Have you, Could you, May I, Will I, Was I, Would you, Whom
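A tagged question of this kind can be turned into a pattern and an EAT with a few lines of code. The sketch below assumes the markup is well-formed XML with plain ASCII quotes around attribute values, and the function name `to_pattern` is invented for the example.

```python
import xml.etree.ElementTree as ET


def to_pattern(tagged):
    """Turn a tagged question into (pattern, EAT).

    Text outside entities is kept word by word; each ENAMEX element is
    replaced by "!" plus its type attribute.
    """
    q = ET.fromstring(tagged)
    parts = ["^"] + (q.text or "").split()
    for entity in q:
        parts.append("!" + entity.get("type"))
        parts += (entity.tail or "").split()
    # Drop question marks, and any token that was only a question mark.
    tokens = [t.strip("?") for t in parts]
    tokens = [t for t in tokens if t]
    tokens.append("$")
    return " ".join(tokens), q.get("AT")


example = '<Q AT=\'DESC\'>Who is <ENAMEX type="NAME">Luiz Pizzato</ENAMEX>?</Q>'
print(to_pattern(example))  # ('^ Who is !NAME $', 'DESC')
```

Each (pattern, EAT) pair produced this way is one training instance for the question trie.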
Evaluation - EAT
[Figure: precision (%), 0 to 100, against training set size, 0 to 1400. Legend: EAT Average Trendline.]
Evaluation – Focus
[Figure: percentage, 0 to 100, against training set size, 0 to 1400. Legend: Recall, Precision.]
Question Trie without Entities
1 ^ (boq)
  2 where
    3 is
      4 Chile
        5 $ (eoq)
  6 who
    7 is
      8 the
        9 dean
          10 of
            11 ICS
              12 $ (eoq)
      13 J.
        14 Smith
          15 $ (eoq)
          16 of
            17 ICS
              18 $ (eoq)
  19 how
    20 far
      21 is
        22 Athens
          23 $ (eoq)
    24 tall
      25 is
        26 Sting
          27 $ (eoq)
Evaluation – TREC-2003
[Figure: precision (%), 0 to 100, against training set size, 0 to 400. Legend: EAT Average Trendline.]
Comparison with SVM (Zhang and Lee, 2003)
[Figure: precision (%), 60% to 90%, against size of the training set, 0 to 6000. Legend: SVM - fine grained, Trie - fine grained, SVM - coarse grained, Trie - coarse grained.]
Concluding remarks
The developed technique offers reasonable results using no linguistic resources.
Future developments
• Define guidelines for the EAT markup and review the markup of the MQ questions
• Adding POS and semantic information from WordNet may replace entity markup
Further Work
Combine lexical and POS information
Who is John Smith?
Token         POS    EAT freq
^             ^      NAME 1, DESC 1
Who           WP     NAME 1, DESC 1
is            VBZ    NAME 1, DESC 1
John          NNP    NAME 1
Smith         NNP    NAME 1
John Smith    NNP    NAME 1
$             $      NAME 1
References
Dell Zhang and Wee Sun Lee. 2003. Question classification using support vector machines. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-03), pages 26–32. ACM Press.
Dan Moldovan, Marius Paşca, Sanda Harabagiu, and Mihai Surdeanu. 2003. Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst., 21(2):133–154.
Acknowledgments
My supervisors
• Dr. Diego Mollá-Aliod
• Dr. Rolf Schwitter
• Dr. Cecile Paris