A Classification-based Approach to Question Answering in Discussion Boards

Liangjie Hong and Brian D. Davison
Computer Science and Engineering
Lehigh University
Bethlehem, PA USA

Outline

• Motivation
• Problem Definition
• Features
• Experiments
• Conclusion

SIGIR July 2009

How do you find answers on the Web?

Go to search engines…

Go to Question Answering Portals

Go to discussion boards

Why non-trivial?

Comments, news, tutorials, personal experiences

Hello,

I have a problem with my GUI not loading, it used to load and then i tried installing nvidia-glx from synpatics and then rebooted it removed a load of files but for some reason it wont go back to the gui

so i remove nvidia-glx and installed nvidia-glx-180 with envyng -t

when i do startx it goes to a black screen

prior to this i couldnt even get to install glx-180 as it said missing files in the modules and everything

but now i got past that, installed the driver, but still i cant get into the gui

xserver is installed too as well as core

No punctuation, spelling errors, mixed content

How does Google deal with forums?

How about

Not an answer

What are “Questions”?

• a sentence
• a paragraph
• several paragraphs
• a post
• …

What are “Answers”?

Hello,

My problem is I can't find documentation on setting up a shared folder purely through the terminal. I know how to setup shared folders when there is a Gnome desktop on Ubuntu but my searches on this forum and Google haven't come up with instructions on how to do this purely through terminal commands.

How do I add permissions to these folders for each user again purely through terminal commands? I plan on using Hardy since it's supported to 2013, but are there advantages to using Jaunty server instead that I'm not aware off?

If someone can point me in the right direction on where I can find the information to set this up, it would be greatly appreciated.

Thanks in advance for your help!

if you are using a ubuntu server and accessing it through windows workstations, you can simply install SAMBA, and then edit the smb.conf (the configuration file) through the terminal to set up file shares and permissions etc.

as for users, you can simply create samba users and passwords (which could match your xp logins for simplicity)

Hello renzokuken..thanks for the quick response!

So I could just install my server and setup my user id so I can login and administer the server. I don't need to setup userid's for each employee on the server. Instead I create samba users (which could be the userid of their windows PC's). I didn't know you could do this? I thought each user who used the share needed to have an Ubuntu login id?

A couple of things to note. One is that Samba has a way to handle individual user (home dir) shares. The share is called [homes] and it is for all users. The second thing is where you are creating your mount point. Although you can create the mount point anywhere in the file system, I find that it makes more sense to keep the Samba shares under its own mount point. I use /smb

Answer

Questions & Answers Textual Mismatch?

Can any one help me load ubuntu 8.10 on to my pc? I have a asus AS V3-P5V900 but when i load from cd it keeps crashing, i think i does not reconise the graphics card. When i boot from cd it asks me what language ENGLISH then when try to load it crash again i have tried help and put in via=771 any help please?

You might try using the “alternate” install CD: http://www.ubuntu.com/getubuntu/downloadmirrors#alternate

A Summary

• Two sub-tasks:
    Question detection
    Answer detection
• Our method:
    As a classification problem (not retrieval)
• Explore simple/combination features vs. NLP
• Better performance on two real world datasets

The only page you need to remember

For questions

• Question mark (1 feature)
• 5W1H words (6)
    why, what, where, which, when, how
• Thread length (1)
    total number of posts
• Authorship (1)
    #first post/#total posts
• N-gram (1000-3000)
    Carvalho et al. Improving email speech acts analysis via n-gram selection. HLT/NAACL 2006.
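The non-lexical question features listed above could be assembled into a vector roughly as follows. This is only a sketch under stated assumptions, not the authors' implementation: the function name, the tokenizer, the use of binary presence flags, and the reading of "#first post/#total posts" as an author's thread-starting posts divided by all of that author's posts are all assumptions. The N-gram features are left out.

```python
import re

# The six 5W1H words exactly as listed on the slide.
FIVE_W_ONE_H = ("why", "what", "where", "which", "when", "how")


def question_features(post_text, thread_length, author_first_posts, author_total_posts):
    """Return a 9-dimensional feature vector for one post (hypothetical layout).

    Layout: [question mark, why, what, where, which, when, how,
             thread length, authorship ratio]
    """
    tokens = set(re.findall(r"[a-z0-9']+", post_text.lower()))
    # 1 feature: does the post contain a question mark at all?
    has_question_mark = 1.0 if "?" in post_text else 0.0
    # 6 features: presence of each 5W1H word (binary flags are an assumption).
    w_flags = [1.0 if word in tokens else 0.0 for word in FIVE_W_ONE_H]
    # Assumption: authorship = thread-starting posts / all posts by this author.
    authorship = author_first_posts / author_total_posts if author_total_posts else 0.0
    return [has_question_mark, *w_flags, float(thread_length), authorship]


example = question_features(
    "How do I add permissions to these folders?",
    thread_length=4,
    author_first_posts=3,
    author_total_posts=12,
)
print(example)
```

A vector like this would then be concatenated with whichever N-gram features are selected before being handed to the classifier.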

For answers

• Post position (2)
• Authorship (1)
• N-gram (1000-3000)
• Stopwords (571)
• Query Likelihood Model (Language Model) (1)
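The query likelihood feature scores a candidate reply by how likely its language model is to generate the question. The slides do not say which smoothing was used, so the sketch below assumes a unigram model with Dirichlet smoothing; the function names, the tokenizer, the value of `mu`, and the use of the thread itself as the background collection are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def query_log_likelihood(question, reply, collection_counts, mu=10.0):
    """log P(question | reply) under a Dirichlet-smoothed unigram model.

    mu is an assumed smoothing weight, kept small because the toy texts are short.
    """
    reply_counts = Counter(tokenize(reply))
    reply_length = sum(reply_counts.values())
    collection_length = sum(collection_counts.values())
    score = 0.0
    for term in tokenize(question):
        p_collection = collection_counts.get(term, 0) / collection_length
        p_term = (reply_counts.get(term, 0) + mu * p_collection) / (reply_length + mu)
        if p_term == 0.0:
            continue  # term never seen anywhere in the collection; skip it
        score += math.log(p_term)
    return score


question = "how do i set up a shared folder from the terminal"
replies = [
    "install samba and edit smb.conf from the terminal to set up a shared folder",
    "thanks for the quick response",
]
# Stand-in background collection: every post in this toy thread.
collection = Counter(tokenize(question))
for reply in replies:
    collection.update(tokenize(reply))

scores = [query_log_likelihood(question, reply, collection) for reply in replies]
print(scores)
```

In this toy thread the first reply shares most of the question's terms and so receives the higher score. The score is a single real number, which matches the "(1)" feature count on the slide.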

Datasets

• PhotographyOnTheNet (http://www.photography-on-the.net/)
    721,442 threads
• UbuntuForums (http://www.ubuntuforums.org)
    555,954 threads
• Sampled approximately 500 threads for each sub-task and dataset

Classification Method

• Manually labeled
    questions vs. non-questions
    one best answer per thread
• 10-fold cross validation
• libSVM for classification
• Measured performance by Precision, Recall, F-Measure, Accuracy
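The evaluation protocol above (10-fold cross validation, scored by precision, recall, F-measure and accuracy) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names are hypothetical, the folds are contiguous and unshuffled, and "F-Measure" is assumed to mean the balanced F1 score. The classifier itself (libSVM in the slides) is not shown.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def binary_metrics(gold, predicted):
    """Precision, recall, F-measure (assumed F1) and accuracy for 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# About 500 sampled threads split into 10 folds, as on the slide.
folds = list(k_fold_indices(500, k=10))

# Toy labels: 1 = question (or answer), 0 = everything else.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(gold, predicted)
print(metrics)
```

In a full run, each fold's training indices would feed the classifier and the metrics would be averaged over the ten test folds.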

Comparisons to existing methods

• For questions
    Part-Of-Speech tagging
        Stanford Log-linear POS tagger
    Sequential Pattern Mining
• For answers
    Graph-based model incorporated with inter-posts relevance, authorship and similarity

Cong et al. Finding question-answer pairs from online forums. SIGIR 2008.

Question Detection: Single Feature (UbuntuForums)

[Bar chart, y-axis 0 to 1: QM 0.641, 5W1H 0.679, Length 0.707, Author 0.712, SPM 0.754, N-gram 0.833]

Question Detection: Single Feature (Photography)

[Bar chart, y-axis 0 to 1: 5W1H 0.500, Length 0.636, QM 0.664, SPM 0.671, Author 0.755, N-gram 0.775]

Question Detection: Combined Feature (UbuntuForums)

[Bar chart, y-axis 0 to 1: A+LEN 0.716, Q+5+LEN 0.722, A+Q+5+LEN 0.746, SPM 0.754, N-gram 0.833]

Question Detection: Combined Feature (Photography)

[Bar chart, y-axis 0 to 1: SPM 0.671, Q+5+LEN 0.711, N-gram 0.775, A+LEN 0.843, A+Q+5+LEN 0.876]

Answer Detection: Single Feature (Ubuntu)

[Bar chart, y-axis 0 to 1: GQL 0.620, Stop 0.640, NG 0.663, LM 0.682, POSI 0.737, AUTH 0.765]

Answer Detection: Single Feature (Photography)

[Bar chart, y-axis 0 to 1: GQL 0.591, LM 0.659, NG 0.706, Stop 0.712, AUTH 0.735, POSI 0.827]

Answer Detection: Combined Feature (UbuntuForums)

[Bar chart, y-axis 0 to 1: Stop+NG 0.760, LM+Stop 0.761, LM+POSI 0.770, LM+A 0.786, POSI+Stop 0.798, LM+POSI+A 0.946, POSI+A 0.952]

Answer Detection: Combined Feature (Photography)

[Bar chart, y-axis 0 to 1: Stop+NG 0.712, LM+A 0.734, LM+Stop 0.740, LM+POSI 0.827, POSI+Stop 0.873, LM+POSI+A 0.970, POSI+A 0.975]

Summary

• Question Detection
    N-gram
    Authorship+Question Mark+5W1H+Length
• Answer Detection
    Position, Authorship
    Position+Authorship
    Language Model+Position+Authorship

This is only a starting point!

Thank you! Questions?

Contact Info:
Liangjie [email protected] Laboratory
Computer Science and Engr.
Lehigh University
Bethlehem, PA 18015 USA