Question-Answering on Yahoo!Answers: Preliminary Results

Rong Tang, Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009, January 23, 2009


Page 1: Question-Answering on Yahoo!Answers: Preliminary Results

Question-Answering on Yahoo!Answers: Preliminary Results
Rong Tang
Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009
January 23, 2009

Page 2: Question-Answering on Yahoo!Answers: Preliminary Results

Background
Yahoo!Answers
  Social Q&A
  25+ pre-defined categories
  Users post questions, answer questions, rate answers, provide comments
  One best answer chosen by the asker or through vote
  Users may provide comments

Page 3: Question-Answering on Yahoo!Answers: Preliminary Results
Page 4: Question-Answering on Yahoo!Answers: Preliminary Results
Page 5: Question-Answering on Yahoo!Answers: Preliminary Results

Rating/Voting/Commenting

Page 6: Question-Answering on Yahoo!Answers: Preliminary Results

Our Research Project
Funded by OCLC/ALISE Grant Program and Simmons College President’s Fund for Research
Project Staff:
  Rong Tang (PI)
  Sheila Denn (Co-PI)
  Sam Kalat (technology consultant, programmer)
  Laura Saunders (Research Assistant)
The project wiki page documents the relevant literature and project progression, with extensive meeting notes on coding decisions

Page 7: Question-Answering on Yahoo!Answers: Preliminary Results

Research Questions
Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment?
What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive?
What are the characteristics of answers that are chosen as “best” answers?
What is the role of the social function vs. the information function in social Q&A?
What are the implications of the above for provision of library and information services?

Page 8: Question-Answering on Yahoo!Answers: Preliminary Results

Previous Research
Question classification
  Wh- questions (Robinson & Rackstraw, 1972)
  Conceptual question categories (Lehnert, 1978)
  Content-based question categories (Graesser et al., 1994)
  Reference question classification (Pomerantz, 2005)
  Questions in Dynamic Semantics (Aloni, Butler, & Dekker, 2007)
Answer classification
  Much less research here than with question classification
  Answer selection rules (Lehnert, 1978)
  Criteria based on Yahoo!Answers comments (Kim et al., 2007)

Page 9: Question-Answering on Yahoo!Answers: Preliminary Results

Previous Research (cont.)
Formal studies of Online Q&A
  Answerers: “specialists” vs. “synthesists” (Gazan, 2006)
  Questioners: “seekers” vs. “sloths” (Gazan, 2007)
Question purpose (Graesser et al., 1994)
  Filling knowledge gaps
  Establishing and monitoring common ground
  Coordinating social action
  Directing the conversation and controlling attention

Page 10: Question-Answering on Yahoo!Answers: Preliminary Results

Research Plan
Data collection and sampling
  Gathered a stratified random sample of 3,000 question-answer sets, including any comments
  Stratified by 25 top-level categories assigned by Yahoo!Answers
Data coding
  Content analysis at multiple levels
    Syntactic
    Semantic
    Pragmatic
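
The sampling step can be illustrated with a minimal Python sketch. It is not the project's actual collection code. The slides give only the total (3,000 sets) and the number of strata (25 categories), so the equal allocation of 120 sets per category is an assumption, and the population, the field names `qid` and `category`, and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for name in sorted(strata):
        pool = strata[name]
        k = min(per_stratum, len(pool))  # small strata contribute all they have
        sample.extend(rng.sample(pool, k))
    return sample


# Hypothetical population: 25 categories with 500 question IDs each.
population = [
    {"qid": f"c{c:02d}-q{i}", "category": f"category_{c:02d}"}
    for c in range(25)
    for i in range(500)
]

# Assumed equal allocation: 3,000 / 25 = 120 sets per category.
sample = stratified_sample(population, key="category", per_stratum=120)
print(len(sample))  # 3000
```

Sampling within each category separately guarantees that every top-level category is represented, which a simple random draw over the whole site would not.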

Page 11: Question-Answering on Yahoo!Answers: Preliminary Results

Research Plan (cont.)
Data Analysis
  Descriptive statistics will be produced for:
    Frequency of answers provided per question
    Average length of time to first answer
    Distribution of subject categories
    Distribution of question and answer types
    Distribution of chosen answer types
  Correlation analysis will be performed for:
    Linguistic characteristics of questions and answers
    Functional categories of questions and answers
    Subject categories of questions and answers
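
As a rough illustration of the planned descriptive statistics, the sketch below averages two numeric fields per subject category and cross-tabulates question type against category. The records, field names (`n_answers`, `minutes_to_first`, `q_type`), and helper functions are hypothetical and are not taken from the project's data or tooling.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical coded records; values are illustrative only.
records = [
    {"category": "pets", "n_answers": 6, "minutes_to_first": 12.0, "q_type": "what"},
    {"category": "pets", "n_answers": 4, "minutes_to_first": 30.0, "q_type": "how"},
    {"category": "travel", "n_answers": 3, "minutes_to_first": 400.0, "q_type": "inversion"},
    {"category": "travel", "n_answers": 5, "minutes_to_first": 200.0, "q_type": "what"},
]


def mean_by_category(rows, field):
    """Average a numeric field within each subject category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row[field])
    return {cat: round(mean(vals), 2) for cat, vals in groups.items()}


def cross_tab(rows, row_field, col_field):
    """Count co-occurrences of two categorical codes (a contingency table)."""
    return Counter((row[row_field], row[col_field]) for row in rows)


answers_per_question = mean_by_category(records, "n_answers")
time_to_first = mean_by_category(records, "minutes_to_first")
type_by_category = cross_tab(records, "category", "q_type")

print(answers_per_question)
print(time_to_first)
print(type_by_category[("pets", "what")])
```

A contingency table like `type_by_category` is the usual starting point for testing association between categorical codes such as question type and subject category.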

Page 12: Question-Answering on Yahoo!Answers: Preliminary Results

Progress to Date
Sample has been collected
Preliminary coding has begun
  Syntactic coding of questions is complete
    Wh- questions
    Inversion questions
    Other questions
    Multiparts
    Double coding
  Syntactic coding of question descriptions is complete
    Number of questions included in description text
    Type of questions
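
To make the three basic syntactic codes concrete, here is a minimal first-word heuristic in Python. It is only a sketch: the project's coding was done by human coders, and the word lists, the function name `first_pass_code`, and the rule of looking only at the first word are assumptions made for illustration. Multipart and double-coded questions are deliberately out of scope, since they need human judgment. The first two test questions are quoted elsewhere in these slides.

```python
import re

# Assumed word lists; "how" is grouped with the Wh- words, as in the slides.
WH_WORDS = {"what", "why", "how", "who", "whom", "whose", "which", "when", "where"}
AUXILIARIES = {
    "is", "are", "was", "were", "am", "do", "does", "did", "can", "could",
    "will", "would", "shall", "should", "may", "might", "must", "has", "have", "had",
}


def first_pass_code(question):
    """Assign 'wh', 'inversion', or 'other' from the first word only."""
    words = re.findall(r"[a-z']+", question.lower())
    if not words:
        return "other"
    if words[0] in WH_WORDS:
        return "wh"
    if words[0] in AUXILIARIES:
        return "inversion"  # subject-auxiliary inversion, e.g. "Is there ...?"
    return "other"


examples = [
    "Why do husbands feel they have to lie to other women about being married?",
    "Is there anywhere you can listen to citizen band radio online?",
    "WTF",
]
codes = [first_pass_code(q) for q in examples]
print(codes)  # ['wh', 'inversion', 'other']
```

A rule this simple ignores punctuation entirely, so it neither requires nor trusts a question mark. It also cannot tell a genuine "other" question from a non-question.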

Page 13: Question-Answering on Yahoo!Answers: Preliminary Results

Data Coding
Two coders perform coding individually, then go over the coding to reach consensus on the final coding of each question
Use of informal language presents a challenge for coding
  Is it a question if it doesn’t include a question mark? Is it a question simply because it has a question mark at the end?
  Should “WTF” be coded as a “what” question or an other question? Or not at all?
  Coding multiparts of a question, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?”
  Double coding questions such as “Is there anywhere you can listen to citizen band radio online?”
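
The two-coder workflow can be supported by a small comparison step that lists the questions on which the coders disagree, so that the consensus meeting can focus on them. The sketch below is hypothetical: the question IDs, the codes, and the function `compare_codings` are illustrative and do not describe the project's actual tooling.

```python
def compare_codings(coder_a, coder_b):
    """Return (share of matching codes, question IDs that need consensus)."""
    shared = sorted(set(coder_a) & set(coder_b))  # questions coded by both
    disputed = [qid for qid in shared if coder_a[qid] != coder_b[qid]]
    agreement = 1 - len(disputed) / len(shared) if shared else 0.0
    return agreement, disputed


# Hypothetical independent codings of the same four questions.
coder_a = {"q1": "wh", "q2": "inversion", "q3": "other", "q4": "wh"}
coder_b = {"q1": "wh", "q2": "inversion", "q3": "wh", "q4": "wh"}

agreement, disputed = compare_codings(coder_a, coder_b)
print(agreement, disputed)  # 0.75 ['q3']
```

Simple percent agreement does not correct for chance agreement, so it is best read as a quick check before the consensus discussion rather than as a reliability statistic.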

Page 14: Question-Answering on Yahoo!Answers: Preliminary Results

Preliminary Results

Page 15: Question-Answering on Yahoo!Answers: Preliminary Results

Number of Answers Per Question
[Bar chart: Average Number of Answers per Question by Category; value axis 0 to 9]
Bar values, in extracted order: 8.2, 7.86, 7.14, 6.98, 6.92, 6.72, 6.46, 6.37, 6.28, 6.18, 6.08, 5.79, 5.51, 4.78, 3.84, 3.76, 3.68, 3.68, 3.65, 3.61, 3.28, 3.15, 2.98, 2.89, 2.63
Category labels, in extracted order: pregancyparenting, diningout, politicsgov, beautystyle, socialscience, environment, familyrelationships, pets, societyculture, fooddrink, newsevents, sports, entertainmentmusic, artshumanities, health, educationreference, homegarden, travel, gamrecreation, carstransportation, consumerelectronic, sciencemath, computerinternet, businessfinance, localbusiness

Page 16: Question-Answering on Yahoo!Answers: Preliminary Results

Length to Receive 1st Answer
[Bar chart: Average length (min.) to receive first answer; value axis 0 to 1800]
Bar values, in extracted order: 10.8, 41.78, 59.86, 74.83, 87.4, 90.52, 157.47, 163.28, 163.67, 171.22, 182.07, 197.31, 277.68, 286.07, 302.37, 326.75, 346.44, 370.91, 402.5, 411, 463, 485.2, 635, 660.77, 1635.04
Category labels, in extracted order: familyrelationships, pregancyparenting, fooddrink, beautystyle, socialscience, homegarden, sciencemath, health, newsevents, societyculture, artshumanities, environment, sports, pets, politicsgov, carstransportation, diningout, computerinternet, educationreference, travel, consumerelectronic, gamrecreation, businessfinance, entertainmentmusic, localbusiness

Page 17: Question-Answering on Yahoo!Answers: Preliminary Results

Wh-question frequency
“What” Questions
[Bar chart: Average Number of What Questions By Category; value axis 0 to 50]

Page 18: Question-Answering on Yahoo!Answers: Preliminary Results

Wh-question frequency
“Why” Questions
[Bar chart: Average Number of Why Questions by Category; value axis 0 to 18]
Category labels, in extracted order: homegarden, localbusiness, beautystyle, consumerelectronic, diningout, entertainmentmusic, fooddrink, travel, businessfinance, educationreference, health, computerinternet, gamrecreation, carstransportation, pregancyparenting, artshumanities, familyrelationships, pets, sciencemath, sports, environment, politicsgov, newsevents, socialscience, societyculture

Page 19: Question-Answering on Yahoo!Answers: Preliminary Results

Wh-question frequency
“How” Questions
[Bar chart: Average Number of How Questions by Category; value axis 0 to 40]

Page 20: Question-Answering on Yahoo!Answers: Preliminary Results

Wh-question frequency
“Inversion” Questions
[Bar chart: Average Number of Inversion Question By Category; value axis 0 to 60]

Page 21: Question-Answering on Yahoo!Answers: Preliminary Results

Next Steps
Start semantic and pragmatic analysis of questions
Start answer analysis
Start comment coding
Explore the association and features of Q and A and C
Develop a conceptual and analytical model for social Q&A

Page 22: Question-Answering on Yahoo!Answers: Preliminary Results

Questions?