community and thread methods for identifying best answers in online question answering communities

GRÉGOIRE BUREL

Knowledge Media Institute, The Open University, Milton Keynes, UK. 17th November 2015


Q&A communities are composed of askers and answerers looking for solutions to particular issues. When looking for answers, users need to identify whether similar questions have already been answered correctly and whether a best answer exists. Unfortunately, not all questions have labelled best answers (43.2% of questions do not). Existing work has mostly focused on identifying quality answers rather than best answers. It has also generally ignored community studies about what makes best answers, as well as the structure of Q&A websites.

Q&A Communities

Chapter 1

[Figure: a question thread, made up of a question followed by Answer #1, Answer #2, ..., Answer #n.]



Identifying Best Answers using User, Content and Thread Features

Chapter 4

In order to identify best answers, features are extracted and associated with each answer, and a binary classifier is trained:
-  Such features are divided into User, Content and Thread features.
-  31 features are used for the baseline model.

Results
-  The baseline model achieves an F1 of 0.817 on average.
-  Thread features, in particular score ratios, are highly related to best answers.
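As a rough illustration of a thread feature and of the F1 measure reported above, the sketch below computes a score ratio for every answer of a thread and evaluates a trivial predictor. The slide does not define the score ratio, so it is assumed here to be an answer's score divided by the highest score in its thread. The threshold rule is only a stand-in for the trained binary classifier, and the toy threads are hypothetical.

```python
from typing import List


def score_ratios(scores: List[float]) -> List[float]:
    """Assumed definition: answer score divided by the top score of its thread."""
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive (best answer) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy threads: answer scores and the index of the labelled best answer.
threads = [
    {"scores": [5, 1, 0], "best": 0},
    {"scores": [2, 7, 3], "best": 1},
    {"scores": [4, 4, 1], "best": 1},
]

y_true, y_pred = [], []
for thread in threads:
    for i, ratio in enumerate(score_ratios(thread["scores"])):
        y_true.append(1 if i == thread["best"] else 0)
        # Stand-in for a trained classifier: flag every answer that ties the top score.
        y_pred.append(1 if ratio >= 1.0 else 0)

print(round(f1_score(y_true, y_pred), 3))  # 0.857
```

The third thread shows why a single thread feature is not enough on its own: two answers tie on score, so the rule produces one false positive.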

What methods could be used to improve such a model?


The Qualitative and Structural Method

Chapter 1

Research Question
Can structural and qualitative design improve the performance of automatic identification of best answers?

Structural Design
Analyse community structure for optimising best answer identification models.

Hypothesis (H1.1)
The thread-like structure of Q&A communities can help the automatic identification of best answers.

Qualitative Design
Identify best answer predictors from community surveys.

Hypothesis (H1.2)
Beliefs about what makes quality answers can be used for identifying and designing features that correlate with best answers.

Research Questions and Evaluation

Chapter 1

Research Question 1 (RQ1) Can structural and qualitative design improve the performance of automatic identification of best answers?

Structural

Research Question 1.1 (RQ1.1) Can structural optimisation techniques improve automatic best answer identification?
-  Identify thread-wise normalisation and LTR as potential optimisations.

RQ1.1 Evaluation (E1.1) Check if structural methods improve best answer identification.

Qualitative

Research Question 1.2 (RQ1.2) How do user beliefs about what makes quality answers compare to the other features that identify best answers?
-  Identify Complexity/Maturity and Effort as potential predictors.

Research Question 1.3 (RQ1.3) Can question complexity and maturity be used for measuring the ability of users to learn and be knowledgeable?

Research Question 1.4 (RQ1.4) Can contribution effort be used for modelling the reactivity of community users in contributing particular answers?

RQ1/RQ1.2 Evaluation (E1/E1.2) Check if qualitative features (i.e. complexity, maturity and effort) improve best answer identification.

Methodology

Chapter 1

1) Extraction: Extract features and annotate data for building models.
-  Structural: User, Content and Thread Features + Normalisation.
-  Qualitative: User, Content and Thread Features; Complexity Annotations; Stanines; Extended Features.

2) Modelling: Train machine learning models for performing predictions.
-  Structural: Binary Classifier; LTR.
-  Qualitative: Logistic Regression; Omega Metric; JET/AJET; STAN/ASTAN; Supervised Classifier.

3) Model Evaluation: Evaluate models and analyse feature importance.
-  Structural: Models Comparison; Features Comparison.
-  Qualitative: Models Comparison; Features Comparison; Features Analysis.

4) Hypothesis Evaluation: Test research hypotheses.
-  Structural: Structural Models vs. Standard Models.
-  Qualitative: Reputation vs. Maturity; Effort vs. Reactivity; Qualitative Design Features vs. Others.


Structural Design

Chapter 5

Research Question (RQ1.1) Can structural optimisation techniques improve automatic best answer identification, and if so how?

Hypothesis (H1.1) Structural optimisation techniques that take into account the thread-like structure of Q&A communities can help the automatic identification of best answers.

Approach
-  Thread-wise normalisation approaches (proportional / order / normalised order).
-  Learning To Rank (LTR) models.

Results
-  Both LTR (+5.2% F1) and normalisation (+5.3% F1) improve results significantly compared to non-normalised models.
-  Relational normalisation increases the importance of content features (e.g. term entropy).
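The three thread-wise normalisations named above can be sketched as follows. The slide gives only their names, so the definitions used here are assumptions: proportional is taken as a value's share of the thread total, order as the rank of the value within the thread, and normalised order as that rank rescaled to [0, 1]. The feature values are hypothetical.

```python
from typing import List


def proportional(values: List[float]) -> List[float]:
    """Each value as a share of the thread total (assumed definition)."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]


def order(values: List[float]) -> List[int]:
    """Rank within the thread, 1 = largest value; tied values share the best rank."""
    ranked = sorted(values, reverse=True)
    return [ranked.index(v) + 1 for v in values]


def normalised_order(values: List[float]) -> List[float]:
    """Rank rescaled to [0, 1], with 1.0 for the top answer of the thread."""
    n = len(values)
    if n == 1:
        return [1.0]
    return [(n - r) / (n - 1) for r in order(values)]


# Hypothetical word counts for the three answers of one thread.
word_counts = [120, 30, 50]
print(proportional(word_counts))      # [0.6, 0.15, 0.25]
print(order(word_counts))             # [1, 3, 2]
print(normalised_order(word_counts))  # [1.0, 0.0, 0.5]
```

In each case the feature of an answer is expressed relative to the other answers of the same thread rather than to the whole dataset, which is what makes the approach structural.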

-  Grégoire Burel, Yulan He and Harith Alani (2012). Automatic identification of best answers in online enquiry communities. In: 9th Extended Semantic Web Conference (ESWC ’12), 27-31 May 2012, Crete, Greece.

-  Grégoire Burel, Yulan He, Paul Mulholland and Harith Alani (2015). Modelling Question Selection Behaviour in Online Communities. In: Companion Proceedings of the 2015 International Conference on the World Wide Web (WWW ’15), 18-22 May 2015, Florence, Italy.

-  Grégoire Burel, Paul Mulholland, Yulan He and Harith Alani (2015). Predicting Answering Behaviour in Online Question Answering Communities. In: 26th Conference on Hypertext and Social Media (HT ’15), 1-4 September 2015, Cyprus.


Qualitative Design – Features Identification


Chapter 2

Research Question (RQ1.2) How do user beliefs about what makes quality answers compare to the other features that identify best answers?

Hypothesis (H1.2) Community contributors’ beliefs about what makes quality answers can be used for identifying and designing features that correlate with best answers.

Approach
-  Exploratory survey of community managers (191 users) in two communities (SCN and IBM Connections) for understanding user needs.

Results
-  Quality answers are associated with knowledgeable users (i.e. mature users) and user reactivity (i.e. contribution effort).

Matthew Rowe, Harith Alani, Sofia Angeletou, and Grégoire Burel. Report on Social, Technical and Corporate Needs in Online Communities. Technical Report 3.1, ROBUST, 2011.


Qualitative Feature #1 - Question Complexity and Maturity

Chapter 6

Research Question (RQ1.3) Can question complexity and contributor maturity be used for measuring the ability of users to learn new things and be knowledgeable, and if so how?

Hypothesis (H1.3) Knowledgeable users are more likely to answer or ask complex questions.

Approach
-  Consider that question complexity is related to five different variables: 1) Temporality; 2) Enquiry; 3) Commitment; 4) Accomplishment; and 5) Focus.
-  Consider that mature users contribute more complex questions compared to others.
-  Annotation of 220 question pairs (complex/not complex).
-  Regression model for identifying complex questions and the derivation of a complexity metric (Omega).

Results
-  Measuring question complexity automatically is difficult (0.65 F1).
-  Users mature over time, and user maturity can be used as a proxy measure of knowledge.

Grégoire Burel and Yulan He. 2013. A question of complexity: measuring the maturity of online enquiry communities. In: 24th ACM Conference on Hypertext and Social Media (HT ’13), 1-3 May 2013, Paris, France.

Qualitative Feature #2 - Contribution Effort

Chapter 7

Research Question (RQ1.4) Can contribution effort be used for modelling the reactivity of community users in contributing particular answers, and if so how?

Hypothesis (H1.4) User reactivity can be estimated from the amount of effort required for generating the words that form an answer.

Approach
-  Consider that effort can be measured based on user vocabulary usage.
-  Use stanines and topic models for measuring contribution effort.
-  Evaluate the models by testing different hypotheses (activity levels, time-to-response and term preference).

Results
-  Effort can be measured using stanines.
-  Effort can be used as a proxy measure of community reactivity.
-  Topic models are slow to compute, and stanine-based models may be preferred when computation time is an issue.

Grégoire Burel and Yulan He. 2014. Quantising Contribution Effort in Online Communities. In: Companion Proceedings of the 2014 International Conference on the World Wide Web (WWW ’14), 7-11 April 2014, Seoul, Korea.
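For readers unfamiliar with stanines, the sketch below maps a sample of measurements onto the standard nine-point scale, whose bands cover 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of the distribution. Only the stanine scale itself is standard. Which quantity is binned for the effort models (for instance, how rarely a user employs a term) is not stated on the slide, so the input values here are purely hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from typing import List

# Cumulative percentage cut-offs between the nine stanine bands.
CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanines(values: List[float]) -> List[int]:
    """Map each value to a stanine (1..9) from its mid-rank percentile in the sample."""
    n = len(values)
    ordered = sorted(values)
    result = []
    for v in values:
        below = bisect_left(ordered, v)             # values strictly smaller than v
        equal = bisect_right(ordered, v) - below    # values tied with v (including v)
        percentile = 100.0 * (below + 0.5 * equal) / n
        result.append(bisect_right(CUTOFFS, percentile) + 1)
    return result


# Hypothetical sample of 100 distinct effort-related measurements.
sample = list(range(100))
counts = Counter(stanines(sample))
print([counts[s] for s in range(1, 10)])  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```

Because the mapping needs only a sort and a lookup, it is cheap to compute, which is consistent with the remark above that stanine-based models may be preferred when computation time matters.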


Best Answers Identification with Structural and Qualitative Design

Chapter 8

Research Question (RQ1) Can structural and qualitative design improve the performance of automatic identification of best answers in online Q&A communities, and if so how?

Approach
-  Integrate qualitative design features and thread-wise normalisation methods into best answer identification models.
-  Minimise the number of predictors while maximising F1 using IGR.

Results
-  Structural methods improved best answer identification (Chapter 5).
-  Qualitative design features did not improve the results significantly, but they are highly ranked (e.g. contributions with low effort and users that answer complex questions are more likely to provide best answers).
-  Score features were overwhelmingly important.
-  Top 3 predictors are: 1) Score; 2) Score ratio; 3) No. of Comments.
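Assuming IGR stands for information gain ratio (the slide does not expand the acronym), feature ranking can be sketched as below. The feature names and values are hypothetical, and the features are already discrete; continuous features such as scores would first need to be binned.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(items: Sequence) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(feature: Sequence, labels: Sequence) -> float:
    """Information gain of the feature about the labels, divided by the feature's entropy."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


# Hypothetical answers: 1 = best answer, 0 = other answer.
labels = [1, 0, 0, 1, 0, 0]
features = {
    "has_top_score": ["yes", "no", "no", "yes", "no", "no"],
    "is_long": ["yes", "yes", "no", "no", "yes", "no"],
}

ranking = sorted(features, key=lambda name: gain_ratio(features[name], labels), reverse=True)
print(ranking)  # ['has_top_score', 'is_long']
```

Keeping only the top-ranked features and re-measuring F1 after each removal is one way to minimise the number of predictors while watching the effect on performance.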


Lessons Learned

Structural Approaches

-  Compared to the baseline models, structural approaches improve automatic best answer identification by around +5% F1.
-  Using both LTR and thread-wise normalisation does not improve results compared to using each method separately.
-  Thread-wise normalisation changes the importance of features (e.g. content length).
-  Structural methods may be used successfully for other classification tasks where the analysed communities are highly structured.
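To make the LTR idea concrete, here is a minimal pairwise sketch: within each thread, the best answer should receive a higher score than every other answer. The slides do not say which LTR algorithm was used, so a simple perceptron trained on feature differences stands in for it. The two features and all values are hypothetical.

```python
from typing import List, Tuple

# A thread is (feature vectors of its answers, index of the labelled best answer).
Thread = Tuple[List[List[float]], int]


def train_pairwise(threads: List[Thread], epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Perceptron on feature differences between the best answer and its thread siblings."""
    dim = len(threads[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for answers, best in threads:
            for i, other in enumerate(answers):
                if i == best:
                    continue
                diff = [b - o for b, o in zip(answers[best], other)]
                # Update only when the best answer does not outscore this sibling.
                if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def predict_best(w: List[float], answers: List[List[float]]) -> int:
    """Index of the highest-scoring answer in a thread."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in answers]
    return max(range(len(answers)), key=scores.__getitem__)


# Hypothetical threads with two features per answer.
threads: List[Thread] = [
    ([[0.9, 0.2], [0.1, 0.8], [0.3, 0.1]], 0),
    ([[0.2, 0.5], [0.8, 0.4]], 1),
    ([[0.6, 0.9], [0.7, 0.1], [0.1, 0.3]], 1),
]

w = train_pairwise(threads)
print([predict_best(w, answers) for answers, _ in threads])  # [0, 1, 1]
```

Unlike a binary classifier applied answer by answer, this formulation always selects exactly one answer per thread, which is how it exploits the thread structure.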

Chapter 9


Qualitative Design Features

-  Qualitative features are correlated with best answers.
-  The Omega (Ω) metric can be used for measuring question complexity and identifying mature users.
-  Contribution effort can be measured using different methods (STAN/ASTAN/JET/AJET) and can be used as a proxy measure of user reactivity.
-  Effort metrics and complexity metrics may be useful in other contexts (e.g. locating challenging questions, identifying reactive users).
-  Qualitative methods may be used in other tasks where features need to be designed.



Best Answers Identification

-  Qualitative and structural approaches help improve baseline models.
-  Meaningful features help in understanding what the components of best answers are.
-  Score features alone provide extremely good results but may not be usable in a real-world setting.
-  Bag-of-words features may substantially improve the identification of best answers.
-  Best answers are associated with high scores, a high number of comments, lexical complexity, length and answering effort.

Now and Then

Now
-  Identifying Questions to Answer.
-  Large Scale Best Answer Identification.

Then
-  Predicting community ratings.

Questions and Discussion

Email: [email protected]
Twitter: @evhart

