promise 2011: "empirical validation of human factors on predicting issue resolution time in...

Empirical validation of human factors in predicting issue lead time in open source projects

Nguyen Duc Anh, Daniela S. Cruzes, Claudia Ayala and Reidar Conradi


DESCRIPTION

Promise 2011: "Empirical validation of human factors on predicting issue resolution time in open source projects", Anh Nguyen Duc, Daniela Cruzes, Claudia Ayala and Reidar Conradi.

TRANSCRIPT

Page 1: Promise 2011: "Empirical validation of human factors on predicting issue resolution time in open source projects"

Empirical validation of human factors in predicting issue lead time in open source projects

Nguyen Duc Anh, Daniela S. Cruzes, Claudia Ayala and Reidar Conradi

Page 2

Outline

• Introduction

• Research questions

• Research methodology

• Results

• Conclusions

• Future work

Page 3

Introduction

• Software maintenance and evolution
  • Fixing bugs, implementing new feature requests, and enhancing current system features
  • The Mozilla bug tracking system receives 170 issue reports/day; Eclipse projects receive 120 reports/day (Kim & Whitehead 2006)

• Issue lead time prediction is challenging due to:
  • The dynamics of software evolution, and
  • The lack of a clear understanding of the factors influencing issue lead time.

Page 4

Previous Studies on Issue Lead Time Prediction

• The main focus is on characteristics of the issue only.
  • Ex: priority, effort, number of comments.

• Little focus on the human factors aspect:
  • Developer's experience, ability, reputation
  • Developer's collaboration

• Developers' capability and collaboration in developing a software module can affect how likely they are to introduce bugs in the module. Are these factors useful for classifying/predicting issue lead time as well?

Page 5

Previous Studies on Bug Lead Time Prediction

Studies compared: Giger et al. 2010; Bougie et al. 2007; Bhattacharya et al. 2011; Anbalagan et al. 2009; Hooimeijer et al. 2007

Feature | Studies using it
No. of comments | X X X
Reporter | X X
Assignee | X X
Severity | X X X X
Priority | X X
Operating system type | X
Open time | X X
Platform | X
No. of attachments | X X
No. of dependencies | X
No. of developers | X X
Daily load | X
Submitter reputation | X
Bug category | X

Page 6

Research questions

• RQ1. Do human factor metrics improve classification of issue lead time?

• RQ2. Which characteristics of issues increase the predictive power of a linear regression model for predicting issue lead time?

• RQ3. What accuracy can the classification/prediction models achieve?

Page 7

Projects

Info. \ Project | Qt | Qpid | Geronimo
Main organization involved | Qt (Nokia) | Red Hat, JP Morgan | IBM
Collection time frame | 85 months | 51 months | 87 months
Number of stakeholders | 133 | 39 | 60
Number of issues | 16818 | 3016 | 5697
Number of selected issues | 9921 | 2278 | 4787

Page 8

Dependent variable

• Issue lead time:
  • Duration between creation time and resolution time
  • Valid issues with stakeholder assignment
  • RESOLVED issues
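A minimal sketch of how the dependent variable could be computed from issue records, assuming lead time is measured in days (the slides do not state the unit). The record fields (`status`, `assignee`, `created`, `resolved`) and the sample issues are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def lead_time_days(created: datetime, resolved: datetime) -> float:
    """Lead time = resolution time minus creation time, expressed in days."""
    if resolved < created:
        raise ValueError("resolution time precedes creation time")
    return (resolved - created).total_seconds() / 86400.0


# Hypothetical issue records; field names are illustrative, not from a real tracker API.
issues = [
    {"id": "A-1", "status": "RESOLVED", "assignee": "dev1",
     "created": datetime(2010, 1, 1), "resolved": datetime(2010, 1, 11)},
    {"id": "A-2", "status": "OPEN", "assignee": "dev2",
     "created": datetime(2010, 2, 1), "resolved": None},
    {"id": "A-3", "status": "RESOLVED", "assignee": None,
     "created": datetime(2010, 3, 1), "resolved": datetime(2010, 3, 2)},
]

# Keep only RESOLVED issues that have an assigned stakeholder.
selected = [i for i in issues if i["status"] == "RESOLVED" and i["assignee"]]

lead_times = {i["id"]: lead_time_days(i["created"], i["resolved"]) for i in selected}
```

Only issue A-1 passes both filters here, mirroring the selection step that reduces the number of issues to the number of selected issues per project.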

Page 9

Independent variables

Period | Dimension | Metrics
Past | Past performance of reporter/assignee | No. of past reported and resolved issues, past issue resolution time
Present | Nature of an issue | Description length, issue type, version, creation time, ...
Near future | Collaboration in resolving issue | No. of comments, no. of stakeholders

Figure: timeline for issue i marking t0, t1, t2 and ∆t; the quantity to predict is t_resolved.

Page 10

Independent variables

• Stakeholder past performance
  • Reporter experience (ExpR):
    $$exp_r(rep, t_1) = count(iss_j) : t_{created\_iss_j} < t_1$$
  • Assignee experience (ExpA):
    $$exp_a(dev, t_1) = count(iss_j) : t_{resolved\_iss_j} < t_1$$
  • Assignee average past issue lead time (Apit):
    $$apit(dev, t_1) = \frac{1}{k} \sum_{i=1}^{k} \left( t_{resolved\_i} - t_{created\_i} \right) : t_{resolved\_i} < t_1$$
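The three past-performance metrics can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the comparison against t1 is taken as strict, lead times are in days, and the `Issue` type and sample history are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Issue(NamedTuple):
    # Hypothetical record layout, for illustration only.
    reporter: str
    assignee: str
    created: datetime
    resolved: Optional[datetime]


def exp_r(issues, reporter, t1):
    """ExpR: number of issues the reporter created before t1."""
    return sum(1 for i in issues if i.reporter == reporter and i.created < t1)


def exp_a(issues, dev, t1):
    """ExpA: number of issues the assignee resolved before t1."""
    return sum(1 for i in issues
               if i.assignee == dev and i.resolved is not None and i.resolved < t1)


def apit(issues, dev, t1):
    """Apit: mean lead time (days) of issues the assignee resolved before t1."""
    past = [(i.resolved - i.created).total_seconds() / 86400.0
            for i in issues
            if i.assignee == dev and i.resolved is not None and i.resolved < t1]
    return sum(past) / len(past) if past else None


history = [
    Issue("ann", "bob", datetime(2010, 1, 1), datetime(2010, 1, 5)),    # 4 days
    Issue("ann", "bob", datetime(2010, 2, 1), datetime(2010, 2, 11)),   # 10 days
    Issue("cat", "bob", datetime(2010, 3, 1), None),                    # unresolved
    Issue("ann", "dan", datetime(2010, 6, 1), datetime(2010, 6, 2)),    # after t1
]
t1 = datetime(2010, 4, 1)  # creation time of the issue being predicted

reporter_experience = exp_r(history, "ann", t1)
assignee_experience = exp_a(history, "bob", t1)
assignee_avg_lead_time = apit(history, "bob", t1)
```

Returning `None` from `apit` for an assignee with no resolved history is a design choice of this sketch; the slides do not say how such cases were handled.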

Page 11

Independent variables

• Post-submission collaboration
  • The number of comments (NoC):
    $$noc(iss_i) = count(c) : t_{comment\_c} \in [t_1, t_2]$$
  • The number of involved stakeholders (NoS):
    $$nos(iss_i) = count(s_j) : \exists\, c(s_j) : t_{comment\_c} \in [t_1, t_2]$$
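A small sketch of the two collaboration metrics, assuming a closed observation window [t1, t2] after issue creation. The `Comment` type, the one-week window, and the sample comments are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Comment(NamedTuple):
    # Hypothetical record layout, for illustration only.
    author: str
    posted: datetime


def noc(comments, t1, t2):
    """NoC: number of comments posted within [t1, t2]."""
    return sum(1 for c in comments if t1 <= c.posted <= t2)


def nos(comments, t1, t2):
    """NoS: number of distinct stakeholders who commented within [t1, t2]."""
    return len({c.author for c in comments if t1 <= c.posted <= t2})


t1 = datetime(2010, 4, 1)   # issue creation time
t2 = datetime(2010, 4, 8)   # end of the observation window (assumed one week)

comments = [
    Comment("ann", datetime(2010, 4, 1, 12)),
    Comment("bob", datetime(2010, 4, 2)),
    Comment("ann", datetime(2010, 4, 3)),
    Comment("cat", datetime(2010, 4, 20)),  # outside the window, ignored
]

comment_count = noc(comments, t1, t2)
stakeholder_count = nos(comments, t1, t2)
```

Restricting both counts to the window keeps the metrics available shortly after submission, which is what makes them usable as predictors rather than after-the-fact measurements.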

Page 12

Research methodology

Page 13

Classification results: accuracy of binary classification models

# | Model | Qt | Qpid | Geronimo
1 | Issue features | 84.59% | 58.52% | 59.56%
2 | Issue features + ExpR | 85.53% (+0.94%) | 60.18% (+1.66%) | 61.77% (+2.21%)
3 | Issue features + ExpA | 85.78% (+1.19%) | 60.72% (+2.2%) | 62.00% (+2.44%)
4 | Issue features + Apit | 87.46% (+2.87%) | 70.59% (+12.07%) | 62.90% (+3.34%)
5 | Issue + NoC | 86.56% (+1.97%) | 59.83% (+1.31%) | 72.72% (+13.16%)
6 | Issue + NoS | 86.77% (+2.18%) | 62.20% (+3.68%) | 66.13% (+6.57%)
9 | All | 90.58% (+5.99%) | 72.78% (+14.26%) | 73.22% (+13.66%)

Conclusions:

1. Number of comments and average past issue lead time are effective complementary variables in classifying issue lead time.
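The gains in parentheses are differences, in percentage points, between each extended model and the issue-features baseline. The sketch below recomputes them from the accuracies reported on this slide and picks the single added metric with the largest gain per project; the variable and function names are illustrative.

```python
# Accuracies (%) of the baseline model (issue features only), as reported on the slide.
BASELINE = {"Qt": 84.59, "Qpid": 58.52, "Geronimo": 59.56}

# Accuracies (%) when one human-factor metric is added to the issue features.
WITH_METRIC = {
    "ExpR": {"Qt": 85.53, "Qpid": 60.18, "Geronimo": 61.77},
    "ExpA": {"Qt": 85.78, "Qpid": 60.72, "Geronimo": 62.00},
    "Apit": {"Qt": 87.46, "Qpid": 70.59, "Geronimo": 62.90},
    "NoC":  {"Qt": 86.56, "Qpid": 59.83, "Geronimo": 72.72},
    "NoS":  {"Qt": 86.77, "Qpid": 62.20, "Geronimo": 66.13},
}


def gain(metric: str, project: str) -> float:
    """Accuracy gain over the baseline, in percentage points."""
    return round(WITH_METRIC[metric][project] - BASELINE[project], 2)


# For each project, the single metric whose addition helps the most.
best_single = {
    project: max(WITH_METRIC, key=lambda m: gain(m, project))
    for project in BASELINE
}
```

Apit gives the largest single-metric gain for Qt and Qpid, and NoC for Geronimo, which is the pattern the conclusion on this slide summarizes.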

Page 14

Univariate and Multivariate analysis

Spearman correlation with issue resolution time

Variables | Qt | Qpid | Geronimo
Description length | –0.123** | 0.065** | 0.118**
Priority | –0.157** | 0.021 | –0.021
ExpR | 0.372** | 0.222** | –0.113**
ExpA | –0.186** | –0.021 | –0.168**
NoC | 0.008* | 0.243** | 0.416**
NoS | 0.123** | 0.309** | 0.303**
Apit | 0.799** | 0.284** | 0.222**
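For readers unfamiliar with the univariate measure, Spearman's coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration with average ranks for ties; it does not compute the significance levels marked with asterisks, and the sample data are made up.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up sample: number of comments vs. lead time in days for five issues.
comments = [1, 4, 2, 8, 5]
lead_time = [3.0, 20.0, 7.5, 90.0, 15.0]
rho = spearman(comments, lead_time)
```

Because only ranks matter, the measure is insensitive to the heavy right skew that lead-time data typically show, which is one common reason to prefer it over Pearson's coefficient here.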

Linear regression models

Variables | Qpid | Geronimo | Qt
Intercept | –17.859 | –6.478** | –47.130**
Description length | 0.004 | 0.003 | –0.001
Priority | –7.549 | –10.740** | –53.090**
ExpR | 0.110** | 0.045* | –0.892**
ExpA | –0.051* | –0.010 | –1.432**
NoC | 1.617 | 2.710** | 1.607
NoS | 43.038** | 11.38** | 20.500**
Apit | 0.386** | 0.588** | 0.837**
Model R² | 0.2922 | 0.3226 | 0.5954
Adjusted R² | 0.2809 | 0.3196 | 0.595
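The adjusted R² penalizes R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below applies this standard formula to the Qt model with p = 7 predictors (the seven variables in the table). The slides do not state the sample size used for the regression; n = 9921, the number of selected Qt issues, is an assumption.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a linear model with n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Qt model: R^2 as reported; n is ASSUMED to be the number of selected Qt issues.
qt_r2 = 0.5954
qt_n = 9921   # assumption, not stated for the regression in the slides
qt_p = 7      # Description length, Priority, ExpR, ExpA, NoC, NoS, Apit

qt_adjusted = adjusted_r2(qt_r2, qt_n, qt_p)
```

With a sample this large the penalty is tiny, so the adjusted value stays close to the raw R², in line with the 0.595 reported for Qt.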

Page 15

Conclusions

• RQ1. Do human factor metrics improve classification of issue lead time?
  • Yes. Accuracy improvement up to 12%

• RQ2. Which human factor metrics contribute significantly to issue lead time prediction in the linear regression models?

Metric | Qpid Multi | Qpid Uni | Qt Multi | Qt Uni | Geronimo Multi | Geronimo Uni
Reporter exp. | ++ | ++ | -- | ++ | + | --
Assignee exp. | - | | -- | -- | | --
Number of comments | | ++ | | + | ++ | ++
Number of stakeholders | ++ | ++ | ++ | ++ | ++ | ++
Average past resolution time | ++ | ++ | ++ | ++ | ++ | ++

Page 16

Conclusions

• RQ3. What accuracy can the classification/prediction models achieve?

Study | Dependent var. | Dataset | R²
Bhattacharya et al. 2011 | Bug fixing time | Firefox | 0.401
Bhattacharya et al. 2011 | Bug fixing time | Thunderbird | 0.498
Bhattacharya et al. 2011 | Bug fixing time | Seamonkey | 0.366
Bhattacharya et al. 2011 | Bug fixing time | Eclipse | 0.301
Anbalagan et al. 2009 | | Ubuntu 5.10 | 0.98
Anbalagan et al. 2009 | | Ubuntu 6.04 | 0.81
This study | Issue lead time | Qpid | 0.292
This study | Issue lead time | Qt | 0.595
This study | Issue lead time | Geronimo | 0.326

Consistent with other studies, but prediction models based on issue reports are still far from the desired predictive power.

Page 17

Future work

• Investigation of other input variables: mailing list & version control system comments

• Add more projects to the analysis

• Use other prediction techniques: non-linear regression

• Compare open source vs. closed source

