when do software issues get reported in large open source software - rakesh rana

Dr. Rakesh Rana, Research Scientist, Lero – The Irish Software Research Centre, Ireland When Do Software Issues and Bugs get Reported in Large Open Source Software Project?

Upload: iwsm-mensura

Post on 23-Jan-2018


TRANSCRIPT

Page 1: When do software issues get reported in large open source software - Rakesh Rana

Dr. Rakesh Rana,

Research Scientist,

Lero – The Irish Software Research Centre, Ireland

When Do Software Issues and Bugs get Reported in Large Open Source Software Project?

Page 2: When do software issues get reported in large open source software - Rakesh Rana

Objectives/ Research Questions

We examine the reporting pattern of more than 7000 issue reports from five large open source software projects to evaluate two main characteristics:

(1) When do defects get reported: do any distinct patterns exist? and

(2) Is there any difference between the reported defect inflow and the actual defect inflow for these projects?

Why bother?

• Detailed knowledge of specific patterns in defect reporting can be useful for planning purposes,

• Differences between reported and actual defect inflow can affect the accuracy of the dynamic software reliability growth models (SRGMs) used for defect prediction and reliability assessment.

Page 3: When do software issues get reported in large open source software - Rakesh Rana

Results

Our results suggest that:

While there is distinct variation in when defects are reported,

the ratio of reported to actual defects remains fairly stable over time.

These results enhance our confidence in applying SRGMs using reported defect inflow. (Test with a logistic growth model: the predicted asymptote deviates on average by ~4.8% from the one obtained using actual bugs.)

The reporting patterns can also provide some insight into the possible groups of people who contribute to OSS projects.

Page 4: When do software issues get reported in large open source software - Rakesh Rana

Software is Everywhere

Image source: http://itsallaboutembedded.blogspot.com/2013/03/what-makes-embedded-system-called-as.html

Page 5: When do software issues get reported in large open source software - Rakesh Rana

Open Source

Image Source: https://en.wikipedia.org/wiki/Open-source_software; http://www.webzeee.com/page_osp.html

Page 6: When do software issues get reported in large open source software - Rakesh Rana

Software Defect – Definition

Defect: An imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced.

Error: A human action that produces an incorrect result.

Failure: (A) Termination of the ability of a product to perform a required function or its inability to perform within previously specified limits, or (B) an event in which a system or system component does not perform a required function within specified limits.

Fault: A manifestation of an error in software.

Problem: (A) Difficulty or uncertainty experienced by one or more persons, resulting from an unsatisfactory encounter with a system in use, or (B) a negative situation to overcome.

Source: IEEE Standard 1044, Classification for Software Anomalies

Page 7: When do software issues get reported in large open source software - Rakesh Rana

Software Defect – Definition

Defect: An imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced.

We distinguish between issues and defects as follows:

Software Issue: refers to a report filed by users or developers in the given OSS project's issue database. These issues can be defects/bugs, requests for enhancement, improvement requests, documentation, refactoring requests, etc.

Defect/Bug: the two terms are used interchangeably in this paper to refer to issues that require a corrective maintenance task, usually achieved by making semantic changes to the source code.

Page 8: When do software issues get reported in large open source software - Rakesh Rana

The Data

Five open source Java projects with active development,

Developed and maintained by APACHE and MOZILLA - tend to follow a

strict bug reporting and fixing process,

K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: how misclassification impacts bug prediction,” in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 392–401.

OSS Project   Time Period         Maintainer   No. of Issues
HttpClient    11/2001 – 04/2012   Apache        746
Jackrabbit    09/2004 – 04/2012   Apache       2402
Lucene-Java   03/2004 – 03/2012   Apache       2443
Rhino         09/1999 – 02/2012   Mozilla       584
Tomcat5       05/2002 – 12/2011   Apache       1226

Page 9: When do software issues get reported in large open source software - Rakesh Rana

The Data


Steps:

1. We mined all software issues for these five projects over the given time period.

2. We then mapped each issue to the manual classification of K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: how misclassification impacts bug prediction,” in Proceedings of ICSE 2013.

3. We then used the time stamp information to analyze the trend of total reported issues, reported bugs and actual bugs.
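The time stamp analysis in the steps above can be sketched roughly as follows. This is only an illustration, not the authors' tooling: the `Issue` record, its field names and the sample issues are hypothetical, and it assumes that "actual bug" means an issue marked as a bug by the manual classification, regardless of how it was originally reported.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Issue:
    """Hypothetical issue record; the field names are assumptions."""
    created: datetime      # time stamp taken from the issue tracker
    reported_as_bug: bool  # issue type chosen by the reporter
    actual_bug: bool       # issue type after manual classification


def inflow_by(issues, key):
    """Count total issues, reported bugs and actual bugs per time bucket."""
    total, reported, actual = Counter(), Counter(), Counter()
    for issue in issues:
        bucket = key(issue.created)
        total[bucket] += 1
        if issue.reported_as_bug:
            reported[bucket] += 1
        if issue.actual_bug:
            actual[bucket] += 1
    return total, reported, actual


# Made-up sample issues, for illustration only.
issues = [
    Issue(datetime(2011, 1, 3, 10, 0), True, True),    # a Monday
    Issue(datetime(2011, 1, 8, 15, 30), True, False),  # a Saturday
    Issue(datetime(2011, 2, 1, 9, 0), False, False),   # a Tuesday
]

by_month = inflow_by(issues, lambda d: d.month)             # 1 = Jan
by_week = inflow_by(issues, lambda d: d.isocalendar()[1])   # ISO week number
by_weekday = inflow_by(issues, lambda d: d.weekday())       # 0 = Mon, 6 = Sun
```

Using one counting function with a pluggable bucket key keeps the monthly, weekly and week-day views consistent with each other.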


Page 10: When do software issues get reported in large open source software - Rakesh Rana

The Data

HttpClient & Rhino ~ Linear cumulative issues profile

Jackrabbit ~ S-shaped

Lucene-Java ~ Convex shaped

Tomcat5 ~ Concave shaped issues inflow profile

These are total issues reported – what about actual bugs?

Some studies suggest ~40% of issues reported as bugs are not real bugs!

[Figure: Issues inflow per week. x-axis: Time in Weeks; y-axis: Number of Issues; series: HttpClient, Jackrabbit, Lucene-Java, Rhino, Tomcat5]

[Figure: Cumulative issues inflow. x-axis: Time in Weeks; y-axis: Total Number of Issues (normalized); series: HttpClient, Jackrabbit, Lucene-Java, Rhino, Tomcat5]
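A normalized cumulative inflow profile of the kind plotted on this slide could be computed along these lines. This is a minimal sketch under assumptions: the weekly counts are made up, and normalizing by the final total is assumed rather than confirmed by the slides.

```python
from itertools import accumulate


def normalized_cumulative(weekly_counts):
    """Return the running total of issues divided by the final total."""
    running = list(accumulate(weekly_counts))
    if not running or running[-1] == 0:
        return [0.0 for _ in running]
    return [value / running[-1] for value in running]


# Hypothetical weekly issue counts, for illustration only.
weekly_counts = [2, 4, 6, 8]
profile = normalized_cumulative(weekly_counts)
```

Scaling every project to end at 1 makes the shapes (linear, S-shaped, convex, concave) comparable even though the projects differ widely in their number of issues.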

Page 11: When do software issues get reported in large open source software - Rakesh Rana

[Figure: Issues Reported per Month. x-axis: Jan to Dec; series: Total Issues, Reported Bugs, Actual Bugs]

[Figure: Ratio of Bugs and Issues Reported per Month. x-axis: Jan to Dec; series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]

Results: Issues reported on a monthly basis

Dec: holiday season, lower than average issues & bugs reported,

Jan – Mar: increasing trend, higher than average,

Apr – July: much lower than average issues & bugs reported; possible reasons are busy (exam) periods at universities and the end of the financial year/quarter in many countries.

The ratios/proportions nevertheless remain fairly stable over time (with some exceptions).
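The three ratios shown in the chart, and the comparison of each month against the average, can be sketched as below. The monthly counts are invented for illustration and are not the study's data; the function and variable names are likewise assumptions.

```python
def bug_ratios(total, reported, actual):
    """Return the three plotted ratios for one time bucket (None if undefined)."""
    return {
        "reported/total": reported / total if total else None,
        "actual/total": actual / total if total else None,
        "actual/reported": actual / reported if reported else None,
    }


# Hypothetical counts per month: (total issues, reported bugs, actual bugs).
monthly = {"Jan": (100, 60, 40), "Apr": (50, 30, 20), "Dec": (40, 24, 16)}

ratios = {month: bug_ratios(*counts) for month, counts in monthly.items()}

# Months whose total issue count falls below the average over all months.
average_total = sum(counts[0] for counts in monthly.values()) / len(monthly)
below_average = [m for m, counts in monthly.items() if counts[0] < average_total]
```

In this made-up example the volume varies a lot between months while all three ratios are identical, which is the kind of pattern the slide describes as "fairly stable".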

Page 12: When do software issues get reported in large open source software - Rakesh Rana

[Figure: Issues Reported per Week. x-axis: week of the year (1 to 53); series: Actual Bugs, Total Issues, Reported Bugs]

[Figure: Ratio of Bugs and Issues Reported per Week. x-axis: week of the year (1 to 53); series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]

Results: Issues reported on a weekly basis

Page 13: When do software issues get reported in large open source software - Rakesh Rana

[Figure: Issues Reported by Week Day. x-axis: Mon to Sun; series: Total Issues, Reported Bugs, Actual Bugs]

[Figure: Ratio of Bugs and Issues Reported by Week Day. x-axis: Mon to Sun; series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]

Results: Issues reported by week day

On the first day of the week (Mon), most contributors are busy with their primary task (education/job), so there are fewer contributions to the OSS project,

From Tue to Fri, the number of total issues and bugs reported increases,

On Saturday, the absolute number of issues drops compared to the Thu/Fri peak, but a large number of issues and bugs are still registered,

Another interesting observation: the proportion of actual bugs to reported bugs or to total reported issues is highest on Saturdays and lowest on the first working day of the week,

On Sunday, it seems most contributors take time off before the next week starts.

Page 14: When do software issues get reported in large open source software - Rakesh Rana

Impact on SRGM prediction models

SRGMs: Software Reliability Growth Models

Can be used for

• Making asymptote predictions, and

• Predicting the shape of defect inflow

[Figure: Issues Reported per Week. x-axis: week of the year (1 to 53); series: Actual Bugs, Total Issues, Reported Bugs]

Page 15: When do software issues get reported in large open source software - Rakesh Rana

Impact on SRGM prediction models

SRGMs: Software Reliability Growth Models

Tested with:

o Logistic model

o 90/10% split

o Actual Bug Inflow (requires manual classification) Vs.

o Reported Bug Inflow (data readily available)
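To make the test setup more concrete, here is a minimal sketch of a three-parameter logistic growth curve and a 90/10 split. The slides do not state the exact parameterization of the logistic model, so the form a / (1 + exp(-b(t - c))) used here is an assumption; it is one common way to express an asymptote (a), a growth rate (b) and a constant term (c). The time unit and the helper names are also assumptions.

```python
import math


def logistic(t, a, b, c):
    """Assumed logistic SRGM: cumulative defects expected by time t."""
    return a / (1.0 + math.exp(-b * (t - c)))


def split_90_10(series):
    """Split a series into the first 90% (for fitting) and the last 10% (hold-out)."""
    cut = len(series) * 9 // 10
    return series[:cut], series[cut:]


# HttpClient parameters for actual bugs, as listed in the results table.
a, b, c = 260, 0.050, 41.3

# Under the assumed form, the curve reaches half of the asymptote at t = c.
halfway = logistic(c, a, b, c)

# Hypothetical time axis of 100 periods, split into fitting and hold-out parts.
periods = list(range(1, 101))
fit_part, holdout_part = split_90_10(periods)
```

Under this form the curve approaches a as t grows, which is why the asymptote can be read as the predicted total number of defects.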

a = Asymptote, b = Growth rate, c = Constant term

OSS Project   a: Reported bugs (Adj)   a: Actual bugs   a: Relative error   b: Reported bugs   b: Actual bugs   c: Reported bugs   c: Actual bugs
HttpClient    261                      260               0.4%               0.051              0.050            39.7               41.3
Jackrabbit    901                      923              -2.4%               0.069              0.065            42.3               44.6
Lucene-Java   741                      685               8.2%               0.049              0.054            67.6               65.7
Rhino         338                      301              12.3%               0.035              0.040            76.4               64.5
Tomcat5       644                      640               0.6%               0.082              0.084            36.7               33.3
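The relative errors in the table can be reproduced from the two asymptote columns. The sketch below assumes the relative error is (reported - actual) / actual; the function name is hypothetical, and the asymptote values are copied from the table.

```python
# Asymptotes from the table: (reported bugs, adjusted; actual bugs).
asymptotes = {
    "HttpClient": (261, 260),
    "Jackrabbit": (901, 923),
    "Lucene-Java": (741, 685),
    "Rhino": (338, 301),
    "Tomcat5": (644, 640),
}


def relative_error(reported, actual):
    """Deviation of the reported-bug asymptote relative to the actual-bug one."""
    return (reported - actual) / actual


errors = {
    project: relative_error(reported, actual)
    for project, (reported, actual) in asymptotes.items()
}

# Average size of the deviation, ignoring its sign.
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

for project, error in errors.items():
    print(f"{project}: {error:.1%}")
print(f"mean absolute relative error: {mean_abs_error:.1%}")
```

The mean of the absolute values comes out at roughly 4.8%, which appears consistent with the average deviation quoted in the results.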

Page 16: When do software issues get reported in large open source software - Rakesh Rana

Results

Our results suggest that:

While there is distinct variation in when defects are reported,

the ratio of reported to actual defects remains fairly stable over time.

These results enhance our confidence in applying SRGMs using reported defect inflow. (Test with a logistic growth model: the predicted asymptote deviates on average by ~4.8% from the one obtained using actual bugs.)

The reporting patterns can also provide some insight into the possible groups of people who contribute to OSS projects.

Page 17: When do software issues get reported in large open source software - Rakesh Rana

Next Steps

Analyzing at what local time commits and issue reports are made can also help us build a better profile of who actually contributes to OSS projects and when these contributions occur.

How much more can we learn about the patterns within reported issues, and how much better can we understand OSS contributors, by using defect classification techniques (e.g. applying Orthogonal Defect Classification to the issues from OSS bug repositories)?

How do the patterns of issue and bug reporting differ for cases where OSS projects are managed by government or commercial organizations, etc.?

Page 18: When do software issues get reported in large open source software - Rakesh Rana

For more details, contact: Rakesh Rana

[email protected]