When Do Software Issues Get Reported in Large Open Source Software – Rakesh Rana
TRANSCRIPT
Dr. Rakesh Rana,
Research Scientist,
Lero – The Irish Software Research Centre, Ireland
When Do Software Issues and Bugs Get Reported in Large Open Source Software Projects?
Objectives/ Research Questions
We examine the reporting pattern of more than 7000 issue reports from five
large open source software projects to evaluate two main characteristics:
(1) When do defects get reported – do any distinct patterns exist? and
(2) Is there any difference between the reported defect inflow and the actual
defect inflow for these projects?
Why bother?
• Detailed knowledge of specific patterns in defect reporting can be useful
for planning purposes,
• Differences between reported and actual defect inflow can have
implications for the accuracy of dynamic SRGMs used for defect
prediction/reliability assessment.
Results
Our results suggest that:
While there is distinct variation in when defects are reported,
the ratio of reported to actual defects remains fairly stable over
time.
These results enhance our confidence in applying SRGMs using the
reported defect inflow. (Test: with a logistic growth model, predicted
asymptotes deviated on average by ~4.8% from those obtained using actual bugs.)
The reporting patterns can also provide some insights into the
possible groups of people who contribute to OSS projects.
Software is Everywhere
Image source: http://itsallaboutembedded.blogspot.com/2013/03/what-makes-embedded-system-called-as.html
Open Source
Image Source: https://en.wikipedia.org/wiki/Open-source_software; http://www.webzeee.com/page_osp.html
Software Defect – Definition
Defect: An imperfection or deficiency in a work product where that work
product does not meet its requirements or specifications and needs to be
either repaired or replaced.
Error: A human action that produces an incorrect result.
Failure: (A) Termination of the ability of a product to perform a required
function, or its inability to perform within previously specified limits; or
(B) an event in which a system or system component does not perform a
required function within specified limits.
Fault: A manifestation of an error in software.
Problem: (A) Difficulty or uncertainty experienced by one or more
persons, resulting from an unsatisfactory encounter with a system in use
or (B) a negative situation to overcome.
IEEE standard 1044, Classification for Software Anomalies
We make a distinction between issues and defects as follows:
Software Issue: refers to a report filed by users or developers
into the given OSS project's issue database.
These issues can be defects/bugs, requests for enhancements,
improvement requests, documentation, refactoring requests, etc.
Defect/Bug: used interchangeably in this paper, referring to issues that
require a corrective maintenance task, usually achieved by making
semantic changes to the source code.
The Data
Five open source Java projects under active development,
developed and maintained by APACHE and MOZILLA – these tend to follow a
strict bug reporting and fixing process.
K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's a feature: how misclassification
impacts bug prediction," in Proceedings of the 2013 International Conference on
Software Engineering. IEEE Press, 2013, pp. 392–401.
OSS Project    Time Period          Maintainer   No. of Issues
HttpClient     11/2001 – 04/2012    Apache       746
Jackrabbit     09/2004 – 04/2012    Apache       2402
Lucene-Java    03/2004 – 03/2012    Apache       2443
Rhino          09/1999 – 02/2012    Mozilla      584
Tomcat5        05/2002 – 12/2011    Apache       1226
Steps:
We mined all software issues for these five projects over the given time period
We then mapped each issue to the manual classification of K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's
a feature: how misclassification impacts bug prediction," in Proceedings of ICSE 2013.
Then we used the timestamp information to analyze the trends of total reported issues, reported bugs, and actual bugs.
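The mining steps above can be sketched in a few lines. The records and labels below are hypothetical placeholders, not data from the studied projects; the label scheme loosely follows the Herzig et al. manual classification, with "BUG" marking an actual bug:

```python
from collections import Counter
from datetime import datetime

# Hypothetical issue records: (issue id, report timestamp, manual label).
# Any label other than "BUG" (RFE, DOC, ...) counts as a non-bug issue.
issues = [
    ("HTTPCLIENT-1", datetime(2011, 3, 5, 14, 20), "BUG"),   # a Saturday
    ("HTTPCLIENT-2", datetime(2011, 3, 7, 9, 10), "RFE"),    # a Monday
    ("HTTPCLIENT-3", datetime(2011, 3, 10, 16, 45), "BUG"),  # a Thursday
]

def inflow_by_weekday(records):
    """Count total issues and actual bugs per weekday (Mon=0 .. Sun=6)."""
    total, bugs = Counter(), Counter()
    for _id, ts, label in records:
        total[ts.weekday()] += 1
        if label == "BUG":
            bugs[ts.weekday()] += 1
    return total, bugs

total, bugs = inflow_by_weekday(issues)
```

Replacing `ts.weekday()` with `ts.month` or an ISO week number gives the monthly and weekly inflow series in the same way.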
The Data
HttpClient & Rhino ~ linear cumulative issue inflow profiles,
Jackrabbit ~ S-shaped,
Lucene-Java ~ convex shaped,
Tomcat5 ~ concave shaped issue inflow profile.
These are the total issues reported – what about actual bugs?
Some studies suggest that ~40% of issues reported as bugs are not real bugs!
[Figure: Issues inflow per week – number of issues (0–90) against time in weeks (1–134), for HttpClient, Jackrabbit, Lucene-Java, Rhino and Tomcat5]
[Figure: Cumulative issues inflow – total number of issues (normalized to 0–1) against time in weeks (1–134), for HttpClient, Jackrabbit, Lucene-Java, Rhino and Tomcat5]
[Figure: Issues Reported per Month – counts (0–800) for Jan–Dec; series: Total Issues, Reported Bugs, Actual Bugs]
[Figure: Ratio of Bugs and Issues Reported per Month (0–80%) for Jan–Dec; series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]
Results: Issues reported on monthly basis
Dec: holiday season – lower than average issues & bugs reported,
Jan – Mar: increasing trend – higher than average,
Apr – Jul: much lower than average issues & bugs reported –
busy (exam) periods at universities,
end of financial year/quarter in many countries.
The ratios/proportions, however, remain fairly stable over time (with some exceptions).
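The stability of the actual-to-reported ratio can be checked numerically, e.g. via the coefficient of variation of the monthly ratios. The counts below are hypothetical illustrations, not the paper's data:

```python
# Hypothetical monthly counts (Jan–Dec) with a roughly constant
# actual-to-reported bug ratio, as observed in the studied projects.
reported_bugs = [50, 62, 70, 40, 38, 35, 36, 48, 55, 58, 60, 30]
actual_bugs   = [30, 38, 41, 25, 22, 21, 22, 29, 33, 36, 36, 18]

ratios = [a / r for a, r in zip(actual_bugs, reported_bugs)]
mean = sum(ratios) / len(ratios)
# Coefficient of variation: a low value indicates a stable ratio over time.
cv = (sum((x - mean) ** 2 for x in ratios) / len(ratios)) ** 0.5 / mean
```

Even though the absolute monthly counts here swing by a factor of two, the ratio's coefficient of variation stays small, which is the sense in which the proportions are "stable".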
[Figure: Issues Reported per Week – counts (0–200) for weeks 1–53; series: Actual Bugs, Total Issues, Reported Bugs]
[Figure: Ratio of Bugs and Issues Reported per Week (0–90%) for weeks 1–53; series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]
Results: Issues reported on weekly basis
[Figure: Issues Reported by Week Day – counts (0–1600) for Mon–Sun; series: Total Issues, Reported Bugs, Actual Bugs]
[Figure: Ratio of Bugs and Issues Reported by Week Day (0–80%) for Mon–Sun; series: Reported Bugs/Total Issues, Actual Bugs/Total Issues, Actual Bugs/Reported Bugs]
Results: Issues reported by week day
On the first day of the week (Mon), most contributors are busy with their primary tasks (education/job), so there are fewer contributions to the OSS project,
From Tue to Fri, the number of total issues and bugs reported increases,
On Saturday, while there is a drop in the absolute number of issues compared to the Thursday/Friday peak, a large number of issues and bugs are still registered,
Another interesting observation: the proportion of actual bugs to reported bugs (or to total reported issues) is highest on Saturdays and lowest on the first working day of the week,
And on Sunday, it seems most contributors take time off before the next week starts.
Impact on SRGM prediction models
SRGMs: Software Reliability Growth Models
Can be used for
• Making asymptote predictions, and
• Predicting the shape of defect inflow
[Figure: Issues Reported per Week (as above) – Actual Bugs, Total Issues, Reported Bugs]
Impact on SRGM prediction models
SRGMs: Software Reliability Growth Models
Tested with:
o Logistic model
o 90/10% split
o Actual bug inflow (requires manual classification) vs.
o Reported bug inflow (data readily available)
               Asymptote (a)                       Growth rate (b)    Constant term (c)
OSS Project    Reported (Adj)  Actual  Rel. Error  Reported  Actual   Reported  Actual
HttpClient     261             260     0.4%        0.051     0.050    39.7      41.3
Jackrabbit     901             923     -2.4%       0.069     0.065    42.3      44.6
Lucene-Java    741             685     8.2%        0.049     0.054    67.6      65.7
Rhino          338             301     12.3%       0.035     0.040    76.4      64.5
Tomcat5        644             640     0.6%        0.082     0.084    36.7      33.3
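As a sketch of how the table above compares the two fits, assume the logistic SRGM takes the common three-parameter form a / (1 + c·e^(−b·t)); the exact parameterization used in the paper is an assumption here. The HttpClient parameters are taken from the table:

```python
import math

def logistic(t, a, b, c):
    """Cumulative defects predicted by a logistic SRGM at time t (weeks)."""
    return a / (1.0 + c * math.exp(-b * t))

# Fitted parameters for HttpClient, from the table above.
rep = dict(a=261, b=0.051, c=39.7)   # fit on reported-bug inflow
act = dict(a=260, b=0.050, c=41.3)   # fit on manually classified actual bugs

# Relative error of the asymptote when only reported bugs are available.
rel_err = (rep["a"] - act["a"]) / act["a"]   # ~0.4%, matching the table
```

The small asymptote errors (0.4–12.3% across the five projects, ~4.8% on average) are what justify fitting SRGMs directly on the readily available reported-bug inflow instead of manually classified actual bugs.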
Results
Our results suggest that:
While there is distinct variation in when defects are reported,
the ratio of reported to actual defects remains fairly stable over
time.
These results enhance our confidence in applying SRGMs using the
reported defect inflow. (Test: with a logistic growth model, predicted
asymptotes deviated on average by ~4.8% from those obtained using actual bugs.)
The reporting patterns can also provide some insights into the
possible groups of people who contribute to OSS projects.
Next Steps
Analyzing the local time at which commits and issue reports are made can also help
us build a better profile of who actually contributes to OSS projects and when
these contributions occur.
How much more can we learn about the patterns within reported issues, and gain
a better understanding of OSS contributors, by using defect classification
techniques (e.g. applying Orthogonal Defect Classification to issues from
OSS bug repositories)?
How do the patterns of issue and bug reporting differ for cases where OSS
projects are managed by government or commercial organizations, etc.?