
Understanding the Triaging and Fixing Processes of Long Lived Bugs

Ripon K. Saha, Sarfraz Khurshid, Dewayne E. Perry

Center for Advanced Research in Software Engineering (ARiSE), Department of Electrical and Computer Engineering

The University of Texas at Austin, USA

Abstract

Context: Bug fixing is an integral part of software development and maintenance. A large number of bugs often indicates poor software quality, since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user's overall experience with the software product. The impact of long lived bugs can be even more critical, since experiencing the same bug version after version can be particularly frustrating for users. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge none of these studies investigates the extent and reasons of long lived bugs.
Objective: In this paper, we investigate the triaging and fixing processes of long lived bugs so that we can identify the reasons for delay and improve the overall bug fixing process.
Methodology: We mine the bug repositories of popular open source projects, and analyze long lived bugs from five different perspectives: their proportion, severity, assignment, reasons, as well as the nature of fixes.
Results: Our study on seven open-source projects shows that there are a considerable number of long lived bugs in each system and that over 90% of them adversely affect the user's experience. The reasons for these long lived bugs are diverse, including long assignment time and not understanding their importance in advance. However, many bug-fixes were delayed without any specific reason. Furthermore, 40% of long lived bugs need only small fixes.
Conclusion: Our overall results suggest that many long lived bugs could be fixed quickly through careful triaging and prioritization if developers could predict their severity, change effort, and change impact in advance. We believe our results will help both developers and researchers to better understand the factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug fixing effort.

Keywords: Bug tracking, bug triaging, bug survival time

1. Introduction

Software development and maintenance is a complex process. Although developers and testers try their best to make their software error free, in practice software ships with bugs. The number of bugs in software is a significant indicator of software quality, since bugs can adversely affect users' experience directly. Therefore, developers are generally very active in finding and removing bugs.

To ensure high software quality for each release, developers/managers triage bugs carefully and schedule the bug fixing tasks based on their severity and priority. Despite such a rigorous process, there are still many bugs that live for a long time. We believe the impact of these long lived bugs (for our study, bugs that are not fixed within one year after they are reported) is even more critical, since the users may experience the same failures version after version. Therefore, it is important to understand the extent and reasons of these long lived bugs so that we can improve software quality.

Email addresses: [email protected] (Ripon K. Saha), [email protected] (Sarfraz Khurshid), [email protected] (Dewayne E. Perry)

Preprint submitted to Elsevier, November 12, 2014

A number of previous studies have investigated the overall factors affecting bug fix time. Giger et al. [8] empirically investigated the relationships between bug report attributes and the time to fix. Zhang et al. [31] predicted overall bug fix time in commercial projects. Canfora et al. [6] used survival analysis to determine the relationship between the risk of not fixing a bug within a given time frame and specific code constructs changed when fixing the bug. Zhang et al. [30] examined factors affecting bug fixing time along three dimensions: bug reports, source code involved in the fix, and code changes that are required to fix the bug.

While these studies are useful in understanding the overall factors related to bug fix time, we know of no study that has specifically investigated long lived bugs to understand why they take such a long time to be fixed and how important they are. We point out that analyzing entire bug datasets using various machine learning or data mining techniques (as done in previous work) is not sufficient for understanding long lived bugs due to the imbalanced dataset.1 Imbalanced datasets are a major problem in most data mining applications, since machine learning algorithms can be biased towards the majority class due to over-prevalence [12]. We expect (and our results also support) that the proportion of long-lived bugs is a lot less than 50% of the total bugs, thus resulting in an imbalanced dataset. Therefore, if we automatically analyze all the bug reports using a standard data mining technique, it is highly likely that the main factors behind long lived bugs would get lost. In this paper, we conduct an exploratory study focused solely on long lived bugs to understand their extent and reasons with respect to the following research questions:

1. What proportion of the bugs are long lived? The answer to this question is important since if there are few long lived bugs, there may be little reason to worry.

2. How important are long lived bugs in terms of severity? It is important to understand how crucial these bugs were from the perspective of both developers and users. If they are minor or trivial bugs, their impact on overall software quality would be small.

3. Where is most of the time spent in the bug fixing process? The answer to this question is important to identify the time consuming phases so that developers as well as researchers can work on improving the process involving those phases.

4. What are the common reasons for long lived bugs? To improve the bug fixing process, we first need to understand the underlying reasons for delays. Delineating the common reasons for long lived bugs will help researchers deal with the problem more systematically.

5. What is the nature of long lived bug fixes? The answer to this question will help us better understand the bug fixing process, estimate change efforts, and so on, which will be useful in exploring potential approaches for improving the overall bug fixing process.

1 A dataset is imbalanced if the classification classes are not approximately equally represented.

We study seven open source projects: JDT, CDT, PDE, and Platform from the Eclipse product family, written in Java,2 and the Linux Kernel, WineHQ, and GDB, written in C. Our key observations are summarized below:

1. Despite advances in software development and maintenance processes, there are a significant number of bugs in each project that survive for more than one year.

2. More than 90% of long lived bugs affect users' normal working experiences and thus are important to fix. Moreover, there are duplicate bug reports for these long lived bugs, which indicate the users' demand for fixing them.

3. The average bug assignment time of these bugs was more than one year, despite the availability of a number of automatic bug assignment tools that could have been used. The bug fix time after assignment was another year on average.

4. Reasons for long lived bugs are diverse. While problem complexity, reproducibility, and not understanding the importance of some of the bugs in advance are the common reasons, we observed that many bug-fixes got delayed without any specific reason.

5. Unlike previous studies [30], we found that a bug surviving for a year or more does not necessarily mean that it requires a large fix. We found that 40% of long-lived bug fixes involved only a few changes in a single file.

This paper extends our previous "long lived bugs" paper presented at CSMR-WCRE 2014 [22] in three directions. First, we perform the same set of experiments on three additional popular projects: the Linux Kernel, WineHQ, and GDB, which are not only written in a different programming language but are also from different domains than our previous subject systems. Second, we provide more details of our previous results. Finally, our new results show that our previous findings hold even for software projects from different domains and written in different languages; thus this paper reports more generalizable results. We believe these findings will play an important role in developing new approaches for bug triaging as well as improving the overall bug fixing process.

2 http://www.eclipse.org

2. Background

In this section, we provide the necessary background for our study, which includes a brief description of bug tracking systems and a typical bug life cycle.

2.1. Bug Tracking System

Generally, project stakeholders maintain a bug database for tracking all the bugs associated with their projects. There are several online bug tracking systems, such as Bugzilla, JIRA, Mantis, etc. These systems enable developers/managers to manage the bug database for their projects. Different repositories may have different data structures and follow different life cycles of bugs. The dataset we created and used in our work was extracted from Bugzilla, a popular online bug tracking system. Therefore, the rest of the discussion in this paper regarding the bug tracking system is limited to Bugzilla.

Any person having legitimate access to a project's bug database can post a change request through Bugzilla. A change request can be either a bug or an enhancement. In Bugzilla, however, both bugs and enhancements are represented similarly and referred to as bugs, with the exception that for enhancements the severity field is set to enhancement. Generally, bug reporters provide a bug summary, a bug description, and the suspected product and component name, along with the bug's severity.

Developers in a particular project can define their own severity levels. According to the Eclipse Bugzilla documentation, the severity level can be one of the following values, which represent the degree of potential harm.3

Blocker: These bugs block development and/or testing work. There is no workaround.

Critical: These bugs cause program crashes, loss of data, or severe memory leaks.

Major: These bugs result in a major loss of function.

Normal: These are regular issues, involving some loss of functionality under specific circumstances.

Minor: These bugs cause a minor loss of functionality, or other problems where an easy workaround is present.

Trivial: These are generally cosmetic problems, such as misspelled words or misaligned text.

The developers of WineHQ also follow the same severity levels. However, the GDB community recognizes three levels of severity: critical, normal, and minor. The Linux community, on the other hand, has its own severity levels: blocking, high, normal, and low.

In addition to providing the severity level, reporters also specify the software version, the platform, and the operating system where they encountered the bug, so that developers can easily reproduce it. Bug reporters can also attach files to the bug report, such as screen shots, failing test cases, etc. Once a bug is posted, all other related developers can make comments regarding the bug to discuss different issues. Therefore, a bug repository has a rich set of information that can be analyzed to gain insight about bugs.

3 http://wiki.eclipse.org/Eclipse/Bug_Tracking

Figure 1: Life Cycle of a Bug in Bugzilla (a bug moves through the states Unconfirmed, New, Assigned, Resolved, Verified, Reopened, and Closed)

2.2. Bug Life Cycle

The overall bug fixing process in a system is directly related to the bug life cycle maintained by the bug tracking system. Although different projects may have different schemes for using Bugzilla, a common life cycle for a bug is as follows:4

Validation: At the start of each day, each project/component team leader triages NEW bugs to verify whether the bug is really a bug and whether the provided information is correct. In case of any inconsistencies, the bug triager can correct them. The bug triager can also request further information to validate a bug if necessary. If there is no response within a week, the team leader closes the bug, marking it RESOLVED, INVALID, or WONTFIX. However, the reporter can reopen the bug at any time if she has more information.

Prioritization: In this stage, the triager first determines whether the bug is a feature request. If so, the severity of the bug is changed to enhancement. Otherwise, she checks the severity level of the bug to make sure that it is consistent with the bug description. Then the priority of the bug is set based on the following guidelines:5

P1: These bugs are a must fix for the indicated target milestone.

P2: These bugs are very important for the indicated target milestone. Generally, developers try to resolve all the P2 bugs.

P3: This is the default priority. If the bug triager is uncertain about the priority of a bug, or it is actually a normal bug, she can set priority P3. The assigned developer can then adjust it if appropriate.

P4: These bugs should be fixed if time permits.

P5: These are valid bugs, but there are no plans to fix them. A P5 priority also indicates that help is wanted.

Fixing: At this point, a bug remains in the component's "inbox" account until a developer takes the bug or the team leader assigns it to them. After fixing the bug, the developer marks it as RESOLVED-FIXED.

Verification: Once a bug is fixed, it is assigned to another committer on the team to verify. Ideally, all bugs should be verified before the next integration build. Once the verifier confirms that the bug is completely resolved, she changes the bug status to VERIFIED. Figure 1 shows all possible state transitions of a bug in Bugzilla.

4 http://wiki.eclipse.org/Development_Resources/HOWTO/Bugzilla_Use
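For readers who want to process life-cycle data programmatically, the common Bugzilla workflow above can be summarized as a small transition map. The sketch below (Python) is a simplified, illustrative encoding of the states in Figure 1; it omits project-specific resolutions and is not a complete reproduction of Bugzilla's state machine.

    # Simplified, illustrative encoding of the Bugzilla life cycle in Figure 1.
    # Some edges (e.g., resolving a bug as INVALID or WONTFIX) are omitted.
    LIFE_CYCLE = {
        "UNCONFIRMED": {"NEW"},                  # bug confirmed by a triager
        "NEW": {"ASSIGNED", "RESOLVED"},         # ownership taken, or resolved directly
        "ASSIGNED": {"RESOLVED"},                # developer marks the fix
        "RESOLVED": {"VERIFIED", "REOPENED"},    # verified by another committer, or reopened
        "VERIFIED": {"CLOSED", "REOPENED"},
        "CLOSED": {"REOPENED"},
        "REOPENED": {"ASSIGNED", "RESOLVED"},
    }

    def is_valid_transition(src: str, dst: str) -> bool:
        """Return True if the simplified life cycle allows moving from src to dst."""
        return dst in LIFE_CYCLE.get(src, set())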

3. Study Setup

This section provides a brief description of the subject systems that we studied, and the metrics and process we used to understand the extent and reasons of long lived bugs.

3.1. Subject Systems

We use seven open source projects for our study. Among them, we choose four projects from the Eclipse product family, namely JDT, CDT, PDE, and Platform, which are written in the Java programming language. The other three projects are the Linux Kernel, WineHQ, and GDB, which are written in the C programming language. There are three main reasons for choosing these projects. First, these projects are highly successful and have been widely used in software engineering research. Second, each project has a long development history. Third, these projects are from different domains.

• JDT and CDT provide a fully functional Integrated Development Environment based on the Eclipse platform for developing Java, and C and C++ applications.6,7

• The Plug-in Development Environment (PDE) provides tools to create, develop, test, debug, build, and deploy Eclipse plug-ins, fragments, features, update sites, and RCP products.8

• The Eclipse Platform defines the set of frameworks and common services that collectively make up the infrastructure required to support the use of Eclipse.9

• Linux Kernel: The kernel of the Linux operating system.10

• WineHQ: A compatibility layer making it possible to run Windows applications on POSIX-compliant operating systems.11

• GDB: A debugger for programs written in C, C++, and many other programming languages.12

We have created the dataset for the C projects ourselves. This dataset includes all the bug reports and their histories from the projects' inception to May 2014. For the Java projects, we have used Lamkanfi et al.'s [16] bug dataset to extract the bug information. The Java dataset includes all the bug reports and their histories from the projects' inception to March 2011 for these four projects (extracted from the Eclipse Bugzilla database). A more detailed description of the dataset is presented in Table 1. In the table, the last column represents the number of bugs (excluding enhancements and duplicated bugs) that eventually got fixed, which is the actual dataset of this study.

5 http://wiki.eclipse.org/WTP/Conventions_of_bug_priority_and_severity

6 http://projects.eclipse.org/projects/eclipse.jdt
7 http://www.eclipse.org/cdt/
8 http://www.eclipse.org/pde/
9 https://projects.eclipse.org/projects/eclipse.platform
10 https://www.kernel.org/
11 http://www.winehq.org
12 http://www.sourceware.org/gdb/

Table 1: Data Set

System     #CR      #Bugs    #Enh.    #Bugs Fixed
JDT        46,308   38,520   7,788    18,873
CDT        14,871   12,854   2,017    7,260
PDE        13,677   11,958   1,719    6,854
Platform   90,691   78,120   12,571   33,738
Linux      23,618   23,387   231      5,784
WineHQ     36,691   34,490   2,201    14,338
GDB        17,038   15,562   1,476    7,667

3.2. Terms and Metrics

We make use of bug tracking and version control system information to calculate the metrics that we are interested in. This section defines the different terms and metrics that we use in the rest of the paper.

Bug Introduction Time (TI): This is the timestamp when the buggy code is committed for the first time for a given bug.

Bug Reporting Time (TR): This is the timestamp when a bug is reported to the Bugzilla system by a user/developer.

Bug Assignment Time (TA): This is the timestamp when a bug was officially assigned to the right developers through Bugzilla. If a bug is assigned to multiple developers, we use the assignment time of the developer who fixed the bug. If a bug is fixed by multiple developers, we use the assignment time of the developer who committed the last changes.

Bug Severity Realization Time (TS): This is the timestamp when the actual severity of a given bug was understood by the developers and thus the severity field of that bug was changed for the last time.

Bug Fix Time (TF): This is the timestamp when a developer officially marked a bug as FIXED in Bugzilla through the resolution field.

Bug Assignment Period (AP): This is the elapsed time between when the bug was opened and when it was assigned to the right developer. Mathematically, AP = TA − TR.

Bug Fixing Period (FP): This is the period of time that developers took to fix a bug. It should be noted that it is not the actual coding time of the bug-fix. Instead, it is the time period between the bug assignment time and the bug fix time. Mathematically, FP = TF − TA. It should be further noted that we do not deduct the time if a bug is temporarily closed. A bug is temporarily closed when the developers think that the bug is fixed but actually it is not. Therefore, we do not deduct that time, since the bug is still (at least partially) present during the time when the bug report was closed.

Pre-Severity Realization Period (Pre-SRP): This is the period of time developers took to understand the actual severity of the bug, i.e., the time between the bug reporting time and the time when the severity was changed for the last time. Mathematically, Pre-SRP = TS − TR.

Post-Severity Realization Period (Post-SRP): This is the time developers took to fix the bug after realizing its actual severity. Mathematically, Post-SRP = TF − TS.

Bug Verification Period (VP): This is the period of time that a developer took to verify a bug after it was marked as FIXED in Bugzilla. Mathematically, VP = TV − TF, where TV is the timestamp when the bug was marked as VERIFIED.

Bug Survival Period (SP): This is the period that a bug exists in the system. Ideally, it should be the time period between the bug introduction time (TI) and the bug fix time (TF). However, in our study it is the time period between TR and TF. The timestamps of bug introduction (TI) and bug reporting (TR) can certainly be different, since a bug can remain dormant for a long time [7]. Although there are some algorithms [14] to identify bug introducing changes, it is difficult to map those changes to the associated bug reports. Therefore, we preferred bug reporting time over bug introduction time. Furthermore, since TR is always later than TI (i.e., a bug is always reported after the bug introducing changes are committed), our calculated bug survival period (SP) never overestimates the actual SP. However, we do not subtract from SP the time period when a bug was temporarily closed. Figure 2 visually presents all the terms and metrics in a timeline.

Figure 2: An Example Timeline of a Bug (the timestamps TI, TR, TS, TA, TF, and TV delimit the periods Pre-SRP, AP, Post-SRP, FP, and VP)
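To make the above definitions concrete, the period metrics can be computed directly from the recorded timestamps. The sketch below (Python) is illustrative only: the BugRecord field names are hypothetical stand-ins for TR, TS, TA, TF, and TV, not part of any Bugzilla export format.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Optional

    @dataclass
    class BugRecord:
        # Hypothetical container for the timestamps defined in Section 3.2.
        reported: datetime                                  # TR
        fixed: datetime                                     # TF
        assigned: Optional[datetime] = None                 # TA (None if never assigned)
        severity_last_changed: Optional[datetime] = None    # TS (None if never changed)
        verified: Optional[datetime] = None                 # TV (None if never verified)

    def days_between(later: datetime, earlier: datetime) -> float:
        return (later - earlier).total_seconds() / 86400.0

    def period_metrics(bug: BugRecord) -> dict:
        """Compute SP, AP, FP, Pre-SRP, Post-SRP, and VP in days for one bug."""
        m = {"SP": days_between(bug.fixed, bug.reported)}              # SP = TF - TR
        if bug.assigned is not None:
            m["AP"] = days_between(bug.assigned, bug.reported)         # AP = TA - TR
            m["FP"] = days_between(bug.fixed, bug.assigned)            # FP = TF - TA
        if bug.severity_last_changed is not None:
            m["Pre-SRP"] = days_between(bug.severity_last_changed, bug.reported)   # TS - TR
            m["Post-SRP"] = days_between(bug.fixed, bug.severity_last_changed)     # TF - TS
        if bug.verified is not None:
            m["VP"] = days_between(bug.verified, bug.fixed)            # VP = TV - TF
        return m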

3.3. Identification of Faulty Source Code

Previous studies [13] showed that when developers fix bugs they often put the bug id in their commit message. Therefore, to get the version histories and commit messages of these projects, we first accessed their git repositories. Then, using the JGit APIs, we extracted all the commit messages from the histories and searched for all numbers.13,14,15,16 We then matched each number with the bug IDs. To further ensure that those are indeed bug IDs, we only accepted commits that contain additional information. For example, in the Java projects, the term bug(s) (case insensitive) was present. In the Linux Kernel, we found that developers referred to the Bugzilla URL (as shown in Figure 3), whereas in GDB the bug was referred to by the term PR. In this way, we reduced the chance of false positives, although we might have missed some true mappings.

13 http://www.eclipse.org/jgit/
14 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
15 git://source.winehq.org/git/wine.git/
16 git://sourceware.org/git/binutils-gdb.git

Table 2: Bug Fix Time

            JDT             CDT            PDE            Platform        Linux          WineHQ         GDB
Time        #Bugs   %       #Bugs  %       #Bugs  %       #Bugs   %       #Bugs  %       #Bugs  %       #Bugs  %
<1 day      5,009   26.54   1,911  26.32   2,154  31.43   8,244   24.44   578    9.99    673    4.69    1,591  20.75
1-7 days    4,788   25.37   1,485  20.45   1,570  22.91   7,420   21.99   881    15.23   1,809  12.62   1,470  19.17
8-30 days   3,704   19.63   1,230  16.94   1,293  18.86   6,442   19.09   1,221  21.11   1,656  11.55   1,358  17.71
1-6 mon.    3,604   19.10   1,426  19.64   1,212  17.68   7,101   21.05   1,573  27.20   2,837  19.79   1,654  21.57
6-12 mon.   855     4.53    567    7.81    353    5.15    2,173   6.44    578    9.99    2,062  14.38   511    6.66
>1 year     913     4.84    641    8.83    272    3.97    2,358   6.99    953    16.48   5,301  36.97   1,083  14.13
Total       18,873          7,260          6,854          33,738          5,784          14,338         7,667

commit 768b107e4b3be0acf6f58e914afe4f337c00932b
Date: Fri May 4 11:29:56 2012 +0200

    drm/i915: disable sdvo hotplug on i945g/gm

    v2: While at it, remove the bogus hotplug_active read, and do not mask
    hotplug_active[0] before checking whether the irq is needed, per
    discussion with Daniel on IRC.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=38442

Figure 3: A bug fixing commit for #38442 in the Linux Kernel
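To illustrate the kind of filtering just described (a sketch only, not the authors' exact scripts; the marker patterns simply encode the per-project conventions mentioned above), candidate bug references can be extracted from a commit message as follows:

    import re

    BUG_NUMBER = re.compile(r"\d{3,7}")  # candidate bug numbers in a commit message

    # Per-project markers described above: the word "bug(s)" for the Java projects,
    # a kernel Bugzilla URL for Linux, and the term "PR" for GDB.
    MARKERS = {
        "JDT": r"\bbugs?\b", "CDT": r"\bbugs?\b",
        "PDE": r"\bbugs?\b", "Platform": r"\bbugs?\b",
        "Linux": r"bugzilla\.kernel\.org",
        "GDB": r"\bPR\b",
    }

    def referenced_bug_ids(message: str, known_bug_ids: set, project: str) -> set:
        """Return the known bug IDs (a set of ints) plausibly referenced by one commit message."""
        marker = MARKERS.get(project)
        if marker is None or not re.search(marker, message, re.IGNORECASE):
            # WineHQ is handled separately: its bug reports link to commits instead.
            return set()
        candidates = {int(n) for n in BUG_NUMBER.findall(message)}
        return candidates & known_bug_ids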

For one of our considered projects, WineHQ, however, the above process gave no results. We thus consulted a developer from the WineHQ community, who informed us that in this community the convention is for the bug report to refer back to the commit, rather than the commit referring to the bug report. Indeed, in the WineHQ Bugzilla there is a dedicated field for a git commit id. However, many of these fields were empty, since the field is not required. In this way, we identified the bug fixing commits for those long-lived bugs where the information was available. Then we used git diff to compute the following metrics for bug fixes:

Number of Changed Files: This is the number of files that were changed in the bug fixing commit. If a bug was fixed in multiple commits, it is the total number of distinct files across all commits.

Number of Hunks: A hunk is a chunk of adjacent lines that was changed. For a bug fix spanning multiple commits, it is the total number of hunks across all commits. This is useful for understanding how many different places developers had to touch to fix a bug.

Code Churn: This is the total number of changed lines. Since we use git diff itself, changes in comments were counted as well. For multiple commits, it is the total number of changed lines across all commits. It should be noted that if a line is changed, it is considered as a line deletion first and then the addition of another line. Thus the code churn value for a one-line change is two.
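As a rough illustration of how these three metrics can be derived from textual git diff output (a sketch under the assumption of standard unified-diff formatting; it is not the authors' script, and per-commit results would still need to be aggregated for multi-commit fixes):

    def diff_metrics(diff_text: str) -> dict:
        """Count changed files, hunks, and code churn in one `git diff` output.

        A modified line appears as one '-' line and one '+' line, so it adds 2
        to churn, matching the definition above; comment-only changes count too,
        because the diff is purely textual.
        """
        files = hunks = churn = 0
        for line in diff_text.splitlines():
            if line.startswith("diff --git "):
                files += 1                                  # one header per changed file
            elif line.startswith("@@"):
                hunks += 1                                  # one header per changed region
            elif line.startswith("+") and not line.startswith("+++"):
                churn += 1                                  # added line
            elif line.startswith("-") and not line.startswith("---"):
                churn += 1                                  # deleted line
        return {"files": files, "hunks": hunks, "churn": churn}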

4. Study Results

In this section, we present the experimental results, which answer our research questions.

4.1. RQ1: What proportion of the bugs are long lived?

The first question of any empirical study is how large the population that we want to study is. The answer is important since if the population is small, there may be little reason to worry about it. In this case, our population of interest is long lived bugs.

The definition of long lived bugs is subjective, since the time threshold for deciding whether a bug is long lived or short lived could vary across projects, persons, or studies. In this research question, we analyze the survival time of all the fixed bugs in each subject system and define long lived bugs more concretely for our study.

Although many of us believe that a bug could be considered long lived if it survives more than six months, in this study we have considered only those bugs as long lived that survive more than one year. There are two main reasons behind this decision. First, we wanted to be more conservative so that we can investigate really long lived bugs. Second, the release cycles of the subject systems that we considered vary from 2 months to one year. More specifically, the Eclipse Foundation has coordinated an annual simultaneous release for all projects. GDB's release interval varies from 6 to 10 months. The Linux Kernel community makes most of its stable releases between 3 months and 1 year apart. Therefore, if a bug was not fixed in one year, it is expected that the bug propagated through at least two major releases. And it would not be a pleasant experience for a user if s/he experiences the same bug in subsequent major versions of the software.

Figure 4: Survival Time of Long Lived Bugs (box plots of survival time in years, from 0 to 6, for JDT, CDT, PDE, Platform, Linux, WineHQ, and GDB)

To investigate long lived bugs, we first investigate how active developers were in fixing bugs in these subject systems. To this end, we group bugs based on their survival period (SP, defined in Section 3.2) and count the number of bugs in each group, as shown in Table 2. The results show that around 50% (+/-4%) of the total (fixed) bugs in the Java projects were fixed within a week. In the C projects, the bug fixing rate was slower than that of the Java projects; it took a month to fix around 50% of the total bugs (except in WineHQ). This indicates that even in open source projects, developers are active in fixing bugs. However, as the results show, 10% to 17% of bugs in the Java projects and 20% to 50% of bugs in the C projects took more than six months to be fixed.
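The grouping used for Table 2 amounts to bucketing each fixed bug by its survival period; the sketch below shows that bucketing and the one-year threshold used to flag long lived bugs (the exact day boundaries are our assumption, since the paper does not state them):

    from collections import Counter

    # Approximate day boundaries for the Table 2 buckets (assumed cut-offs).
    BUCKETS = [
        (1, "<1 day"),
        (8, "1-7 days"),
        (31, "8-30 days"),
        (183, "1-6 mon."),
        (366, "6-12 mon."),
    ]

    def bucket(survival_days: float) -> str:
        for upper, label in BUCKETS:
            if survival_days < upper:
                return label
        return ">1 year"             # i.e., a long lived bug in this study

    def fix_time_distribution(survival_days_per_bug) -> Counter:
        """Return a Table-2-style breakdown of fixed bugs by survival period."""
        return Counter(bucket(d) for d in survival_days_per_bug)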

Even after adopting such a conservative definition, we found more than 4,184 and 7,337 long lived bugs in the Java and C projects respectively. It should be noted that all these bugs eventually got fixed; therefore, they are all valid bug reports. We believe these are large numbers, and thus it is important to investigate them quantitatively and qualitatively.

Figure 4 presents the survival time distribution of long lived bugs. It shows the average (dot in the box), median (line in the box), upper/lower quartiles, and 90th/10th percentiles of survival time. We limit the values of the Y axis to 6 years to better present the figure.

Table 3: Importance of Long Lived Bugs

System     Blkr.  Critical  Major  Normal  Minor  Trivial
JDT        1      5         49     712     119    27
CDT        2      4         61     523     37     14
PDE        0      6         20     224     13     9
Platform   6      42        221    1856    170    63
Linux      32     130       N/A    743     48     N/A
WineHQ     24     39        185    4200    632    221
GDB        N/A    53        N/A    977     53     N/A

From the figure, we see that at least 25% (upper quartile) of long lived bugs took more than 2.5 years to be fixed. For GDB, it is close to 5.5 years.

Among the total number of bugs that developers fixed, 5%-9% in the Java projects and 14%-37% in the C projects took more than one year to be fixed.

4.2. RQ2: How important are long lived bugs in terms of severity?

There are two fields in Bugzilla that indicate the importance of a bug: i) severity and ii) priority. However, based on their usage, severity is more important than priority for understanding importance, since severity represents the degree of the impact of the bug on the operation of the system. Priority, on the other hand, often describes the relative work schedule for fixing a bug, as set by the developers for a given milestone. For example, if there are 10 critical bugs in a system but developers have time to fix only five, they can set a higher priority for any five bugs based on some consideration and a relatively lower priority for the others. Sometimes developers even set a high priority for a less severe bug if it is expected to be easier to fix than a critical bug. Therefore, for this research question, we emphasize severity over priority. Our initial hypothesis was that most of the long lived bugs are either minor or trivial, which do not have serious effects.

4.2.1. Bug Severity

Table 3 presents the importance of long lived bugs based on their severity. Our results show that almost 90% of the long lived bugs have a severity level of normal or above, for both the Java and C projects. Project-wise, the proportion varied from 84% to 95%. According to the Eclipse Bugzilla documentation, only minor and trivial bugs do not interfere with normal work or use, which means that any bug with a severity level of normal or above adversely affects user experience. Taking that information into account, and assuming a similar interpretation applies to the C projects, we believe that the delay in fixing these long lived bugs was not due to them being trivial.

Table 4: Analysis of Severity for Critical and Major Bugs

System     # Bugs  Severity Changed  Proportion Changed  Max. Changes
JDT        54      23                42.59%              3
CDT        65      21                32.31%              4
PDE        26      11                42.31%              3
Platform   263     115               43.73%              5
Linux      152     14                8.64%               1
WineHQ     248     68                27.42%              3
GDB        53      10                18.89%              1

Now let us take a closer look at the more severe bugs: critical and major (blocker bugs generally do not interfere with users directly). Our results show that for the Java projects, only 1% to 2% of long lived bugs were critical, whereas 5% to 10% of long lived bugs were major in each system. The absolute numbers ranged from 4 to 42 for critical and from 20 to 221 for major bugs. For the C projects, there are 463 bugs in total that are major or above.

Considering that a critical bug causes program crashes and/or data loss and a major bug causes a major loss of function, these numbers are high, especially since all of these bugs took more than one year to be fixed.

Now let us a take closer look into more severe bugs:critical and major (blocker bugs generally donot interfere users directly). Our results show that forJava projects, only 1% to 2% of long lived bugs werecritical, whereas 5% to 10% of long lived bugswere major in each system. The absolute numberranged from 4 to 42 for critical and 20 to 221 formajor bugs. For C projects, there are 463 bugs in totalthat are either major or above.

Considering that a critical bug causes programcrashes and/or data loss and a major bug causes ma-jor loss of function, these numbers are high, especiallysince all of them took more than one year to be fixed.

4.2.2. Severity Realization Period (SRP)

As part of this research question, we are also interested in investigating how long it takes to understand the severity of the bugs. Our initial hypothesis was that perhaps it took a long time to realize the severity of these important (critical or major) bugs, but that once the severity was realized it should not take long to fix them, since they are important problems to solve.

For this analysis, we have considered only those bugs that have a severity level of major or higher, because they are the most important ones. Table 4 presents the number of critical and major long lived bugs in each system, the number of bugs whose severity level was changed, and the maximum number of times the severity level was changed for a bug. The results show that for the Java projects, the severity level of 32%-43% of such bugs was corrected later. For the C projects, the change in severity level ranges only from 8% to 27%. This indicates that the bug reporters could understand the actual severity level of more than 50% of the bugs for the Java projects and 70%-90% of the bugs for the C projects at the time of bug posting. Therefore, it is evident that developers took more than one year to fix a large number of bugs even after they realized that the bugs are very important.

Table 5: Time Needed for Understanding Bug Severity

           Pre-SRP (Days)           Post-SRP (Days)
System     Avg.   Med.   Max.       Avg.   Med.   Max.
JDT        374    163    1890       338    320    1712
CDT        80     7      590        760    700    1442
PDE        348    388    1274       300    34     1201
Platform   351    164    2208       498    410    2730
Linux      150    50     1228       644    572    1204
WineHQ     227    55     1756       649    560    2199
GDB        1090   756    2375       377    230    1394

Now we analyze the bugs whose severity was corrected later. Table 5 presents the average, median, and maximum Pre-SRP and Post-SRP values (defined in Section 3.2). Our results show that it took almost a year on average to realize the correct severity level of the bug in three of the four Java projects. The only exception is CDT, where the average Pre-SRP was 80 days. The maximum Pre-SRP of each system shows that for some bugs it took several years to realize the severity. On the other hand, for these bugs it took another year on average to be fixed. For CDT, which was the best in terms of average Pre-SRP, the Post-SRP was more than two years. From the maximum Post-SRP, we see that some bugs took even three to eight years to be fixed after developers realized the actual severity level. Therefore, our results indicate that for most long lived bugs in the Java projects, Post-SRP was high regardless of their Pre-SRP.

We see a varying Pre-SRP in the C projects. For Linux and WineHQ, severity levels were corrected within six months and a year respectively. For GDB, it took more than three years to understand the actual severity level. However, it should be noted that the severity level of most of the important bugs in the C projects was known at the time of bug posting. Even so, the Post-SRP was more than a year regardless of whether the actual severity level was understood in advance or later. From the maximum Post-SRP, we see that some bugs of major or higher severity were fixed only after six years.

4.2.3. Duplicate Bugs

Severity is certainly the most reliable information for understanding the importance of a bug, since it is determined by the bug reporters and supported by developers. However, a large number of duplicate bugs may also express a bug's importance, since they often indicate that the scope of the master bug is large and/or the affected users/other developers are getting frustrated [4]. Therefore, in addition to the severity level, we also investigated the number of duplicated bugs.

Table 6: Duplicate Bugs

                    # Bugs with   # Bugs by Number of Duplicates (NOD)   Max
System    # Bugs    Duplicates    1     2     3    4    5    >5          NOD
JDT       913       210           101   42    27   10   13   17          26
CDT       641       52            36    8     4    3    0    1           6
PDE       272       50            32    8     2    0    2    6           15
Platform  2,358     495           271   102   51   23   15   33          20
Linux     953       63            46    11    2    2    0    2           10
WineHQ    5,301     603           386   91    41   22   21   42          46
GDB       1,083     102           87    13    1    0    1    0           5

Table 6 presents an overview of the duplicates of long lived bugs. The results show that for 9% to 23% of long lived bugs in the Java projects, users/developers submitted multiple bug reports. For the C projects, the proportion of duplicate bugs varied between 6% and 11%. From the maximum number of duplicate bugs, we see that some bugs have more than 20 duplicated bug reports in the Java projects. In WineHQ, there is a bug (id #6971) for which 46 duplicate bug reports were submitted. The middle columns present more fine grained results on duplicated bugs.
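For reference, the per-bug duplicate counts (NOD) in Table 6 can be tallied from the bug reports along the following lines; the field names are illustrative placeholders that depend on how the Bugzilla data is exported, so this is only a sketch:

    from collections import Counter

    def duplicates_per_master(reports) -> Counter:
        """Tally duplicate reports per master bug (the NOD values of Table 6).

        `reports` is assumed to be an iterable of dicts with a 'resolution'
        field and a 'dupe_of' field naming the master bug; both names are
        placeholders for whatever the export format provides.
        """
        nod = Counter()
        for report in reports:
            if report.get("resolution") == "DUPLICATE" and report.get("dupe_of"):
                nod[report["dupe_of"]] += 1
        return nod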

More than 90% of long lived bugs affect users' normal working experiences and thus are important to fix. However, it took a long time to fix these bugs even after realizing their severity. Moreover, there are multiple bug reports for these long lived bugs, which indicate the users' demand for fixing them.

4.3. RQ3: Where was most of the time spent in the bug fixing process?

A bug fixing process can broadly be divided into three phases in terms of activity: i) the assignment phase, ii) the fixing phase, and iii) the verification phase. In this research question, we analyze the time taken by team leads/developers in each phase. Our initial hypothesis was that perhaps it took a long time to assign long lived bugs to the appropriate developers, but that once the bugs were assigned, it should not take too long to fix them.

Table 7 presents the average, median, and maximum times of both the assignment period (AP) and the fixing period (FP), in days, for all long lived bugs. Our results show that it took more than 1.5 years on average to assign the bugs to the appropriate developers in the Java projects. The median AP also shows that the data is fairly normally distributed. The maximum AP shows that it can take more than six years to assign some bugs to the correct developers. From Table 8, we can also observe that most of the long lived bugs in the Java projects were reassigned at least once. The proportion of bugs reassigned ranged from 64% to 88%. More than 10% of long lived bugs were reassigned 5 times or more.

Table 7: Bug Assignment Time Vs. Bug Fixing Time (Days)

           Assignment Period (AP)   Fixing Period (FP)
System     Avg.   Med.   Max.       Avg.   Med.   Max.
JDT        463    374    2745       407    376    2854
CDT        603    552    2035       330    97     1815
PDE        484    482    2728       437    393    1622
Platform   459    373    3326       407    409    2854
Linux      347    275    1617       413    363    2472
WineHQ     489    327    3144       606    478    2563
GDB        915    605    4732       269    78     3224

Table 8: Reassignments of Long-lived Bugs

System     Reassigned  Proportion  Max. Reassn.
JDT        771         84.45%      14
CDT        478         74.57%      8
PDE        174         63.97%      10
Platform   2066        87.61%      12
Linux      437         45.86%      8
WineHQ     399         6.33%       5
GDB        367         33.89%      6

For the C projects, the information regarding the first bug assignment was not present for all bugs. We found the bug assignment time only for those bugs that were reassigned (439, 401, and 366 bugs in Linux, WineHQ, and GDB respectively). Based on the results from those bugs, we see that the average bug assignment time in the C projects varies from one year to three years.
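For concreteness, the reassignment counts in Table 8 and the first-assignment time can be extracted from a bug's activity log roughly as follows. This is a sketch that assumes a chronologically ordered log of (timestamp, field, new_value) entries; the layout is illustrative, and the paper's AP additionally picks the assignment of the developer who ultimately fixed the bug.

    def assignment_summary(history):
        """Summarize assignment activity for one bug.

        `history` is assumed to be a chronologically ordered list of
        (timestamp, field_name, new_value) tuples from the bug's activity log.
        Returns the first assignment timestamp (or None) and the number of
        reassignments, i.e., assignee changes after the first assignment.
        """
        assignments = [(ts, value) for ts, field, value in history
                       if field == "assigned_to"]
        first_assigned_at = assignments[0][0] if assignments else None
        reassignments = max(len(assignments) - 1, 0)
        return first_assigned_at, reassignments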

Guo et al. [10] conducted a study to investigate the reasons for bug reassignment. They observed that reassignments are not always harmful; in fact, many reassignments happened in order to find the appropriate developers. However, they also observed that the required time for bug fixes increased as the number of reassignments increased. Therefore, they concluded that excessive reassignments are harmful. They delineated five reasons for bug reassignments: finding the root cause, determining ownership, poor bug report quality, difficulty in determining the proper fix, and workload balancing. Taking the aforementioned findings into account, our results indicate that the assignment of these long lived bugs was complex and time consuming, supporting our initial hypothesis.

However, contrary to our expectations, the average FP for all systems was also quite high: around a year. From the median FP for CDT, we see that its data is skewed, but for the other three Java subjects this is not the case. The maximum FP also shows that, as with bug assignment, it took more than five years for some bugs to be fixed after they were assigned to the right developers.

On the other hand, for the verification period, we found that most of the bugs were never verified, at least according to the Bugzilla data. However, when bugs do get verified, the verification time is fairly small: less than a month for most of the subject systems.

Bug assignment and bug fixing are still time intensive processes, despite the availability of automatic bug assignment tools that could have been used.

4.4. RQ4: What are the common reasons for long lived bugs?

To answer this question, we first manually analyzed all the critical and major bug reports from JDT. We intentionally chose the highly severe bugs, since they should be taken seriously by the developers and thus we should be able to identify the actual reasons for delay. We also analyzed 50 recent (critical or major) long lived bugs from PDE and Platform. Since JDT and CDT are from a similar domain, we did not take any bugs from CDT. Finally, we manually analyzed 20 bugs of the Linux Kernel (the 10 oldest plus 10 recent long lived bugs with high severity) to check whether we would find any new category. In this way, we identified a set of 125 (= 55+25+25+20) bug reports for manual analysis.

Tagging Methodology: As discussed in Section 2, each bug report contains a summary, a description, and a list of developers' comments, which often provide rich information about the problems associated with the bug. In order to identify the underlying reasons, we first read the bug summary and description to understand the nature of the bug. Second, we carefully analyzed the developers' comments to understand the reasons for any delays, since developers often discuss different problems associated with a given bug through comments. In most cases, the actual reasons were easily identifiable.

To categorize the reasons for delays, we followed an open-ended taxonomy. We incrementally analyzed all the bug reports. For any given bug report, we first identified the high level reason and checked whether the reason fits into any of the existing categories; otherwise, we created a new category. We have quoted several key comments for most of the categories to better illustrate the tagging procedure. In the few cases where the reasons were ambiguous, we relied on contextual information. The following summarizes the taxonomy of common reasons for long lived bugs that we found in the subject systems.

1) Hard to understand: Understanding/locating buggy statements/files in a software project is hard. Sometimes even identifying the buggy component can be hard. For example, there is a bug (#128563) in JDT where developers had a hard time understanding whether it was a VM or a JDT bug. The following comments explain the situation:

"I found something quite interesting. If you move the classes from the two output folders into the same directory and you run from there, it works fine. We generate exactly the same bytecodes in both cases. The VM should behave the same. Might be a VM bug."

After two years, another developer commented: "I believe this is our bug, we should not reference a non accessible type in our bytecode. The fact it works at times feel like unspecified behavior from the VM."

2) Uncertain how to fix: Sometimes developers may know how to solve a bug, but need to wait to make the solution consistent/robust with other parts of the software. The following comment on bug #3849 represents such a scenario:

"I would like to defer this until we know how we will implement the new Code Manipulation Infrastructure. This is only possible if we get a better undo story. Currently we can only push undo commands on the refactorings undo stack if a file is save. Otherwise the next save would flush the current undo stack which would remove the undo object for extract method."

3) Hard to fix: This kind of bug is hard to fix. There were lots of group discussions over a long time regarding different alternative solutions, and finally the group agreed on a specific solution.

4) Risky to fix: Sometimes bugs are caught just before a release. If developers then think that it would be risky to change the relevant code, they generally defer the fix to the next release, although the bug is important. It then takes a long time to fix the bug. The following developer's comments on bug #80000 in JDT represent such a scenario:

"Will investigate during RC2 whether there's a low risk fix for this."

Two weeks later, the same developer commented: "Sorry, too risky to touch at this point."

5) Incomplete fix: This is considered one of the common reasons why fixing a bug takes a long time. Developers often miss corner cases while bug fixing and need to re-fix until the problem is fully solved. Here is a developer's comment regarding the fix of bug #38746 in JDT:

"The fix for this problem is not sufficiently robust. Please see Bug 75454 for more information as to how things can go wrong. Not only does the situation described there happen once, but it happens 60 times on start-up (10 minutes of start-up time)."

There are also many other reasons for incomplete bug fixes. For a comprehensive set of reasons for incomplete bug fixes, please refer to [20].

6) Importance was not realized until duplicate bugs were reported: We found many bugs where, as we observed by reading developers' comments, there was some activity around the bug for a while, followed by no activity for a long time. Then somebody pointed out some duplicate bugs, everybody started talking again, and the bug was fixed quickly. The following comment on bug #16114 in PDE represents such an example:

"this one is experienced by several users (see the duplicates for more info). Looks like something causes certain fragment files on the disk to be in use and when we try to delete the project (even with 'force' option), we fail. This leave us with a partially deleted project that causes more trouble after that."

7) Reproducibility: There are some bugs that take a long time to reproduce, but once reproduced, they are fixed quickly. For example, it took 1 year and 4 months to reproduce bug #268833 in Platform, but only one day to fix it. This problem often stems from low quality bug reports, execution differences across platforms, and so on.

There are some interesting bugs where users know how to reproduce the bug, but it happens only in some special cases and thus needs some time to reproduce. For this kind of bug, if users submit the bug report without concrete data, it takes a long time to reproduce the data that developers need to analyze the bug. Therefore, the bug fix gets delayed although the responsible developer is ready to fix it. For example, to debug an "out of memory" problem in JDT (#54831), developers needed a heap dump, which was not submitted when the bug was posted. When the assigned developer asked for it, the bug reporter (who is actually another developer of JDT) was busy with his own work and could not submit the heap dump on time. As a result, it took a long time to fix the bug.

8) Schedule issue: Sometimes developers also feel that a bug is important to fix; however, they have more important bugs at hand that should be fixed earlier. Therefore, although the other bugs are important, they are generally deferred. For example, there is a blocker bug (#10800) in JDT that prevented users from putting spaces in VM arguments. Blocker bugs are considered the most severe bugs. However, even such a severe bug was deferred due to scheduling issues. Certainly, other developers were not very happy about that. The following developers' comments illustrate the scenario more clearly.

"Can more explanation be given as to why this issue has been marked as LATER? Does this mean it will not be fixed any time soon? If so, I find it very unfortunate as this is a very serious bug and requires nasty work arounds. If not, then my apologies..."

In reply, the responsible team leader said: "In this case, 'LATER' means probably not for the final 2.0 release (tentatively scheduled for sometime in May). Quite simply, this problem was not deemed as critical as a lot of other problems that need to be solved for 2.0. The debug committers have A LOT to do before 2.0. But the beauty of an open source project is that if someone feels strongly about a particular feature or bug, they can make a contribution. If you would like to contribute a fix, I would be happy to review it."

Finally, it took more than two years to fix the bug.

9) Not aware of fix or reopened due to misunderstanding: We are not completely certain that these bugs are really long lived. In this category, some bugs were perhaps fixed earlier but the developers did not change their status in Bugzilla. Therefore, the reporters or other developers were not aware of the fix. Later, some other developers simply closed the bugs, noting that they had probably already been fixed. Another case is that sometimes reporters misunderstood something and reopened a given bug. Then some other developer clarified the mistake the reporter was making and finally marked the bug again as RESOLVED and FIXED.

10) Infrequent use case: This kind of bug is important to fix considering its destructive potential. However, the affected use cases are not very frequent. Therefore, developers just defer it to the next milestone. For example, due to bug #130874 in JDT, a user could lose his/her Java code template references. However, the developer deferred it with the following comment:

"We should definitely fix this during 3.5. Too late for 3.4 and really not a very common case."

11) Others: There are also other reasons for delays in bug fixing, such as expert developers being on vacation, dependencies on other bugs that must be fixed first, and various documentation fixes.

12) As-usual delay: We did not find any specific reasons for these bugs by analyzing developers' comments, and thus we considered them as-usual delays. If there were specific reasons (as mentioned above) for the delay, it is highly likely that developers would have discussed them as they did for the other bugs. However, it is also possible that the fixes were deferred due to scheduling issues. The following comment on bug #149316 in JDT serves as an example of an as-usual delay.

"Thanks for the good examples, sorry for the wait. fixed > 20080422."

One may think that since these bug-fixes were delayed without any specific reasons, they are probably not that important or relevant. However, it should be noted that for this investigation we analyzed only critical or major bugs, so they were already marked as important by the reporters or developers. Furthermore, since we ourselves are users of these software systems, we understand that many such bugs can be frustrating. The following are two summaries of such bugs:

Bug #26556 (JDT): "PDE Junit does not read plugin info from the plugins directory."

Bug #43153 (Linux): "Random SATA drives on PMPs on sata_sil24 cards not being detected at boot since 3.2/3.4."

We encountered an interesting finding while analyzing the bug reports manually. We started our manual investigation with JDT and listed all the common reasons from there. We did not find any new common reason when analyzing the bug reports for PDE, Platform, or the Linux Kernel. Therefore, we believe that this is a comprehensive list of reasons for long lived bugs. It should be noted that these reasons are not mutually exclusive.

Reasons for long lived bugs are diverse. While problem complexity, problems in reproducing errors, and not understanding the importance of some of the bugs in advance are the common reasons, we observed there are many bug-fixes that were delayed without any specific reason.

Table 9: Reasons of Sampled Long Lived Bugs

Reason  # Bugs  Bug IDs
1)      13      113870 (PDE), 128563 (JDT), 241241 (Platform), 245008 (Platform), 247766 (PDE), 268833 (Platform), 278598 (PDE), 3022 (Linux), 5534 (Linux), 5637 (Linux), 9905 (Linux), 13484 (Linux), 38442 (Linux)
2)      3       3849 (JDT), 36204 (JDT), 133072 (PDE)
3)      16      3849 (JDT), 24951 (PDE), 36204 (JDT), 38746 (JDT), 40243 (JDT), 46407 (JDT), 67425 (JDT), 82850 (JDT), 99137 (JDT), 233643 (PDE), 233773 (Platform), 266651 (JDT), 273450 (Platform), 295200 (JDT), 38442 (Linux), 44161 (Linux)
4)      2       80000 (JDT), 102780 (JDT)
5)      6       1766 (JDT), 33035 (JDT), 36204 (JDT), 136135 (PDE), 3410 (Linux), 46171 (Linux)
6)      14      36204 (JDT), 46216 (JDT), 50735 (JDT), 109636 (JDT), 117698 (JDT), 156168 (JDT), 175226 (JDT), 224880 (Platform), 243894 (Platform), 257202 (Platform), 266651 (JDT), 267649 (Platform), 273450 (Platform), 278598 (PDE)
7)      11      1766 (JDT), 39222 (JDT), 54831 (JDT), 82850 (JDT), 83473 (JDT), 195183 (JDT), 262032 (Platform), 294650 (Platform), 298795 (Platform), 2979 (Linux), 31602 (Linux)
8)      7       3920 (JDT), 19251 (PDE), 46216 (JDT), 67425 (JDT), 224880 (Platform), 235572 (Platform), 277638 (Platform)
9)      12      6437 (JDT), 19248 (PDE), 28637 (JDT), 44035 (JDT), 61744 (PDE), 132333 (PDE), 158589 (PDE), 271373 (Platform), 14563 (Linux), 43981 (Linux), 45031 (Linux), 46161 (Linux)
10)     2       34033 (PDE), 130874 (JDT)
11)     8       12955 (JDT), 16686 (JDT), 20919 (PDE), 24951 (PDE), 29799 (PDE), 34399 (PDE), 231936 (PDE), 290324 (PDE)
12)     34      21100 (PDE), 26556 (JDT), 38288 (PDE), 39803 (JDT), 51862 (PDE), 89347 (JDT), 95288 (JDT), 97541 (JDT), 111419 (JDT), 128303 (PDE), 129689 (PDE), 149316 (JDT), 154823 (JDT), 175133 (JDT), 181954 (JDT), 209537 (JDT), 226595 (Platform), 234623 (Platform), 235554 (Platform), 236104 (Platform), 237025 (PDE), 238943 (JDT), 258952 (Platform), 262032 (Platform), 267173 (Platform), 275910 (Platform), 277638 (Platform), 279781 (Platform), 285101 (Platform), 5637 (Linux), 11509 (Linux), 38312 (Linux), 41682 (Linux), 43153 (Linux)


Table 10: Analysis of Bug Fixes

                   Number of Files (NOF)                      Med   Max
System    # Bugs   1      2      3      4     5     >5        NOF   NOF
JDT       223      80     41     37     10    15    40        2     47
CDT       185      50     32     19     18    8     58        3     237
PDE       105      34     13     13     12    4     29        3     211
Platform  740      349    114    72     47    36    122       2     91
Total     1253     513    200    141    87    63    249       -     -
(%)       -        40.94  15.96  11.25  6.94  5.03  19.87     -     -
Linux     171      117    28     12     4     3     7         1     19
WineHQ    609      231    203    70     37    29    39        2     51
GDB       33       0      10     2      1     0     20        7     222
Total     813      348    241    84     42    32    66        -     -
(%)       -        42.8   29.64  10.33  5.17  3.94  8.12      -     -

4.5. RQ5: What is the nature of bug fixes?

In this research question, we investigate the nature of bug fixes in terms of source code changes. To do this, we first identify the bug fixing changes in the source code for the long lived bugs. Using the methodology described in Section 3.3, we were able to identify the fixed files for 280, 264, 154, and 1,109 bugs in JDT, CDT, PDE, and Platform respectively, and for 171, 609, and 33 bugs in Linux, WineHQ, and GDB respectively. Then we compute the number of changed files, number of hunks, and code churn for each bug-fix, as described in Section 3.2. These metrics are often used to get a rough idea about change effort, although understanding the actual change effort is difficult and depends additionally on the implemented algorithm and the code complexity itself. Analogous to previous study results [30], our initial hypothesis was that the source code changes required to fix most of the long lived bugs would be large.

4.5.1. Changes at File Level

To understand the nature of bug fixes, we first analyze the source code changes in terms of the number of changed files. As stated in our initial hypothesis, we expected a large number of changed files for most long lived bugs. Surprisingly, from the results in Table 10, we see that for both the Java and C projects, more than 40% of fixes involved only one source code file. This proportion varied from 27% to 47% among the Java projects, whereas it varied from 38% to 68% among the C projects (except GDB). For only 30% of long lived bugs in the Java projects did the required changes span more than three files, whereas the figure was only 17% for the C projects. It is also noticeable from our results that the maximum number of changed files in each system is quite high. However, when we manually investigated such changes, we found that they often involved moving files from one directory to another, or adding a test suite to test a specific bug or multiple bugs.
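As an illustration of how the Table 10 columns are obtained, the per-bug file counts can be bucketed as follows; this is a sketch with made-up inputs, not the original analysis script.

# Sketch: bucket per-bug changed-file counts into the Table 10 columns
# (1, 2, 3, 4, 5, >5 files) and report percentages, median, and maximum.
from collections import Counter
from statistics import median

def nof_distribution(nof_per_bug):
    buckets = Counter(min(n, 6) for n in nof_per_bug)       # 6 stands for ">5"
    total = len(nof_per_bug)
    labels = ['1', '2', '3', '4', '5', '>5']
    dist = {lab: 100.0 * buckets[i + 1] / total for i, lab in enumerate(labels)}
    return dist, median(nof_per_bug), max(nof_per_bug)

# Illustrative call: nof_distribution([1, 1, 2, 7])
#   -> ({'1': 50.0, '2': 25.0, '3': 0.0, '4': 0.0, '5': 0.0, '>5': 25.0}, 1.5, 7)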

4.5.2. Number of Hunks

Next, we analyze the changes in terms of hunks to get more fine grained results. A large number of hunks indicates that developers needed to modify many different places to fix the bug. Figure 5 presents the number of bugs for each hunk size in the Java projects. Our results show that 43% to 53% of bugs in the Java projects were fixed by changing five hunks or fewer. In the C projects, the proportion was even higher: 76% and 67% of bug fixes in the Linux Kernel and WineHQ respectively involved five hunks or fewer. More than 70% of long lived bugs in JDT and Platform were fixed within 10 hunks, whereas the corresponding numbers are 16 and 17 hunks for PDE and CDT respectively. The median number of hunks for long lived bugs is only 5 or less in all projects except GDB. Considering that this is an overestimate of the source code changes, since we analyzed textual diff results (so changes in comments are also counted), we believe the number of hunks is low. It should be noted that all bugs involving more than 100 hunks are grouped at the end of each graph, which explains the spike at the end of some graphs.
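The distributions plotted in Figure 5 amount to the following kind of summary; the sketch below, with illustrative inputs of our own, shows how the proportion of bugs per hunk count and the share fixed within a threshold could be computed.

# Sketch (illustrative, not the original script): proportion of bugs per
# hunk count, plus the share of bugs fixed within a given hunk threshold.
from collections import Counter

def hunk_summary(hunks_per_bug, threshold=5):
    total = len(hunks_per_bug)
    per_size = {h: 100.0 * c / total
                for h, c in sorted(Counter(hunks_per_bug).items())}
    within = 100.0 * sum(1 for h in hunks_per_bug if h <= threshold) / total
    return per_size, within

# Illustrative call: hunk_summary([1, 1, 2, 4, 12], threshold=5)
#   -> ({1: 40.0, 2: 20.0, 4: 20.0, 12: 20.0}, 80.0)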

4.5.3. Code Churn

We further investigate lower level changes, at the line level. Figure 6 presents the distribution of code changes in terms of code churn (defined in Section 3.2).


Figure 5: Number of Hunks vs. Proportion of Bugs (panels for JDT, PDE, CDT, Platform, Linux, and WineHQ; x-axis: number of hunks, y-axis: proportion of bugs)

From the figure, we see that a considerable proportion of bugs required major changes: for 23% of bugs, the code churn value was more than 100. However, an even larger proportion of bugs required changes of fewer than 20 lines. For example, for 8-14% of bug fixes in the Java projects the code churn value was from 1 to 5, for 7-12% it was from 6 to 10, and for 9-13% it was from 11 to 20. For the Linux Kernel and WineHQ, the proportions were even higher: 40% of long lived bugs in the Linux Kernel and 20% of bugs in WineHQ required changes to 10 or fewer lines of code. Recalling Section 3.2, note that the code churn value of a one-line change is 2, whereas an addition or deletion of a line counts as 1. Therefore, a code churn value of 10 may correspond to changes in only five lines. We understand that some smaller bug fixes can be complex. At the same time, we stress that many long lived bugs could have been fixed quickly through careful prioritization.
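To make that last piece of arithmetic explicit, the snippet below spells out the churn convention; it is purely illustrative.

# Worked example of the churn convention recalled above: a modified line
# shows up in a diff as one deletion plus one addition, so it counts as 2.
changed_lines = 5
churn = changed_lines * (1 + 1)   # one '-' line and one '+' line per modification
assert churn == 10                # a churn of 10 can therefore be only five changed lines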

It should be noted that the overall bug fixing changes in GDB seem to be larger than in the other projects. However, as Table 10 shows, we were able to identify the bug fixing changes for only 33 bugs in GDB. Therefore, it is very difficult to draw any conclusion for GDB.

4.5.4. A Qualitative Analysis

We now present three bug fixes to discuss how a simple fix can take a long time. To select these example fixes we considered the following criteria: i) they are from different subject systems, ii) they are important, i.e., critical or major (or high for the Linux Kernel), and iii) their descriptions are concise enough to present in the paper.

Figure 6: Code Churn of Long Lived Bugs (code churn in LOC, per project: JDT, CDT, PDE, Platform, Linux, WineHQ, GDB)

Bug # 38260 (SWT): This is a bug in the SWT component of Eclipse Platform. The bug was first reported with a normal severity level. However, within 15 days, it was reconsidered to be critical. The following is the bug description provided by the reporter: "When I use the CCombo with the dialog, the dropdown list shown in the back of the Dialog.

I did as following.
1. Create one sample application with a Group.
2. The parent of group is one shell.
3. I create a dialog.
4. I put my application to this dialog. and change the Group's parent as the dialog's parent.
Now the above descripbed pbm occured."


--- a/bundles/org.eclipse.swt/Eclipse SWT Custom Widgets/common/org/eclipse/swt/custom/CCombo.java
+++ b/bundles/org.eclipse.swt/Eclipse SWT Custom Widgets/common/org/eclipse/swt/custom/CCombo.java
@@ -76,7 +76,7 @@ public CCombo (Composite parent, int style) {
     if ((style & SWT.READ_ONLY) != 0) textStyle |= SWT.READ_ONLY;
     if ((style & SWT.FLAT) != 0) textStyle |= SWT.FLAT;
     text = new Text (this, textStyle);
-    popup = new Shell (getShell (), SWT.NO_TRIM);
+    popup = new Shell (getDisplay(), SWT.NO_TRIM | SWT.ON_TOP);
     int listStyle = SWT.SINGLE | SWT.V_SCROLL;
     if ((style & SWT.FLAT) != 0) listStyle |= SWT.FLAT;
     if ((style & SWT.RIGHT_TO_LEFT) != 0) listStyle |= SWT.RIGHT_TO_LEFT;

Figure 7: Bug Fixing Changes for # 38260 in SWT Component of Platform

From the bug comments, we found that the bug was reproduced within three days and the actual problem was identified within a month. However, it took more than ten months to fix the bug. Figure 7 presents the changed code for fixing this bug: it was only a one-line change.

Bug # 195183 (JDT): This is a major bug in the Debug component of JDT. The bug summary and description (condensed) are as follows: "JavaClassPath.performApply() uses original instead of working copy causes NPE" "Steps To Reproduce: I use JavaClassPath in my custom launch config and has some code like:" ............ [some code snippet] "and as soon as performApply() is called isDefaultClasspath() fails since it is passed in a null as a launchconfig even though I passed in a newly created one. This worked fine in eclipse 3.2 and it seem the culprit is that wc.getOriginal() is used instead of just wc. Resulting in a NPE."

From the bug description, we can see that the bug report was very specific. The reporter clearly pointed out that there were some problems with the getOriginal() API. Interestingly, from the bug-fix (Figure 8), we found that only one line was changed and the change was the removal of the getOriginal() call. But by that time, more than two years had passed.

Bug # 3410 (Linux Kernel): This is a bug with high severity in the Linux Kernel. The bug summary and description provided by the reporter are as follows:

Summary: "passive mode is not left, once entered"
Description: "Steps to reproduce:
echo "xx:xx:low value:xx:xx:" > /proc/acpi/thermal_zone/*/trip_points
echo "xx:xx:high value:xx:xx:" > /proc/acpi/thermal_zone/*/trip_points
passive mode still active, even the temperature is far below the trip point"

With this bug, there was an attempt to fix it on the same day the bug was reported. However, the fix was incomplete, which was identified the next day. Although there was some discussion regarding the bug around that time, it remained unfixed for almost one and a half years. Figure 9 shows the second fix of the bug, which made the first fix complete. The patch involved changes to only two lines of code.

We believe that one year or more is too long to fix bugs like these examples, especially considering that they were regarded as very important.

Unlike previous studies, we found that a bug surviving for a year or more does not necessarily mean that it requires a large fix. We found that 40% of long-lived bug fixes involved few changes in only one file.

5. Developers’ Survey

Since we did not find any specific reason for a significant proportion of long lived bugs (classified as "as-usual delay"), and many long lived bugs involved small fixes, we believe that many such delays could have been avoided if developers could predict the severity and change effort in advance. To investigate what developers think about this assumption, we conducted a survey. We sent the survey to all the developers who contributed to the subject projects from the year 2010 onward; in total, 104 developers. Among them, 38 developers no longer use their registered email addresses.

In the survey, we briefly explained our results and possible actions (e.g., predicting severity, change effort, prioritization), and asked the developers whether they agree with us. We also asked what actions could be taken to minimize the number of long lived bugs if they think otherwise. We received responses from five developers. Three developers agreed that tool support to predict severity and change effort may be helpful. Furthermore, some of them think that understanding the impact of a change is even more critical and often a major reason for delay in making bug fixing commits. One developer, instead of answering our survey, asked us whether it is really possible to develop such tools.


--- a/org.eclipse.jdt.debug.ui/ui/org/eclipse/jdt/debug/ui/launchConfigurations/JavaClasspathTab.java
+++ b/org.eclipse.jdt.debug.ui/ui/org/eclipse/jdt/debug/ui/launchConfigurations/JavaClasspathTab.java
@@ -272,7 +272,7 @@ public class JavaClasspathTab extends AbstractJavaClasspathTab {
 public void performApply(ILaunchConfigurationWorkingCopy configuration) {
     if (isDirty()) {
         IRuntimeClasspathEntry[] classpath = getCurrentClasspath();
-        boolean def = isDefaultClasspath(classpath, configuration.getOriginal());
+        boolean def = isDefaultClasspath(classpath, configuration);
         if (def) {
             configuration.setAttribute(IJavaLaunchConfigurationConstants.ATTR_DEFAULT_CLASSPATH, (String)null);
             configuration.setAttribute(IJavaLaunchConfigurationConstants.ATTR_CLASSPATH, (String)null);

Figure 8: Bug Fixing Changes for # 195183 in Debug Component of JDT

--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -102,8 +102,8 @@ static int cpu_has_cpufreq(unsigned int cpu)
     struct cpufreq_policy policy;
     if (!acpi_thermal_cpufreq_is_init || cpufreq_get_policy(&policy, cpu))
-        return -ENODEV;
-    return 0;
+        return 0;
+    return 1;

Figure 9: Bug Fixing Changes for # 3410 in Linux Kernel

Another developer said that duplicate bug report detection or improving bug report quality could be more useful.

6. Threats to Validity

This section discusses the validity and generalizability of our findings. In particular, we discuss Construct Validity, Internal Validity, and External Validity.

Construct Validity: We used two artifacts: bug reports from the bug tracking system and source code changes from the version history, both of which are generally well understood. We also used well known metrics in our data analysis, such as various time periods, the number of changed files, the number of hunks, and code churn, which are straightforward to compute. Both the dataset and the version histories are publicly available, which enables replication of this study. Therefore, we argue for a strong construct validity.

Internal Validity: In our study, we relied on the information from the bug tracking systems and version histories. However, the information in these systems may not be completely accurate. For example, a change request could actually be an enhancement but be misclassified as a bug [11]; the severity levels associated with some bugs may not reflect their actual severity. Furthermore, a developer may commit a bug fixing change a long time after she actually fixed the bug. Similarly, a tester may change the bug status from FIXED to VERIFIED a long time after she actually verified the bug. Although it is very difficult to completely eliminate these threats, we performed extensive manual investigations and qualitative analyses, and provided many concrete examples throughout the paper to minimize these threats.

To delineate the common reasons for long lived bugs, we manually analyzed bug reports. There might have been some unintentional misinterpretations during the manual verification due to a lack of domain knowledge or a lack of useful contextual knowledge. However, we held extensive discussions to minimize this threat.

We used traditional heuristics to find mappings between bug fixing changes and the associated bug reports. Although we missed many bug fixing changes in this way, the precision of our results is very high, which is important for our study. ReLink [29] is a more advanced algorithm that considerably improves the recall of such mappings, although it also sacrifices some precision. In the future, we would like to use ReLink to see how it affects our results.
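For concreteness, the traditional heuristics referred to here typically link a commit to a bug when the commit message mentions a valid bug ID; the sketch below illustrates that idea with a pattern of our own choosing, not the exact rules used in this study.

# Sketch of a traditional bug-to-commit linking heuristic: a commit is
# linked to a bug report if its message mentions an ID that exists in the
# bug tracker (favoring precision over recall). The pattern is illustrative.
import re

BUG_REF = re.compile(r'(?:bug\s*#?\s*|#)(\d{3,7})', re.IGNORECASE)

def linked_bug_ids(commit_message, known_bug_ids):
    """Return tracker bug IDs mentioned in the commit message."""
    mentioned = {int(m) for m in BUG_REF.findall(commit_message)}
    return mentioned & known_bug_ids

# Illustrative call:
#   linked_bug_ids("Fix for bug 195183: do not use getOriginal()", {195183})
#   -> {195183}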

The phenomena we studied occurred in projects with yearly major releases. Systems with more frequent release cycles may well exhibit different phenomena, although there will still be long lived bugs. The number of release cycles and the lapse times for long lived bugs in such a context are likely to be different.

External Validity: We used seven subject systems in our experiment, all of which are open source projects. Although they are very popular projects, our findings may not generalize to other open source projects or to industrial projects. However, the Java dataset that we used has more than 165,000 bug reports, and the C dataset we created contains more than 77,000 bug reports, which is large. Furthermore, the consistent findings from both the Java and C projects make our results more generalizable to large open source projects. Additional confidence could be achieved by adding more subject systems (both open source and industrial).

7. Related Work

The study of software bugs/faults has been an active research area for nearly two decades. Perry and Stieg [21] were among the first to analyze software faults in a large evolving software system. Since then, researchers have analyzed various software artifacts relevant to bugs (e.g., bug reports, bug fixing changes) to understand and improve different steps (e.g., bug reporting, triaging, localizing, fixing) of the bug fixing process.

Thung et al. [26] investigated when a bug should be reported. Bettenburg et al. [3] studied the qualities of a good bug report. In another study, Bettenburg et al. [4] investigated the extent of and reasons for duplicate bug reports. They presented empirical evidence that duplicate bug reports are not necessarily bad; they often provide additional information, which is important for automatic bug triaging, bug assignment, and localization. Guo et al. [9] characterized and predicted which bugs get fixed. They noted that in addition to the importance of a bug, several other factors affect whether it gets fixed, e.g., the reputation of bug reporters, influence of seniority, personal relations, and trust. In another study [10], the same authors investigated the reasons for bug reassignment (described in more detail in Section 4.3). Lamkanfi et al. [15] and Tian et al. [27] predicted the severity and priority of bugs respectively. Anvik et al. [2] and Shokripour et al. [25] proposed approaches for automatic bug assignment. Saha et al. [23] and Zhou et al. [32] proposed different approaches for automatic bug localization. To complement these studies, in this paper we focused on long lived bugs to understand their characteristics and reasons.

The work closest to ours is the study of bug-fix time prediction, since these studies also identify the factors that are correlated with bug fixing time. Weiss et al. [28] considered the text (summary and description) of the bug report as the prime factor and used it to predict bug fix time. Panjer [19] observed that commenting activity, bug severity, product, component, and version are the most influential factors in predicting bug fix time. Giger et al. [8] found that the assigned developer, the bug reporter, and the month in which the bug was reported have the strongest influence on the bug fixing time. Zhang et al. [31] found the same results for commercial projects. Anbalagan et al. [1] found a strong relationship between bug fixing time and the number of people participating in the bug report. Marks et al. [17] observed different results for different projects: they found that bug fixing time is important for the Mozilla project, whereas bug severity is the key for Eclipse.

While the aforementioned studies vary in the analyses and techniques used, a common approach is to apply various machine learning or data mining techniques to the whole bug dataset to identify the overall factors affecting bug fixing time. Bhattacharya and Neamtiu [5] pointed out that most attributes used by prior work do not correlate with bug-fix time when analyzed in isolation, and thus they emphasized finding new attributes that correlate with bug-fix time in isolation. We stress that it is also important to analyze various kinds of bug-fixes in isolation to gain better insight into specific groups of bugs. For example, Shihab et al. [24] studied and predicted reopened bugs, Park et al. [20] investigated supplementary bug-fixes, and Nguyen et al. [18] analyzed recurring bug-fixes. In this study, we analyzed long lived bugs to further advance empirical knowledge regarding long-term delays in the bug fixing process.

Another group of studies investigated the actual source code changes of bug fixes to study bug fixing time. Canfora et al. [6] found relationships between different program constructs and bug survival time; for example, exception handling leads to low bug survival time. Zhang et al. [30] found that bug fixing time increases as code churn increases. In our study, we also analyzed the source code changes of long lived bug-fixes and showed that many long lived bugs involved only a few changes in one file.

8. Conclusion

Bug fixing is a fundamental and critical activity in the software development and maintenance phases, since buggy behavior may not only cause costly failures but can also affect users' overall experience with the software product. In this paper, we showed that although software development and maintenance processes have advanced a lot, there are still a significant number of bugs in each project that survive for more than a year. More than 90% of these long lived bugs may have affected users' normal working experience. The average bug assignment time was more than one year, and the bug fix time after assignment was another year on average. When we analyzed the bug descriptions and the developers' comments around these bugs, we found that the reasons for long lived bugs are diverse. While problem complexity, problems in reproducing, and not understanding the importance of some of the long lived bugs in advance are the common reasons, we observed that many bugs were delayed without any specific reason. Finally, by investigating the actual source code changes for these long lived bugs, we noted that a bug surviving for a year or more does not necessarily mean that it requires a large fix. In fact, we found that 40% of long-lived bug fixes involved few changes in only one file. Most importantly, all of these findings are consistent across the projects that we considered, regardless of domain or programming language.

In summary, our results indicate that the overall bug fixing time of many, if not all, long lived bugs can be reduced through careful prioritization and by predicting their severity, change effort, and change impact. Our findings also indicate that although there are a number of tools for supporting bug triaging and fixing (e.g., automatic bug assignment, bug fix time prediction), we appear to realize very few benefits from them. There may be two possible reasons: i) developers are not aware that these tools exist, or ii) the tools do not meet developers' needs or expectations. In the future, we plan to conduct a developer survey to understand the reasons for this phenomenon. We believe that all of these findings together will play an important role in developing new and more effective approaches for bug triaging as well as improving the overall bug fixing process.

Acknowledgement: We thank Ahmed Lamkanfi at the University of Antwerp for significantly extending their Java bug dataset for our study, and Julia Lawall at Inria/LIP6/UPMC/Sorbonne University, France, for her help in creating the C dataset. This research was supported in part by NSF Grants CCF-0820251 and CCF-0845628.

References

[1] P. Anbalagan and M. Vouk. On predicting the time taken to correct bug reports in open source projects. In Proceedings of the International Conference on Software Maintenance, pages 523–526, 2009.
[2] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In Proceedings of the International Conference on Software Engineering, pages 361–370. ACM, 2006.
[3] N. Bettenburg, S. Just, A. Schroter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In Proceedings of the International Symposium on the Foundations of Software Engineering, pages 308–318. ACM, 2008.
[4] N. Bettenburg, R. Premraj, T. Zimmermann, and S. Kim. Duplicate bug reports considered harmful... really? In Proceedings of the International Conference on Software Maintenance, pages 337–345, 2008.
[5] P. Bhattacharya and I. Neamtiu. Bug-fix time prediction models: can we do better? In Proceedings of the Working Conference on Mining Software Repositories, pages 207–210. ACM, 2011.
[6] G. Canfora, M. Ceccarelli, L. Cerulo, and M. Di Penta. How long does a bug survive? An empirical study. In Proceedings of the International Conference on Software Maintenance, pages 191–200. IEEE Computer Society, 2011.
[7] T.-H. Chen, M. Nagappan, E. Shihab, and A. E. Hassan. An empirical study of dormant bugs. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 82–91. ACM, 2014.
[8] E. Giger, M. Pinzger, and H. Gall. Predicting the fix time of bugs. In Proceedings of the International Workshop on Recommendation Systems for Software Engineering, pages 52–56. ACM, 2010.
[9] P. J. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. In Software Engineering, 2010 ACM/IEEE 32nd International Conference on, volume 1, pages 495–504. IEEE, 2010.
[10] P. J. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. Not my bug! and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, pages 395–404. ACM, 2011.
[11] K. Herzig, S. Just, and A. Zeller. It's not a bug, it's a feature: How misclassification impacts bug prediction. In Proceedings of the 2013 International Conference on Software Engineering, pages 392–401. IEEE Press, 2013.
[12] N. Japkowicz. Learning from imbalanced data sets: A comparison of various strategies. In Proceedings of the AAAI Workshop on Learning from Imbalanced Data Sets, pages 10–15. AAAI, 2000.
[13] S. Kim, E. J. Whitehead, Jr., and Y. Zhang. Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering, 34(2):181–196, Mar. 2008.
[14] S. Kim, T. Zimmermann, K. Pan, and E. J. Whitehead. Automatic identification of bug-introducing changes. In Proceedings of Automated Software Engineering, pages 81–90. IEEE, 2006.
[15] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals. Predicting the severity of a reported bug. In Proceedings of the Working Conference on Mining Software Repositories, pages 1–10, 2010.
[16] A. Lamkanfi, J. Perez, and S. Demeyer. The Eclipse and Mozilla defect tracking dataset: a genuine dataset for mining bug information. In Proceedings of the Working Conference on Mining Software Repositories, pages 203–206. IEEE Press, 2013.
[17] L. Marks, Y. Zou, and A. E. Hassan. Studying the fix-time for bugs in large open source projects. In Proceedings of Promise, pages 11:1–11:8. ACM, 2011.
[18] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. Al-Kofahi, and T. N. Nguyen. Recurring bug fixes in object-oriented programs. In Proceedings of the International Conference on Software Engineering, pages 315–324, 2010.
[19] L. D. Panjer. Predicting Eclipse bug lifetimes. In Proceedings of the Working Conference on Mining Software Repositories, pages 29–32. IEEE Computer Society, 2007.
[20] J. Park, M. Kim, B. Ray, and D.-H. Bae. An empirical study of supplementary bug fixes. In Proceedings of the Working Conference on Mining Software Repositories, pages 40–49, 2012.
[21] D. E. Perry and C. S. Stieg. Software faults in evolving a large, real-time system: A case study. In Proceedings of ESEC, pages 48–67, 1993.
[22] R. K. Saha, S. Khurshid, and D. E. Perry. An empirical study of long lived bugs. In Proceedings of the IEEE CSMR-18/WCRE-21 Software Evolution Week, pages 144–153. IEEE, 2014.
[23] R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry. Improving bug localization using structured information retrieval. In Proceedings of Automated Software Engineering, pages 345–355, 2013.
[24] E. Shihab, A. Ihara, Y. Kamei, W. Ibrahim, M. Ohira, B. Adams, A. Hassan, and K.-i. Matsumoto. Studying re-opened bugs in open source software. Empirical Software Engineering, 18(5):1005–1042, 2013.
[25] R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani. Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In Proceedings of the Working Conference on Mining Software Repositories, pages 2–11, 2013.
[26] F. Thung, D. Lo, L. Jiang, Lucia, F. Rahman, and P. Devanbu. When would this bug get reported? In Proceedings of the International Conference on Software Maintenance, pages 420–429, 2012.
[27] Y. Tian, D. Lo, and C. Sun. DRONE: Predicting priority of reported bugs by multi-factor analysis. In Software Maintenance (ICSM), 2013 29th IEEE International Conference on, pages 200–209. IEEE, 2013.
[28] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller. How long will it take to fix this bug? In Proceedings of the Working Conference on Mining Software Repositories, pages 1–8, 2007.
[29] R. Wu, H. Zhang, S. Kim, and S.-C. Cheung. ReLink: recovering links between bugs and changes. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pages 15–25. ACM, 2011.
[30] F. Zhang, F. Khomh, Y. Zou, and A. E. Hassan. An empirical study on factors impacting bug fixing time. In Proceedings of the International Conference on Software Maintenance, pages 225–234. IEEE Computer Society, 2012.
[31] H. Zhang, L. Gong, and S. Versteeg. Predicting bug-fixing time: an empirical study of commercial software projects. In Proceedings of the International Conference on Software Engineering, pages 1042–1051. IEEE Press, 2013.
[32] J. Zhou, H. Zhang, and D. Lo. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In Proceedings of the International Conference on Software Engineering, pages 14–24, 2012.
