Memories of Bug-Fixes
Sunghun Kim, Kai Pan, Jim Whitehead
{hunkim, pankai, ejw}@cs.ucsc.edu
University of California, Santa Cruz
What is a bug (Zeller 2006)?
• This pointer, being null, is a bug
► An incorrect program state
• This software crashes; this is a bug
► An incorrect program execution
• This line 11 is buggy
► An incorrect program code
Bugs?
// null dereference
public void nullDeref() {
    MyObject o = null;
    if (isGoodDay) {
        o = new MyObject("Hi");
    }
    System.out.println(o.toString()); // o is still null when isGoodDay is false
}
Bugs?
// stack buffer overrun for sizes greater than 14
void stack_buffer(void* src, int size) {
    char buffer[14];
    memcpy(buffer, src, size); // size is never checked against sizeof(buffer)
}
Bugs?
if (…) {
setSelectedText("\t");
}
• There are many bug fix patterns that are specific to an individual project, and may not match one of the static patterns
• Example from jEdit project:
JEditTextArea.java at transaction 114
- setSelectedText("\t");
+ insertTab();
JEditTextArea.java at transaction 86
- setSelectedText("\t");
+ insertTab();
Project-Specific Bug Fix Patterns
Bug?
if (requiredProjectRsc.exists() &&
requiredProjectRsc.isOpen()) {
…
}
• Example from Eclipse project:
JavaProject.java, transaction 2024 (“Fix for bug 28434”)
- if (requiredProjectRsc.exists() &&
-     requiredProjectRsc.isOpen()) {
+ if (JavaProject.hasJavaNature(requiredProjectRsc)) {
DeltaProcessor.java, transaction 1945 (“Fix for bug 27499”)
- boolean isOpened=proj.isOpen();
- if (isOpened && this.hasJavaNature(proj))
+ if (JavaProject.hasJavaNature(proj))
Project-Specific Bug Fix Patterns
Horizontal and Vertical Bug Patterns
• Horizontal: general bugs
► Buffer overrun
► Null dereference
• Vertical: project specific
► jEdit example
► Eclipse example
Bug-Fix Memories – Basic Idea
• Extract patterns in bug fix change history
► Bug fix changes in revision 1 .. n-1 are stored in the memory
• Search for patterns against Memory
► The code to examine is matched against the memory
Talk Overview
• Detection of bug fix changes
• Mining vertical bugs
► Abstracting code
• Evaluation
• Conclusions
• Future Work
Retrieving Bug Fix Changes
• Software projects today record their development history using Software Configuration Management tools
• As developers make changes, they record a reason along with the change
► In the change log message
• When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
► “Fixed null pointer bug”
• It is possible to mine the change history of a software project to uncover these bug-fix changes
• That is, we retrospectively recover those changes that developers have marked as containing a bug fix
► We assume they are not lying
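As an illustration of the log-message heuristic above, here is a minimal Java sketch. The class name `BugFixLogFilter` and the exact keyword pattern are assumptions made for this example; the slides only say that the messages contain some variation of “fixed” or “bug”.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: flags change log messages that look like bug fixes.
class BugFixLogFilter {
    // Assumed keyword set: fix, fixes, fixed, bug, bugs (whole words, any case).
    private static final Pattern FIX_WORDS =
            Pattern.compile("\\b(fix(e[sd])?|bugs?)\\b", Pattern.CASE_INSENSITIVE);

    static boolean isBugFix(String logMessage) {
        return FIX_WORDS.matcher(logMessage).find();
    }

    // Keeps only the messages that match the keyword pattern.
    static List<String> bugFixMessages(List<String> logMessages) {
        return logMessages.stream()
                .filter(BugFixLogFilter::isBugFix)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "Fixed null pointer bug",
                "Added tab insertion command",
                "Fix for bug 28434");
        // Prints: [Fixed null pointer bug, Fix for bug 28434]
        System.out.println(bugFixMessages(log));
    }
}
```

Matching whole words keeps messages such as “added prefix handling” from being counted as fixes.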
Hunks and Hunk Pairs
• Revision n-1 has bug hunks; revision n has fix hunks
• Hunk pair types
► Modification: deleted hunk and added hunk
► Addition: empty deleted hunk and added hunk
► Deletion: deleted hunk and empty added hunk
Detecting Vertical Bugs (Patterns)
• Detecting bug patterns
► Saving exact code in bug and fix hunks doesn’t work, since there is rarely an exact match.
► Need a method for abstracting changes to find patterns
• Approach
► Abstract code in each bug fix change
► Save abstracted bug and fix code in a database (the “bug fix memory”)
► Can search existing code to see if it matches a bug fix pattern
► Can suggest code to fix the bug
Process for Abstracting Code
• Four step process
► Raw component extraction
• Parse source code, and burst out individual syntactic elements
► Normalization
• Substitute type names for variables, string literals, constants (abstract to types)
► Information filtering
• Remove elements that are too common to yield project-specific patterns
► Diff filtering
• Remove code components that are common in bug and fix hunks, yielding only code unique to the change
Raw Component Extraction
• Step 1: Convert statements inside change hunks so they lie on a single line
► Eliminate whitespace
► Concatenate multi-line statements to one line
► Concatenate conditionals for complex statements (if, while, etc.) to one line
• Step 2: Extract raw components
► Component is a non-leaf node in the syntax tree of a single line
► Bursts out complex statements into constituent parts
• Each portion of a complex conditional is a separate component
► Additionally, separate out a method call and its parameters
Raw Component Extraction Example
• Initial code
if (foo.flag > 5 && foo.ready()) {
    i=1;
    foo.create("example");
    initiate(6,bar);
}
• Extracted Raw Components
foo.flag
foo.flag > 5
foo.ready()
ready()
foo.flag > 5 && foo.ready()
if (foo.flag > 5 && foo.ready())
i=1
"example"
foo.create(.) "example"
create(.) "example"
initiate(,) 6, bar
[Figure: syntax tree of the if statement, with nodes if, &&, >, ., foo, flag, 5, ready()]
Normalization
• To further improve the ability to match code, perform abstraction of instances to types
► Replace variable instance with its type
• Permits matching on type, rather than instance
• foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
► For literals, insert new component with type
• i=1 yields int=1 and int=int
► For method calls, replace each parameter with type of parameter
• Use “*” for unknown types (we only do one-pass parse)
• initiate(,) 6, bar → initiate(,) int,* (type of bar is unknown)
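The parameter rule above can be sketched in Java as follows. This is only an illustration under stated assumptions: `ParameterNormalizer`, the `knownTypes` symbol table, and the literal checks are hypothetical names and simplifications, not the tool's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replaces each call argument with its type name.
class ParameterNormalizer {
    // Returns the type of one argument, or "*" when it cannot be resolved.
    static String typeOf(String arg, Map<String, String> knownTypes) {
        String a = arg.trim();
        if (a.matches("-?\\d+")) {
            return "int"; // integer literal
        }
        if (a.length() >= 2 && a.startsWith("\"") && a.endsWith("\"")) {
            return "String"; // string literal
        }
        // Variables declared earlier in the one-pass parse (assumed symbol table).
        return knownTypes.getOrDefault(a, "*");
    }

    // Joins the argument types the way the slide writes them, e.g. "int,*".
    static String normalizeArgs(List<String> args, Map<String, String> knownTypes) {
        List<String> types = new ArrayList<>();
        for (String arg : args) {
            types.add(typeOf(arg, knownTypes));
        }
        return String.join(",", types);
    }

    public static void main(String[] args) {
        Map<String, String> knownTypes = Map.of("foo", "Foo");
        // Prints: initiate(,) int,*
        System.out.println("initiate(,) " + normalizeArgs(List.of("6", "bar"), knownTypes));
    }
}
```

Because `bar` is not in the symbol table, it falls back to “*”, which matches the slide's example.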
Information Filtering Goal
• After normalization, resulting components are candidates for insertion into database
► Problem: many commonly occurring statement types
• int=int
► Want to eliminate these, and others that don’t contribute unique information about bug fixes
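One possible way to drop overly common components is a simple frequency cutoff, sketched below in Java. The slides do not state the actual filtering criterion, so the `maxOccurrences` threshold and the class name `InformationFilter` are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: removes components that occur too often to be informative.
class InformationFilter {
    // Keeps each component that occurs at most maxOccurrences times (assumed criterion).
    static List<String> dropCommon(List<String> components, int maxOccurrences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : components) {
            counts.merge(c, 1, Integer::sum);
        }
        return components.stream()
                .filter(c -> counts.get(c) <= maxOccurrences)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> components = List.of(
                "int=int", "int=int", "int=int",
                "Foo.flag > int",
                "initiate(,) int,*");
        // Prints: [Foo.flag > int, initiate(,) int,*]
        System.out.println(dropCommon(components, 2));
    }
}
```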
Diff Filtering and Storing Memories
• As a final filtering step, keep only those components that are unique to either bug or fix hunks
► Duplicate components are eliminated, since they do not represent the bug or its fix
• After diff filtering step, store all components into the database (“memory”)
► Components record their transaction, file name, bug or fix hunk, etc.
► Also store initial source code of bug and fix hunks
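Diff filtering amounts to a set difference in each direction. The Java sketch below assumes components are plain strings; the class name `DiffFilter` and the sample components are hypothetical, loosely modeled on the jEdit example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keeps only components unique to one side of a hunk pair.
class DiffFilter {
    // Returns the components of 'own' that do not also appear in 'other'.
    static Set<String> uniqueTo(List<String> own, List<String> other) {
        Set<String> result = new LinkedHashSet<>(own);
        result.removeAll(other);
        return result;
    }

    public static void main(String[] args) {
        List<String> bugHunk = List.of("if", "setSelectedText(.) String");
        List<String> fixHunk = List.of("if", "insertTab()");
        // Prints: [setSelectedText(.) String]
        System.out.println(uniqueTo(bugHunk, fixHunk));
        // Prints: [insertTab()]
        System.out.println(uniqueTo(fixHunk, bugHunk));
    }
}
```

The shared `if` component is dropped from both sides, leaving only what the change actually touched.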
Searching the Memory
• The memory database contains extracted adaptive bug and fix patterns for a given project
• Can use this memory to find code that matches bug code in the memory
• Use scenario
► Developer working in their favorite development environment
► Receives feedback when code they are developing matches a stored bug pattern
► Can also suggest potential fixes from stored bug fix code
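A memory lookup along these lines could be sketched in Java as below. The in-memory map stands in for the database, and `BugFixMemory`, `remember`, and `suggestFixes` are hypothetical names; exact string matching on components is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for the bug fix memory database.
class BugFixMemory {
    // Maps an abstracted bug component to the fix code recorded with it.
    private final Map<String, List<String>> fixesByBugComponent = new HashMap<>();

    void remember(String bugComponent, String fixSource) {
        fixesByBugComponent
                .computeIfAbsent(bugComponent, k -> new ArrayList<>())
                .add(fixSource);
    }

    // Returns stored fix code for every component that matches a stored bug component.
    List<String> suggestFixes(List<String> codeComponents) {
        List<String> suggestions = new ArrayList<>();
        for (String component : codeComponents) {
            suggestions.addAll(fixesByBugComponent.getOrDefault(component, List.of()));
        }
        return suggestions;
    }

    public static void main(String[] args) {
        BugFixMemory memory = new BugFixMemory();
        memory.remember("setSelectedText(.) String", "insertTab();");
        // Prints: [insertTab();]
        System.out.println(memory.suggestFixes(
                List.of("if", "setSelectedText(.) String")));
    }
}
```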
IDE Integration
[Figure: IDE integration, showing bug detection and fix suggestion]
Evaluation
• We evaluated the memory to determine how well it captures new bug fix changes
► Online learning approach
► Specifically, we create a memory for transactions 1 to n-1
► At transaction n, for bug fix changes we examine whether the bug hunks are found in the memory
• This is a “half hit”
► If found, we also examine whether the fix hunk is found too
• This is a “full hit”
► Examined same 5 project histories
• ArgoUML, Columba, Eclipse, jEdit, Scarab
• This can be viewed as a proxy for how well the approach might work for bug and fix prediction
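The online evaluation loop can be sketched in Java as follows. This is a simplified illustration: the `FixChange` type, the rule that a single shared component counts as “found”, and the separate bug and fix sets are all assumptions, not the evaluation code behind the reported numbers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the online half hit / full hit count.
class HitRateEvaluation {
    // One bug fix change: abstracted components of its bug hunk and fix hunk.
    static final class FixChange {
        final Set<String> bug;
        final Set<String> fix;

        FixChange(Set<String> bug, Set<String> fix) {
            this.bug = bug;
            this.fix = fix;
        }
    }

    // Returns {halfHits, fullHits} over fix changes given in transaction order.
    static int[] countHits(List<FixChange> history) {
        Set<String> bugMemory = new HashSet<>();
        Set<String> fixMemory = new HashSet<>();
        int halfHits = 0;
        int fullHits = 0;
        for (FixChange change : history) {
            // The memory holds only transactions 1 .. n-1 at this point.
            if (change.bug.stream().anyMatch(bugMemory::contains)) {
                halfHits++;
                if (change.fix.stream().anyMatch(fixMemory::contains)) {
                    fullHits++;
                }
            }
            // Learn from transaction n before moving on to n+1.
            bugMemory.addAll(change.bug);
            fixMemory.addAll(change.fix);
        }
        return new int[] {halfHits, fullHits};
    }

    public static void main(String[] args) {
        List<FixChange> history = List.of(
                new FixChange(Set.of("setSelectedText(.) String"), Set.of("insertTab()")),
                new FixChange(Set.of("setSelectedText(.) String"), Set.of("insertTab()")),
                new FixChange(Set.of("setSelectedText(.) String"), Set.of("insertSpaces()")));
        int[] hits = countHits(history);
        // Prints: half hits: 2, full hits: 1
        System.out.println("half hits: " + hits[0] + ", full hits: " + hits[1]);
    }
}
```

The first change can never hit because the memory is still empty, which is the point of evaluating each transaction only against what came before it.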
Half and Full Hit
• Build memories (bug | fix) based on transaction 1 .. n-1
• Fix change case at transaction n
► Half hit: the bug hunk is found in the memories
► Full hit: the fix hunk is found too
True and False Positives
• Build memories based on transaction 1 .. n-1
• Fix change case at transaction n
► True positive half hit, if found
• Non-fix change case at transaction n
► False positive half hit, if found
True Positive Hit Rates
[Figure: bar chart of the true positive hit rate per project (ArgoUML, Columba, Eclipse, jEdit, Scarab). X-axis: Projects. Y-axis: Hit Rate, ticks 0 to 45. Series: full hit, half hit.]
False Positive Hit Rates
[Figure: bar chart of the false positive hit rate per project (ArgoUML, Columba, Eclipse, jEdit, Scarab). X-axis: Projects. Y-axis: Hit Rate, ticks 0 to 35. Series: full hit, half hit.]
True Positive and False Positive Full Hit Rates
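[Figure: bar chart of true positive and false positive full hit rates per project (ArgoUML, Columba, Eclipse, jEdit, Scarab). X-axis: Projects. Y-axis: Hit Rate, ticks 0 to 18. Series: TP full hit, FP full hit.]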
True Positive and False Positive Full Hit Rates
• Bug fix memories work well
► Captures 19.3%-40.3% of bugs (half-hits)
► But also captures a lot of non-bug changes (20.8%-32.5%)
PMD VS Fix Memories
• PMD is a bug finding tool based on a static syntax checker
• Bugs found by PMD and by fix memories are largely exclusive
[Figure: overlap diagrams of the bugs found by PMD and by fix memories, one per project. Labels in slide order: 40.3%, 6.5%, 3%, ArgoUML; 38.7%, 6.5%, 2.3%, Eclipse.]
Conclusions
• It is now possible to reliably extract bug fix memories from software project evolution data
• Bug fix memories work well
► Captures 19.3%-40.3% of bugs (half-hits)
► But also captures a lot of non-bug changes (20.8%-32.5%)
• Bugs found using fix memories and PMD are mostly exclusive
► Our approach complements other bug finding tools
Future Work
• Developing other pattern extracting algorithms
► To remove false positives
► AST, Slicing, Control flow, etc.
• Comparing fix memories with more bug finding tools
► FindBugs, JLint, etc.