Bug Prediction Based on Fine-Grained Module Histories
DESCRIPTION
A first study of fine-grained (method-level) bug prediction with well-known historical metrics.
TRANSCRIPT
![Page 1: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/1.jpg)
Bug Prediction Based on Fine-Grained Module Histories
Hideaki Hata
Osamu Mizuno
Tohru Kikuno
![Page 2: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/2.jpg)
Overview
Background
Historical metrics are useful for bug prediction
Problem
For method-level prediction, it is difficult to collect historical metrics
Solution & Results
Historage: fine-grained version control system
First study of method-level bug prediction with well-known historical metrics
![Page 3: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/3.jpg)
Bug Prediction Papers
[Figure: number of bug prediction papers per year, 2000 to 2010, in TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, and MSR; vertical axis from 0 to 15]
![Page 4: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/4.jpg)
Historical Metrics
Code and Process
• Code churn
• Changes
• Past bugs
• Process complexity
Organization
• Developers
• Org structure
• Network
• Ownership
Geography
• Locations
• Distribution
Bug Prediction Survey: http://bpsurvey-hidehata.dotcloud.com/
![Page 5: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/5.jpg)
Mining Version Control Repository
[Figure: commit timeline (..., n-3, n-2, n-1, n, n+1, n+2, n+3, ...). Each commit provides a code delta, a date (calendar showing July 2007), and a commit message (e.g., "Fix bug #32528").]
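Commit messages such as "Fix bug #32528" are what allow bug-fix changes to be separated from other changes. The following is a minimal sketch of that step, not the authors' tooling: the `Commit` record, the regular expression, and the sample commits are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    commit_id: str
    date: str      # ISO date string, illustrative only
    message: str


# Hypothetical pattern: a "fix"-like keyword followed later by "bug <number>".
BUG_FIX_PATTERN = re.compile(r"\bfix(?:e[sd])?\b.*?\bbug\s*#?(\d+)", re.IGNORECASE)


def extract_bug_id(message: str) -> Optional[str]:
    """Return the bug ID referenced by a bug-fix message, or None."""
    match = BUG_FIX_PATTERN.search(message)
    return match.group(1) if match else None


# Made-up history; only the message text of c2 comes from the slide.
commits = [
    Commit("c1", "2007-07-25", "Add parser for config files"),
    Commit("c2", "2007-07-26", "Fix bug #32528"),
    Commit("c3", "2007-07-27", "Refactor logging"),
]

# Map each bug-fix commit to the bug ID it mentions.
bug_fix_commits = {}
for commit in commits:
    bug_id = extract_bug_id(commit.message)
    if bug_id is not None:
        bug_fix_commits[commit.commit_id] = bug_id
```

Real projects use different message conventions, so the pattern would need to be adapted per repository.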
![Page 6: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/6.jpg)
What We Have Learned
Prediction accuracy
Historical metrics ≥ Static code metrics [Moser et al. ’08, Kamei et al. ’10]
Required effort
File-level ≤ Package-level [Kamei et al. ’10, Nguyen et al. ’10, Posnett et al. ’11]
![Page 7: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/7.jpg)
State of the Art
[Figure: number of papers (TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, MSR) by prediction level: package-level, file-level, and method-level; axis from 0 to 15]
Cache model [Kim et al. ’07]
Spam filtering model [Mizuno et al. ’07]
No method-level prediction with well-known historical metrics
![Page 8: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/8.jpg)
Method-Level Prediction
Requirement
Method-level historical metrics
Problem
Analysis of method histories is difficult
![Page 9: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/9.jpg)
Difficulties
1. Tracking methods is troublesome
Matching methods must be found between consecutive snapshots
2. Method-level metadata are not easily available
Metadata (who, when, how, etc.) are associated with files
[Figure: consecutive snapshots n-2, n-1, n]
![Page 10: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/10.jpg)
Historage
Fine-grained version control system [1]
is created on top of a Git repository
stores methods as files
detects renames/moves with Git's mechanism
[1] Hata et al., “Historage: Fine-Grained Version Control System for Java,” IWPSE-EVOL ’11.
Tool: git2historage (https://github.com/hdrky/git2historage)
[Figure: diagram labeled com1 and com2, shown twice, each containing several Method boxes]
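Once each method is stored as its own file, following a renamed or moved method becomes a matter of pairing similar files between two snapshots. The sketch below illustrates that idea with Python's `difflib`; it is not Git's rename-detection algorithm and not the Historage implementation. The path layout, the `match_methods` helper, and the 0.6 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_methods(old, new, threshold=0.6):
    """Pair methods of two snapshots.

    old, new: dicts mapping a method path to its source text.
    Returns a dict mapping each matched old path to a new path.
    """
    matches = {}
    unmatched_new = set(new)

    # Methods whose path is unchanged are matched directly.
    for path in old:
        if path in new:
            matches[path] = path
            unmatched_new.discard(path)

    # Remaining methods are paired with the most similar unmatched method,
    # provided the similarity ratio is above the threshold.
    for old_path, old_body in old.items():
        if old_path in matches:
            continue
        best_path, best_score = None, threshold
        for new_path in sorted(unmatched_new):
            score = SequenceMatcher(None, old_body, new[new_path]).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            matches[old_path] = best_path
            unmatched_new.discard(best_path)
    return matches


# Hypothetical snapshots: parse(String) is renamed to parseInt(String).
old_snapshot = {
    "Foo.java/parse(String)": "int parse(String s) { return Integer.parseInt(s.trim()); }",
    "Foo.java/size()": "int size() { return items.size(); }",
}
new_snapshot = {
    "Foo.java/parseInt(String)": "int parseInt(String s) { return Integer.parseInt(s.trim()); }",
    "Foo.java/size()": "int size() { return items.size(); }",
    "Foo.java/clear()": "void clear() { items.clear(); }",
}
matches = match_methods(old_snapshot, new_snapshot)
```

Here `matches` pairs the renamed method with its new path, while the newly added `clear()` stays unmatched.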
![Page 11: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/11.jpg)
Visualization of repository history
• tree: directory
• white node: method
[Figure: two panels, Git (file histories) and Historage (method histories)]
![Page 12: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/12.jpg)
Mining Historage
[Figure: timeline of one method's versions (..., n-3, n-2, n-1, n, n+1, n+2, n+3, ...). Each change provides a code delta, a date (calendar showing July 2007), and a commit message (e.g., "Fix bug #32528").]
![Page 13: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/13.jpg)
Study
Comparison
Prediction level: package, file, and method
The same metrics and the same prediction algorithm (random forest)
Buggy modules: identified with the SZZ algorithm [2]
Evaluation
10-fold cross validation
Effort-based evaluation
[2] Sliwerski et al., “When do changes induce fixes?” MSR ’05.
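The core idea of SZZ is to take the lines changed by a bug fix, look up which earlier commits last touched those lines, and keep the commits made before the bug was reported. The following is a heavily simplified sketch of that idea over in-memory data, not the implementation used in the study. The function name, the data layout, and the sample values are hypothetical; a real pipeline would obtain the blame data from the version control system.

```python
from datetime import date


def find_bug_introducing(fix_changed_lines, blame_before_fix, commit_dates, bug_reported):
    """Return the commits suspected of introducing the fixed bug.

    fix_changed_lines: (path, line number) pairs modified or deleted by the fix.
    blame_before_fix: maps (path, line number) to the commit that last changed
        that line in the revision just before the fix.
    commit_dates: maps commit ID to its commit date.
    bug_reported: date on which the bug was reported.
    """
    candidates = {
        blame_before_fix[line]
        for line in fix_changed_lines
        if line in blame_before_fix
    }
    # A change made after the bug was reported cannot have caused it.
    return {c for c in candidates if commit_dates[c] < bug_reported}


# Made-up data for illustration.
blame = {
    ("Foo.java", 10): "c1",
    ("Foo.java", 11): "c3",
    ("Foo.java", 12): "c2",
}
dates = {
    "c1": date(2007, 5, 1),
    "c2": date(2007, 6, 1),
    "c3": date(2007, 7, 20),
}
introducing = find_bug_introducing(
    fix_changed_lines=[("Foo.java", 10), ("Foo.java", 11)],
    blame_before_fix=blame,
    commit_dates=dates,
    bug_reported=date(2007, 7, 10),
)
```

In this example only `c1` remains: `c3` touched a fixed line but was committed after the bug report, and `c2` did not touch any line changed by the fix.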
![Page 14: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/14.jpg)
Target
| Project | Period | # of commits |
|---|---|---|
| Xpand | 2y6m | 1,038 |
| WTP Incubator | 2y8m | 1,133 |
| Ant | 11y7m | 2,590 |
| Lucene/Solr | 1y6m | 3,485 |
| OpenJPA | 5y4m | 4,180 |
| Cassandra | 2y6m | 4,423 |
| ECF | 6y6m | 9,748 |
| Wicket | 7y | 15,033 |
![Page 15: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/15.jpg)
Collected Metrics
| Metric | Description |
|---|---|
| DevTotal/Major/Minor | # of total/major/minor developers |
| Ownership | Highest proportion of ownership |
| LOC | Lines of code |
| Add/DelLOC | Added/deleted LOC |
| Chg/FixChgNum | # of changes/bug-fix changes |
| PastBugNum | # of fixed bug IDs |
| Period | Existing days |
| BugIntroNum | # of bug-introducing changes |
| LogCoupNum | # of logical coupling changes |
| Avg/Max/MinInterval | Avg/max/min change interval |
| HCM | Process complexity metric |
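Several of these metrics can be derived directly from the list of changes made to one module. The sketch below shows one plausible way to compute a subset of them for a single method history. The `Change` record and the sample data are hypothetical, and Ownership is assumed here to mean the share of changes made by the most active developer, which may differ from the definition used in the study.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Change(NamedTuple):
    day: date
    author: str
    added: int        # lines added by this change
    deleted: int      # lines deleted by this change
    is_bug_fix: bool


def collect_metrics(changes):
    """Compute a subset of the historical metrics for a non-empty change list."""
    if not changes:
        raise ValueError("at least one change is required")
    changes = sorted(changes, key=lambda c: c.day)
    per_author = Counter(c.author for c in changes)
    metrics = {
        "ChgNum": len(changes),
        "FixChgNum": sum(1 for c in changes if c.is_bug_fix),
        "AddLOC": sum(c.added for c in changes),
        "DelLOC": sum(c.deleted for c in changes),
        "DevTotal": len(per_author),
        # Assumption: ownership = share of changes by the top developer.
        "Ownership": max(per_author.values()) / len(changes),
    }
    # Intervals (in days) between consecutive changes.
    intervals = [(b.day - a.day).days for a, b in zip(changes, changes[1:])]
    if intervals:
        metrics["AvgInterval"] = sum(intervals) / len(intervals)
        metrics["MaxInterval"] = max(intervals)
        metrics["MinInterval"] = min(intervals)
    return metrics


# Made-up history of one method.
history = [
    Change(date(2007, 7, 1), "alice", 10, 0, False),
    Change(date(2007, 7, 11), "bob", 3, 2, True),
    Change(date(2007, 7, 26), "alice", 1, 1, True),
    Change(date(2007, 7, 31), "alice", 5, 4, False),
]
metrics = collect_metrics(history)
```

The same function applies unchanged to file-level or package-level histories, since it depends only on the list of changes.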
![Page 16: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/16.jpg)
Effort-Based Evaluation
[Figure: sample curve plotting percent of bugs found (vertical axis, 0 to 100) against percent of LOC (horizontal axis, 0 to 100)]
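One point on such a curve can be computed by inspecting modules in order of predicted risk until a fixed share of the total LOC has been read, then counting the bugs covered. The sketch below does this for a 20% budget. The function name, the tuple layout, and the sample modules are hypothetical, and stopping at the first module that does not fit the budget is an assumption; other variants allow partial inspection.

```python
def bugs_found_at_effort(modules, effort=0.2):
    """Fraction of bugs found when inspecting a share of the total LOC.

    modules: list of (predicted_score, loc, bugs) tuples.
    effort: share of the total LOC that may be inspected (0.2 = 20%).
    Modules are inspected in descending order of predicted score, and
    inspection stops at the first module that exceeds the LOC budget.
    """
    total_loc = sum(loc for _, loc, _ in modules)
    total_bugs = sum(bugs for _, _, bugs in modules)
    budget = effort * total_loc
    inspected = 0
    found = 0
    for _, loc, bugs in sorted(modules, key=lambda m: m[0], reverse=True):
        if inspected + loc > budget:
            break
        inspected += loc
        found += bugs
    return found / total_bugs if total_bugs else 0.0


# Made-up predictions: (score, LOC, number of bugs); total LOC is 1,000.
modules = [
    (0.9, 50, 1),
    (0.8, 100, 1),
    (0.7, 400, 1),
    (0.2, 450, 1),
]
ratio = bugs_found_at_effort(modules, effort=0.2)
```

With these numbers the two highest-ranked modules fit into the 200-line budget, so half of the bugs are found. Small modules are cheap to inspect, which is why the granularity of the prediction matters under this measure.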
![Page 17: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/17.jpg)
Result (ECF)
[Figure: three plots (package, file, and method level), each showing percent of bugs found (0 to 100) against percent of lines (0 to 100)]
![Page 18: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/18.jpg)
1000 Times Run (ECF)
[Figure: percentages of bugs found in 20% of LOC over 1,000 runs, for package, file, and method levels; vertical axis: percent of bugs found, 0 to 80]
![Page 19: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/19.jpg)
1000 Times Run (All)
[Figure: median values of the percentage of bugs found in 20% of LOC, for package, file, and method levels, across Xpand, WTP Incubator, Ant, Lucene/Solr, OpenJPA, Cassandra, ECF, and Wicket; vertical axis: percent of bugs found, 0 to 100]
![Page 20: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/20.jpg)
Why Is Method-Level Prediction Effective?
[Figure: two plots comparing all and buggy modules. "Size": LOC (0 to 800) for package, file, and method. "# of methods in a file": number of methods (0 to 60).]
Even when models predict buggy modules correctly, most of the code in a buggy package or file is not buggy.
![Page 21: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/21.jpg)
Observations from Correlation Analysis
Are there differences between method-level and package/file-level prediction models?
Same
Large changes tend to be buggy
Frequent changes tend to be buggy
Different
Bugs do not occur repeatedly
Organizational metrics may not contribute to method-level prediction
![Page 22: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/22.jpg)
Threats to Validity
Targets are limited to open-source projects written in Java
The identification of buggy modules was not manually inspected
Effort-based evaluation may not reflect actual effort
![Page 23: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/23.jpg)
Fine-Grained Study Is Big Data Analysis
Need scalable techniques
preparing fine-grained data (making Historage)
analyzing histories (collecting metrics)
building prediction models
[Figure: # of modules (files and methods) in one snapshot for Xpand, Ant, ECF, and Wicket; vertical axis from 0 to 30,000]
![Page 24: Bug Prediction Based on Fine-Grained Module Histories](https://reader033.vdocuments.site/reader033/viewer/2022052905/55848b44d8b42a9f028b51b1/html5/thumbnails/24.jpg)
Conclusions
Summary
Method-level bug prediction with well-known historical metrics
Future work
Empirical studies of actual effort using method-level prediction
More metrics and more projects (including industrial projects)