Personalized Defect Prediction

DESCRIPTION
Tian's ASE 2013 presentation

TRANSCRIPT

Personalized Defect Prediction
Tian Jiang, Lin Tan (University of Waterloo)
Sunghun Kim (Hong Kong University of Science and Technology)
1
How to Find Bugs?
• Code Review
• Testing
• Static Analysis
• Dynamic Analysis
• Verification
• Defect Prediction
2
Defect Prediction
[Diagram: Software History → Predictor → Future Defects]
3
Developers are Different
[Chart: % of buggy changes (0-80%) for developers A, B, C, and D vs. the average, for changes involving modulo (%), for, bitwise OR, and continue; Linux Kernel, 2005-2010]
Personalized models can improve performance.
4
Successes in Other Fields
• Google personalized search
• Facebook personalized ad placement
5
Contributions
• Personalized Change Classification (PCC)
  ✦ One model for each developer
• Confidence-based Hybrid PCC (PCC+)
  ✦ Picks predictions with highest confidence
• Evaluate on six C and Java projects
  ✦ Find up to 155 more bugs by inspecting 20% LOC
  ✦ Improve F1 by up to 0.08
6
What is a Change?
[Diagram: a commit (09a02f..., author: John Smith, message: "I submitted some code.") modifies file1.c, file2.c, and file3.c; the commit is split into Change 1, Change 2, and Change 3, one change per file]
Change-Level: Inspect less code to locate a bug.
7
Change Classification (CC)
[Diagram: Software History → Training Instances → Features → Classification Algorithm → Model → predictions for Future Instances]
Training Phase:
1. Label changes with clean or buggy
2. Extract features
3. Build prediction model
Prediction Phase:
4. Predict
8
Label Clean or Buggy
[Sliwerski et al. ’05]
[Diagram: in the revision history, a bug-fixing change (commit 1da57..., message: "I fixed a bug") changes "if (i < 128)" to "if (i <= 128)" in fileA.c. A change is bug-fixing if its message contains the keyword "fix", or the ID of a manually verified bug report [Herzig et al. ’13]. Running git blame on the fixed line traces it back to an earlier change (commit 7a3bc..., message: "new feature") that introduced "if (i < 128)"; that change, fixed by a later change, is labelled buggy.]
9
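The labelling step can be sketched in Python. This is a minimal illustration of the keyword heuristic and the git blame lookup, not the implementation used in the paper; the regex and the porcelain parsing are simplifying assumptions.

```python
import re
import subprocess

def is_bug_fixing(message: str) -> bool:
    # Keyword heuristic [Sliwerski et al. '05]: a commit whose message
    # mentions a fix is treated as a bug-fixing change.
    return re.search(r"\bfix(ed|es)?\b", message, re.IGNORECASE) is not None

def buggy_change_for(repo: str, path: str, line_no: int, fix_commit: str) -> str:
    # git blame on the parent of the fixing commit reports which earlier
    # change last touched the fixed line; that change is labelled buggy.
    out = subprocess.check_output(
        ["git", "-C", repo, "blame", "--porcelain",
         "-L", f"{line_no},{line_no}", f"{fix_commit}^", "--", path],
        text=True)
    return out.split()[0]  # first token of the porcelain output is the SHA

print(is_bug_fixing("I fixed a bug"))  # True
print(is_bug_fixing("new feature"))    # False
```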
Three Types of Features
• Metadata
• Bag-of-Words
• Characteristic Vector
10
Characteristic Vector
Count Abstract Syntax Tree (AST) nodes

for (...; ...; ...) {
  for (...; ...; ...) {
    if (...) ...;
  }
}

for: 2, if: 1, while: 0, ...
11
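The same idea can be sketched with Python's standard ast module. The node types and the code fragment here are illustrative; the original work computes characteristic vectors for C and Java code.

```python
import ast
from collections import Counter

def characteristic_vector(source, node_types=("For", "If", "While")):
    # Parse the fragment and count how often each AST node type occurs.
    counts = Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))
    return [counts[t] for t in node_types]

code = """
for i in range(10):
    for j in range(10):
        if i < j:
            pass
"""
print(characteristic_vector(code))  # [2, 1, 0]: two for loops, one if, no while
```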
CC: Training
[Diagram: Training Instances → Model]
12

CC: Prediction
[Diagram: Unlabeled Changes → Model → Predicted Changes]
13
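A toy end-to-end sketch of CC: a 1-nearest-neighbour rule stands in for the classifiers the paper evaluates (decision tree, Naive Bayes, logistic regression), and the feature vectors and labels are invented for illustration.

```python
def train(instances):
    # "Training" a 1-NN model is just storing the labelled changes.
    return list(instances)

def predict(model, features):
    # Classify a future change by the label of its closest training change.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda inst: dist(inst[0], features))[1]

history = [([2, 1, 0], "buggy"), ([0, 3, 1], "clean"),
           ([5, 0, 0], "buggy"), ([1, 1, 2], "clean")]
model = train(history)
print(predict(model, [2, 1, 1]))  # "buggy": nearest neighbour is [2, 1, 0]
```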
PCC: Training
[Diagram: Training Instances are grouped by developer (Dev 1, Dev 2, Dev 3), and one model is trained per developer (Model 1, Model 2, Model 3)]
14

PCC: Prediction
[Diagram: the model matching the change's developer is chosen (e.g. Model 2 for Dev 2) and makes the prediction]
15
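PCC can be sketched on top of any base classifier: group the labelled changes by developer, train one model per developer, and dispatch each change to its author's model. The majority-class "model" below is a deliberately trivial stand-in (it ignores the features), and the data is invented.

```python
from collections import defaultdict

def train(instances):
    # Trivial stand-in model: predict the developer's majority class.
    labels = [label for _, label in instances]
    return max(set(labels), key=labels.count)

def pcc_train(history):
    # history: (developer, features, label) triples.
    by_dev = defaultdict(list)
    for dev, features, label in history:
        by_dev[dev].append((features, label))
    return {dev: train(insts) for dev, insts in by_dev.items()}

def pcc_predict(models, dev, features):
    # Choose the model by developer; the stand-in ignores the features.
    return models[dev]

history = [("alice", [2, 1], "buggy"), ("alice", [3, 0], "buggy"),
           ("bob", [1, 1], "clean"), ("bob", [0, 2], "clean")]
models = pcc_train(history)
print(pcc_predict(models, "alice", [2, 0]))  # "buggy": alice's model answers
```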
PCC+: Prediction
[Diagram: each change is fed to all models (the CC model and the PCC model); a Combiner selects the final prediction]
Feed Changes to All Models
16
Confidence Measure
• Bugginess
  ✦ Probability of a change being buggy
• Confidence Measure
  ✦ Comparable measure of confidence
• Select the prediction with the highest confidence.
17
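The combiner can be sketched as follows. Treating a bugginess probability's distance from 0.5 as the comparable confidence is a simplifying assumption (the paper's measure may differ), and the probabilities are toy values.

```python
def confidence(p_buggy):
    # A bugginess probability near 0 or 1 is a confident prediction;
    # near 0.5 the model is unsure.
    return abs(p_buggy - 0.5)

def combine(bugginess_by_model):
    # Pick the model with the highest confidence and use its prediction.
    name, p = max(bugginess_by_model.items(), key=lambda kv: confidence(kv[1]))
    return "buggy" if p >= 0.5 else "clean"

print(combine({"CC": 0.55, "PCC": 0.10}))  # "clean": PCC is more confident
print(combine({"CC": 0.95, "PCC": 0.60}))  # "buggy": CC is more confident
```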
Research Questions
• RQ1: Do PCC and PCC+ outperform CC?
• RQ2: Does PCC outperform CC in other setups?
  ✦ Classification algorithms
  ✦ Sizes of training sets
18
Two Metrics
• F1-Score
  ✦ Harmonic mean of precision and recall
• Cost Effectiveness
  ✦ Relevant in cost-sensitive scenarios
  ✦ NofB20: number of bugs discovered by inspecting the top 20% of lines of code
19
Cost Effectiveness
[Table: predicted-buggy changes ranked for inspection, with cumulative % of the 100 total LOC]

Cumulative LOC | Change   | LOC
10%            | Buggy #1 | 10
15%            | Buggy #2 | 5
19%            | Buggy #3 | 4
27%            | Buggy #4 | 8
               | Buggy #5 | 12
...            | ...      | ...
Total          |          | 100
20

Buggy #1, #2, and #3 are true bugs and fall within the top 20% of LOC, so NofB20 = 3.
21
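NofB20 can be computed as follows; the ranking by predicted bugginess is assumed to have been done already, and the values reproduce the slide's worked example (100 total LOC, first three predicted changes are true bugs).

```python
def nofb20(ranked_changes, total_loc, budget=0.20):
    # ranked_changes: (loc, is_true_bug) pairs sorted by predicted
    # bugginess. Count the true bugs found before the inspected LOC
    # would exceed the budget (20% of the total by default).
    inspected = bugs = 0
    for loc, is_true_bug in ranked_changes:
        if inspected + loc > budget * total_loc:
            break
        inspected += loc
        bugs += is_true_bug
    return bugs

ranked = [(10, True), (5, True), (4, True), (8, False), (12, False)]
print(nofb20(ranked, total_loc=100))  # 3
```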
Test Subjects

Project      | Language | LOC  | # of Changes
Linux kernel | C        | 7.3M | 429K
PostgreSQL   | C        | 289K | 89K
Xorg         | C        | 1.1M | 46K
Eclipse      | Java     | 1.5M | 73K
Lucene*      | Java     | 828K | 76K
Jackrabbit*  | Java     | 589K | 61K

* With manually labelled bug report data [Herzig et al. ’13]
22
PCC/PCC+ vs. CC (Decision Tree, NofB20)

Project    | CC  | PCC | Delta | PCC+ | Delta
Linux      | 160 | 179 | +19   | 172  | +12
PostgreSQL | 55  | 210 | +155  | 175  | +120
Xorg       | 96  | 159 | +63   | 161  | +65
Eclipse    | 116 | 207 | +91   | 200  | +84
Lucene     | 177 | 254 | +77   | 257  | +80
Jackrabbit | 411 | 449 | +38   | 459  | +48
Average    | -   | -   | +74   | -    | +68

Statistically significant deltas are in bold.
23
PCC and PCC+ outperform CC.
24
Different Classification Algorithms (NofB20)

           | Naive Bayes       | Logistic Regression
Project    | CC  | PCC | Delta | CC  | PCC | Delta
Linux      | 138 | 147 | +9    | 102 | 137 | +35
PostgreSQL | 89  | 113 | +24   | 46  | 56  | +10
Xorg       | 84  | 101 | +17   | 52  | 29  | -23
Eclipse    | 65  | 108 | +43   | 54  | 55  | +1
Lucene     | 152 | 139 | -13   | 30  | 200 | +170
Jackrabbit | 420 | 414 | -6    | 261 | 370 | +109
Average    | -   | -   | +12   | -   | -   | +59

Statistically significant deltas are in bold.
25
Different Training Set Sizes
[Chart: NofB20 (y-axis, 100-300) vs. training set size per developer (x-axis, 10-90), comparing PCC and CC]
26
The improvement persists in other setups.
27
Related Work
• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08
• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12
28
Conclusions & Future Work
• PCC and PCC+ improve prediction performance.
• The improvement persists in other setups.
• The personalized approach can be applied to other fields.
  ✦ Recommendation systems
  ✦ Vulnerability prediction
  ✦ Top-crash prediction
29