esem2010 shihab
TRANSCRIPT
1
Understanding the Impact of Code and Process Metrics on Post-release Defects: A Case Study on
the Eclipse Project
Emad Shihab, Zhen Ming Jiang, Walid Ibrahim, Bram Adams and Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen's University
2
Motivation
Software always has bugs and managers always have limited resources.
Q: How to allocate quality assurance resources?
A: Defect prediction!
3
Motivation
Prior work uses many different metrics for defect prediction:
- Complexity metrics, program dependencies, socio-technical networks
- Size is a good indicator of buggy files
- Dependency and complexity metrics
- Number of imports and code metrics
- Process and code metrics
- Change coupling, popularity and design flaws
- Change complexity and social structures
Which metrics should I use? How do they impact my code quality?
4
The challenge we face …
More metrics means:
1. more work to mine
2. more difficulty understanding impact
3. less adoption in practice
5
Our goal ….
Use a statistical approach based on work by Cataldo et al.:
1. Narrow down large set of metrics to much smaller set
2. Study the impact on post-release defects
6
Our findings ….
Narrowed down 34 code and process metrics to only 3 or 4
Simple models achieve comparable predictive power
Explanatory power of simple models outperforms 95% PCA models
Some metrics ALWAYS matter: size and pre-release defects
Let me show you how ….
7
34 Code and Process Metrics

Process metrics:
  POST      Number of post-release defects in a file in the 6 months after the release
  PRE       Number of pre-release defects in a file in the 6 months before the release
  TPC       Total number of changes to a file in the 6 months before the release
  BFC       Number of bug-fixing changes in a file in the 6 months before the release

Code metrics:
  TLOC      Total number of lines of code of a file
  ACD       Number of anonymous type declarations in a file
  FOUT (3)  Number of method calls of a file
  MLOC (3)  Number of method lines of code
  NBD (3)   Nested block depth of the methods in a file
  NOF (3)   Number of fields of the classes in a file
  NOI       Number of interfaces in a file
  NOM (3)   Number of methods of the classes in a file
  NOT       Number of classes in a file
  NSF (3)   Number of static fields of the classes in a file
  NSM (3)   Number of static methods of the classes in a file
  PAR (3)   Number of parameters of the methods in a file
  VG (3)    McCabe cyclomatic complexity of the methods in a file
8
Approach overview
1. Build logistic regression model using all metrics
2. Remove statistically insignificant metrics (p < 0.1)
3. Remove highly collinear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

Initial model w/ all metrics -> statistical significance check -> co-linearity check -> simple model

With VIF < 2.5, the standard error of a metric's coefficient is at most ~1.6 times (√2.5) what it would be if the metrics were uncorrelated.
9
Case study
Perform case study on Eclipse 2.0, 2.1 and 3.0
RQ1: Which metrics impact post-release defects? Do these metrics change for different releases of Eclipse?
RQ2: How much do metrics impact the post-release defects? Does the level of impact change across different releases?
10
RQ1: Which metrics impact? Do they change?

Metric                        Eclipse 2.0    Eclipse 2.1    Eclipse 3.0
                              P-value  VIF   P-value  VIF   P-value  VIF
Anonymous Type Declarations   *        1.2
No. of Static Methods                        ***      1.1
No. of Parameters                                           ***      1.2
No. Pre-release Defects       ***      1.1   ***      1.1   ***      1.2
Total Prior Changes           ***      1.1                  **       1.1
Total Lines of Code           ***      1.3   ***      1.4   ***      1.3

(*** p < 0.001; ** p < 0.01; * p < 0.05)

Important and stable for all releases: pre-release defects and lines of code
Code metrics are specific to each release
11
RQ2: How much do the metrics explain?

Metric                        Eclipse 2.0   Eclipse 2.1   Eclipse 3.0
Total Lines of Code           0.1%          11.2%         14.5%
Total Prior Changes           4.9%          -             5.9%
No. Pre-release Defects       17.6%         6.3%          0.5%
No. of Parameters             -             -             0.7%
No. of Static Methods         -             0.2%          -
Anonymous Type Declarations   2.2%          -             -
Deviance explained            25.2%         17.7%         21.2%

Deviance explained measures how well the model fits, i.e. explains the observed phenomena.
Size and process metrics are most important.
12
RQ2: Impact of the metrics? (Eclipse 3.0)

Metric                      Odds ratio
                            M1      M2      M3      M4
Lines of Code               2.57    2.40    2.11    1.88
Prior Changes                       1.87    1.62    1.62
Pre-release defects                         1.87    1.90
Max parameters of methods                           1.73

Odds ratios are used to quantify impact on post-release defects: with an odds ratio of 1.90, a one-unit increase in the metric increases the odds of a post-release defect by 90%.
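An odds ratio is obtained by exponentiating a fitted logistic regression coefficient. A minimal sketch (the example coefficient is hypothetical, chosen only to reproduce the 1.90 / 90% reading above):

```python
import math

def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds of a post-release defect
    for a one-unit increase in the predictor: exp(coefficient)."""
    return math.exp(coefficient)

# A hypothetical coefficient of 0.642 gives an odds ratio of ~1.90,
# i.e. each one-unit increase raises the odds by ~90%.
```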
13
But … what about predictive power? (Eclipse 3.0)

[Bar chart: precision, recall and accuracy (%), on a 0-100 scale, for the simple model vs. the all-metrics model]

Simple models achieve comparable results to more complex models
14
Comparing to PCA (Eclipse 3.0)

                     Simple   95% PCA   99% PCA   100% PCA
Deviance explained   21.2%    16.3%     21.7%     22.0%
No. of metrics       4        33        33        33
No. of PCs           -        8         15        33

Can outperform 95% PCA using much simpler models
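"95% PCA" here means keeping the first principal components that together explain 95% of the cumulative variance in the metrics. A minimal sketch of that selection (not the authors' code; assumes a numeric metrics matrix and uses SVD on standardized columns):

```python
import numpy as np

def n_components_for_variance(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches the threshold."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each metric
    s = np.linalg.svd(Z, compute_uv=False)     # singular values
    ratios = s ** 2 / np.sum(s ** 2)           # variance explained per PC
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```

Highly correlated metrics collapse into few components, which is why 33 metrics reduce to 8 PCs at the 95% level.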
15
Comparing to PCA
[Bar chart: deviance explained (%), 0-30 scale, for the Simple, 95% PCA and 100% PCA models on Eclipse 2.0, 2.1 and 3.0]

Outperform 95% PCA, slightly below 100% PCA
Use at most 4 metrics vs. 34 metrics used in PCA
16
Conclusion
17
Others
18
But … what about predictive power?

               Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
               Ours    All     Ours    All     Ours    All
Precision (%)  66.3    63.6    60.0    58.6    64.1    64.7
Recall (%)     28.5    32.4    15.8    17.2    25.7    26.5
Accuracy (%)   87.5    87.5    89.8    89.8    86.9    87.0
Our simple models achieve comparable results to more complex models
19
Approach overview
1. Build logistic regression model using all metrics
2. Remove statistically insignificant metrics (p < 0.1)
3. Remove highly collinear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

With VIF < 2.5, the standard error of a metric's coefficient is at most ~1.6 times (√2.5) what it would be if the metrics were uncorrelated.
20
A plethora of metrics on defect prediction
21
Our findings
1. Size and process metrics are most important
2. Odds ratios used to study impact
3. Simple models obtain comparable predictive power
4. Simple models outperform 95% PCA
22
Motivation
Software always has bugs and managers always have limited resources.
23
Motivation
Question: How to allocate quality assurance resources?
Answer: Defect prediction!

Code metrics vs. process metrics
Open source vs. commercial
Pre-release vs. post-release vs. both
24
And the work continues …
25
RQ1: which metrics? Do they change?
Using the p-value and the VIF measures, we are able to determine a small set of code and process metrics that impact post-release defects. These metrics change for different releases of Eclipse.
26
RQ1: which metrics? Do they change?
Metric                        Eclipse 2.0    Eclipse 2.1    Eclipse 3.0
                              P-value  VIF   P-value  VIF   P-value  VIF
Anonymous Type Declarations   *        1.2
No. of Static Methods                        ***      1.1
No. of Parameters                                           ***      1.2
No. Pre-release Defects       ***      1.1   ***      1.1   ***      1.2
Total Prior Changes           ***      1.1                  **       1.1
Total Lines of Code           ***      1.3   ***      1.4   ***      1.3

Important and stable for all releases: pre-release defects and lines of code
Code metrics are specific to each release
27
RQ2: How much do they impact? Does impact change?
28
But … what about prediction accuracy?

               Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
               Ours    All     Ours    All     Ours    All
Precision (%)  66.3    63.6    60.0    58.6    64.1    64.7
Recall (%)     28.5    32.4    15.8    17.2    25.7    26.5
Accuracy (%)   87.5    87.5    89.8    89.8    86.9    87.0
29
What about using PCA?

                     Ours    95% PCA   99% PCA   100% PCA
Deviance explained   21.2%   16.3%     21.7%     22.0%
No. of metrics       4       33        33        33
No. of PCs           -       8         15        33
30
Prediction accuracy?
Using a much smaller set of statistically significant and minimally collinear metrics does not significantly affect the prediction results of the logistic regression model.
31
RQ2: How much do they impact? Does impact change?
We are able to quantify the impact produced by the code and process metrics on post-release defects using odds ratios. The impact on post-release defects changes for different releases of Eclipse.
32
Comparing to PCA
Our models, built using a much smaller set of metrics, can achieve better explanatory power than PCA-based models that explain 95% of the cumulative variation.
33
Example
              Eliminated (p > 0.1)   Eliminated (VIF > 2.5)   Metrics left
Iteration 0   0                      0                        34
Iteration 1   23                     0                        11
Iteration 2   2                      0                        9
…             …                      …                        …
Iteration 3   0                      1                        5
Iteration 4   0                      1                        4
34
Example
              Eliminated (p > 0.1)   Eliminated (VIF > 2.5)   Metrics left
Iteration 0   0                      0                        34
Iteration 1   23                     0                        11
Iteration 2   2                      0                        9
…             …                      …                        …
Iteration 7   0                      1                        5
Iteration 8   0                      1                        4
35
RQ2: How much do they impact? Does impact change?
Deviance explained: a measure of the explanatory power of the model
Odds ratio: the multiplicative change in the odds of a post-release defect for a one-unit increase in a metric, holding the other metrics constant
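Deviance explained can be computed as one minus the ratio of the model's residual deviance to the null (intercept-only) deviance. A minimal sketch for a logistic model (function and variable names are my own):

```python
import numpy as np

def deviance_explained(y, p_hat):
    """1 - residual_deviance / null_deviance for binary outcomes y
    and predicted probabilities p_hat."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1 - 1e-12)

    def deviance(p):  # -2 * Bernoulli log-likelihood
        return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    p_null = np.full_like(y, y.mean())  # intercept-only model
    return 1.0 - deviance(p_hat) / deviance(p_null)
```

A model no better than the intercept-only baseline scores 0; perfect probability assignments approach 1.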