esem2010 shihab
TRANSCRIPT
1
Understanding the Impact of Code and Process Metrics on Post-release Defects: A Case Study on
the Eclipse Project
Emad Shihab, Zhen Ming Jiang, Walid Ibrahim, Bram Adams and Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen's University
2
Motivation
Software always has bugs and managers always have limited resources.
Q: How to allocate quality assurance resources?
A: Defect prediction!
3
Motivation
Prior work uses many different metrics for defect prediction:
- Complexity metrics, program dependencies, socio-technical networks
- Size is a good indicator of buggy files
- Dependency and complexity metrics
- Number of imports and code metrics
- Process and code metrics
- Change coupling, popularity and design flaws
- Change complexity and social structures
Which metrics should I use? How do they impact my code quality?
4
The challenge we face …
More metrics means:
1. more work to mine
2. more difficulty understanding impact
3. less adoption in practice
5
Our goal ….
Use a statistical approach based on work by Cataldo et al.:
1. Narrow down large set of metrics to much smaller set
2. Study the impact on post-release defects
6
Our findings ….
Narrowed down 34 code and process metrics to only 3 or 4
Simple models achieve comparable predictive power
Explanatory power of simple models outperforms 95% PCA models
Some metrics ALWAYS matter: size and pre-release defects
Let me show you how ….
7
34 Code and Process Metrics

Process metrics:
  POST      Number of post-release defects in a file in the 6 months after the release
  PRE       Number of pre-release defects in a file in the 6 months before the release
  TPC       Total number of changes to a file in the 6 months before the release
  BFC       Number of bug-fixing changes in a file in the 6 months before the release

Code metrics:
  TLOC      Total number of lines of code of a file
  ACD       Number of anonymous type declarations in a file
  FOUT (3)  Number of method calls of a file
  MLOC (3)  Number of method lines of code
  NBD (3)   Nested block depth of the methods in a file
  NOF (3)   Number of fields of the classes in a file
  NOI       Number of interfaces in a file
  NOM (3)   Number of methods of the classes in a file
  NOT       Number of classes in a file
  NSF (3)   Number of static fields of the classes in a file
  NSM (3)   Number of static methods of the classes in a file
  PAR (3)   Number of parameters of the methods in a file
  VG (3)    McCabe cyclomatic complexity of the methods in a file
8
Approach overview
1. Build logistic regression model using all metrics
2. Remove statistically insignificant metrics (p < 0.1)
3. Remove highly collinear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

Initial model w/ all metrics -> statistical significance check -> co-linearity check -> simple model

With VIF < 2.5, the standard error of a metric's coefficient is at most ~1.6 times (√2.5) what it would be if the metrics were uncorrelated.
9
Case study
Perform case study on Eclipse 2.0, 2.1 and 3.0
RQ1: Which metrics impact post-release defects? Do these metrics change for different releases of Eclipse?
RQ2: How much do metrics impact the post-release defects? Does the level of impact change across different releases?
10
RQ1: Which metrics impact? Do they change?

Metric                        Eclipse 2.0    Eclipse 2.1    Eclipse 3.0
                              P-value  VIF   P-value  VIF   P-value  VIF
Anonymous Type Declarations   *        1.2
No. of Static Methods                        ***      1.1
No. of Parameters                                           ***      1.2
No. Pre-release Defects       ***      1.1   ***      1.1   ***      1.2
Total Prior Changes           ***      1.1                  **       1.1
Total Lines of Code           ***      1.3   ***      1.4   ***      1.3

(*** p < 0.001; ** p < 0.01; * p < 0.05)

Important and stable for all releases: pre-release defects and lines of code
Code metrics are specific to each release
11
RQ2: How much do the metrics explain?

Metric                        Eclipse 2.0   Eclipse 2.1   Eclipse 3.0
Total Lines of Code           0.1%          11.2%         14.5%
Total Prior Changes           4.9%          -             5.9%
No. Pre-release Defects       17.6%         6.3%          0.5%
No. of Parameters             -             -             0.7%
No. of Static Methods         -             0.2%          -
Anonymous Type Declarations   2.2%          -             -
Deviance explained            25.2%         17.7%         21.2%

Deviance explained measures how well the model fits, i.e. explains the observed phenomena.
Size and process metrics are most important.
12
RQ2: Impact of the metrics? (Eclipse 3.0)

Metric                      Odds ratio
                            M1      M2      M3      M4
Lines of Code               2.57    2.40    2.11    1.88
Prior Changes                       1.87    1.62    1.62
Pre-release defects                         1.87    1.90
Max parameters of methods                           1.73

Odds ratios are used to quantify impact on post-release defects: with an odds ratio of 1.90, a one-unit increase in the metric increases the odds of a post-release defect by 90%.
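An odds ratio is obtained by exponentiating a fitted logistic regression coefficient. A minimal sketch (the example coefficient is hypothetical, chosen only to reproduce the 1.90 / 90% reading above):

```python
import math

def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds of a post-release defect
    for a one-unit increase in the predictor: exp(coefficient)."""
    return math.exp(coefficient)

# A hypothetical coefficient of 0.642 gives an odds ratio of ~1.90,
# i.e. each one-unit increase raises the odds by ~90%.
```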
13
But … what about predictive power? (Eclipse 3.0)

[Bar chart: precision, recall and accuracy (%), on a 0-100 scale, for the simple model vs. the all-metrics model]

Simple models achieve comparable results to more complex models
14
Comparing to PCA (Eclipse 3.0)

                     Simple   95% PCA   99% PCA   100% PCA
Deviance explained   21.2%    16.3%     21.7%     22.0%
No. of metrics       4        33        33        33
No. of PCs           -        8         15        33

Can outperform 95% PCA using much simpler models
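"95% PCA" here means keeping the first principal components that together explain 95% of the cumulative variance in the metrics. A minimal sketch of that selection (not the authors' code; assumes a numeric metrics matrix and uses SVD on standardized columns):

```python
import numpy as np

def n_components_for_variance(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches the threshold."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each metric
    s = np.linalg.svd(Z, compute_uv=False)     # singular values
    ratios = s ** 2 / np.sum(s ** 2)           # variance explained per PC
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```

Highly correlated metrics collapse into few components, which is why 33 metrics reduce to 8 PCs at the 95% level.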
15
Comparing to PCA
[Bar chart: deviance explained (%), 0-30 scale, for the Simple, 95% PCA and 100% PCA models on Eclipse 2.0, 2.1 and 3.0]

Outperform 95% PCA, slightly below 100% PCA
Use at most 4 metrics vs. 34 metrics used in PCA
16
Conclusion
17
Others
18
But … what about predictive power?

               Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
               Ours    All     Ours    All     Ours    All
Precision (%)  66.3    63.6    60.0    58.6    64.1    64.7
Recall (%)     28.5    32.4    15.8    17.2    25.7    26.5
Accuracy (%)   87.5    87.5    89.8    89.8    86.9    87.0
Our simple models achieve comparable results to more complex models
19
Approach overview
1. Build logistic regression model using all metrics
2. Remove statistically insignificant metrics (p < 0.1)
3. Remove highly collinear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

With VIF < 2.5, the standard error of a metric's coefficient is at most ~1.6 times (√2.5) what it would be if the metrics were uncorrelated.
20
A plethora of metrics on defect prediction
21
Our findings
1. Size and process metrics are most important
2. Odds ratios used to study impact
3. Simple models obtain comparable predictive power
4. Simple models outperform 95% PCA
22
Motivation
Software always has bugs and managers always have limited resources.
23
Motivation
Question: How to allocate quality assurance resources?
Answer: Defect prediction!

Code metrics vs. process metrics
Open source vs. commercial
Pre-release vs. post-release vs. both
24
And the work continues …
25
RQ1: which metrics? Do they change?
Using the p-value and the VIF measures, we are able to determine a small set of code and process metrics that impact post-release defects. These metrics change for different releases of Eclipse.
26
RQ1: which metrics? Do they change?
Metric                        Eclipse 2.0    Eclipse 2.1    Eclipse 3.0
                              P-value  VIF   P-value  VIF   P-value  VIF
Anonymous Type Declarations   *        1.2
No. of Static Methods                        ***      1.1
No. of Parameters                                           ***      1.2
No. Pre-release Defects       ***      1.1   ***      1.1   ***      1.2
Total Prior Changes           ***      1.1                  **       1.1
Total Lines of Code           ***      1.3   ***      1.4   ***      1.3

Important and stable for all releases: pre-release defects and lines of code
Code metrics are specific to each release
27
RQ2: How much do they impact? Does impact change?
28
But … what about prediction accuracy?

               Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
               Ours    All     Ours    All     Ours    All
Precision (%)  66.3    63.6    60.0    58.6    64.1    64.7
Recall (%)     28.5    32.4    15.8    17.2    25.7    26.5
Accuracy (%)   87.5    87.5    89.8    89.8    86.9    87.0
29
What about using PCA?

                     Ours    95% PCA   99% PCA   100% PCA
Deviance explained   21.2%   16.3%     21.7%     22.0%
No. of metrics       4       33        33        33
No. of PCs           -       8         15        33
30
Prediction accuracy?
Using a much smaller set of statistically significant and minimally collinear metrics does not significantly affect the prediction results of the logistic regression model.
31
RQ2: How much do they impact? Does impact change?
We are able to quantify the impact produced by the code and process metrics on post-release defects using odds ratios. The impact on post-release defects changes for different releases of Eclipse.
32
Comparing to PCA
Our models, built using a much smaller set of metrics, can achieve better explanatory power than PCA-based models that explain 95% of the cumulative variation.
33
Example
              Eliminated (p > 0.1)   Eliminated (VIF > 2.5)   Metrics left
Iteration 0   0                      0                        34
Iteration 1   23                     0                        11
Iteration 2   2                      0                        9
…             …                      …                        …
Iteration 3   0                      1                        5
Iteration 4   0                      1                        4
34
Example
              Eliminated (p > 0.1)   Eliminated (VIF > 2.5)   Metrics left
Iteration 0   0                      0                        34
Iteration 1   23                     0                        11
Iteration 2   2                      0                        9
…             …                      …                        …
Iteration 7   0                      1                        5
Iteration 8   0                      1                        4
35
RQ2: How much do they impact? Does impact change?
Deviance explained: a measure of the explanatory power of the model
Odds ratio: the multiplicative change in the odds of a post-release defect for a one-unit increase in a metric, holding the other metrics constant
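Deviance explained can be computed as one minus the ratio of the model's residual deviance to the null (intercept-only) deviance. A minimal sketch for a logistic model (function and variable names are my own):

```python
import numpy as np

def deviance_explained(y, p_hat):
    """1 - residual_deviance / null_deviance for binary outcomes y
    and predicted probabilities p_hat."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1 - 1e-12)

    def deviance(p):  # -2 * Bernoulli log-likelihood
        return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    p_null = np.full_like(y, y.mean())  # intercept-only model
    return 1.0 - deviance(p_hat) / deviance(p_null)
```

A model no better than the intercept-only baseline scores 0; perfect probability assignments approach 1.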