Understanding the Impact of Code and Process Metrics on Post-release Defects: A Case Study on the Eclipse Project

Emad Shihab, Zhen Ming Jiang, Walid Ibrahim, Bram Adams and Ahmed E. Hassan

Software Analysis and Intelligence Lab (SAIL), Queen's University


Page 1: Esem2010 shihab

1

Understanding the Impact of Code and Process Metrics on Post-release Defects: A Case Study on

the Eclipse Project

Emad Shihab, Zhen Ming Jiang, Walid Ibrahim, Bram Adams and Ahmed E. Hassan

Software Analysis and Intelligence Lab (SAIL), Queen's University

Page 2: Esem2010 shihab

2

Motivation

Software always has bugs and managers always have limited resources.

Q: How to allocate quality assurance resources?

A: Defect prediction!

Page 3: Esem2010 shihab

3

Motivation

Prior work proposes many metrics:

Complexity metrics, program dependencies, socio-technical networks

Size is a good indicator of buggy files

Use dependency and complexity metrics

Use number of imports and code metrics

Use process and code metrics

Change coupling, popularity and design flaws (University of Lugano)

Change complexity and social structures

Which metrics should I use? How do they impact my code quality?

Page 4: Esem2010 shihab

4

The challenge we face …

More metrics means …

1. more work to mine
2. difficult to understand impact
3. less adoption in practice

Page 5: Esem2010 shihab

5

Our goal ….

Use a statistical approach based on work by Cataldo et al.:

1. Narrow down the large set of metrics to a much smaller set

2. Study their impact on post-release defects

Page 6: Esem2010 shihab

6

Our findings ….

Narrowed down 34 code and process metrics to only 3 or 4

Simple models achieve comparable predictive power

Explanative power of simple models outperforms 95% PCA models

Some metrics ALWAYS matter: size and pre-release defects

Let me show you how ….

Page 7: Esem2010 shihab

7

34 Code and Process Metrics

Process Metrics:

Metric    Description
POST      Number of post-release defects in a file in the 6 months after the release
PRE       Number of pre-release defects in a file in the 6 months before the release
TPC       Total number of changes to a file in the 6 months before the release
BFC       Number of bug-fixing changes in a file in the 6 months before the release

Code Metrics:

Metric    Description
TLOC      Total number of lines of code of a file
ACD       Number of anonymous type declarations in a file
FOUT (3)  Number of method calls of a file
MLOC (3)  Number of method lines of code
NBD (3)   Nested block depth of the methods in a file
NOF (3)   Number of fields of the classes in a file
NOI       Number of interfaces in a file
NOM (3)   Number of methods of the classes in a file
NOT       Number of classes in a file
NSF (3)   Number of static fields of the classes in a file
NSM (3)   Number of static methods of the classes in a file
PAR (3)   Number of parameters of the methods in a file
VG (3)    McCabe cyclomatic complexity of the methods in a file

Page 8: Esem2010 shihab

8

Approach overview

Initial model w/ all metrics -> Statistical significance check (P < 0.1) -> Co-linearity check (VIF < 2.5) -> Simple model

1. Build Logistic Regression model using all metrics
2. Remove statistically insignificant metrics (P < 0.1)
3. Remove highly co-linear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

A VIF of 2.5 means the standard error of a metric's coefficient is ~1.6 times as large as it would be if the metrics were uncorrelated.
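As an illustration of the co-linearity check, here is a minimal numpy sketch of the variance inflation factor. The `vif` helper and its synthetic data are hypothetical, not the study's implementation:

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of the metric matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing
    column j on all remaining columns (plus an intercept).
    """
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on the other columns via least squares.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - ((y - A @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# A VIF of 2.5 inflates the coefficient's standard error by
# sqrt(2.5) ~ 1.58, i.e. the "~1.6 times" mentioned above.
```

Metrics whose VIF exceeds the 2.5 cut-off are dropped one at a time and the model is refit, which is what the iterative "Example" slides at the end of the deck show.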

Page 9: Esem2010 shihab

9

Case study

Perform case study on Eclipse 2.0, 2.1 and 3.0

RQ1: Which metrics impact post-release defects? Do these metrics change for different releases of Eclipse?

RQ2: How much do metrics impact the post-release defects? Does the level of impact change across different releases?

Page 10: Esem2010 shihab

10

RQ1: Which metrics impact? Do they change?

Metric                         Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
                               P-value  VIF    P-value  VIF    P-value  VIF
Anonymous Type Declarations                    *        1.2
No. of Static Methods          ***      1.1
No. of Parameters                                              ***      1.2
No. Pre-release Defects        ***      1.1    ***      1.1    ***      1.2
Total Prior Changes            ***      1.1                    **       1.1
Total Lines of Code            ***      1.3    ***      1.4    ***      1.3

(*** p < 0.001; ** p < 0.01; * p < 0.05)

Pre-release defects and total lines of code are important and stable for all releases; the code metrics are specific to each release.

Page 11: Esem2010 shihab

11

RQ2: How much do metrics explain?

Metric                         Eclipse 2.0    Eclipse 2.1    Eclipse 3.0
Total Lines of Code            17.6%          11.2%          14.5%
Total Prior Changes            2.2%                          0.1%
No. Pre-release Defects        4.9%           6.3%           5.9%
No. of Parameters                                            0.7%
No. of Static Methods          0.5%
Anonymous Type Declarations                   0.2%
Deviance explained             25.2%          17.7%          21.2%

Size and process metrics are most important.

Deviance explained measures how well the model fits, i.e., explains the observed phenomena.

Page 12: Esem2010 shihab

12

RQ2: Impact of the metrics? (Eclipse 3.0)

Metric                      M1     M2     M3     M4
Lines of Code               2.57   2.40   2.11   1.88
Prior Changes                      1.87   1.62   1.62
Pre-release defects                       1.87   1.90
Max parameters of methods                        1.73

Odds ratios (shown for models M1, M2, M3, M4) are used to quantify the impact on post-release defects: a 1-unit increase in pre-release defects increases the chance of a post-release defect by 90% (odds ratio of 1.90).
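The odds-ratio arithmetic can be checked directly: in a logistic regression, a one-unit increase in a metric multiplies the odds of a post-release defect by exp(coefficient). A small sketch with a hypothetical intercept, using the slide's 1.90 value:

```python
import math

def defect_prob(x, b0, b1):
    """Logistic model: P(post-release defect) at metric value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

b0 = -2.0            # hypothetical intercept, for illustration only
b1 = math.log(1.90)  # coefficient whose odds ratio is the slide's 1.90

# One extra pre-release defect multiplies the odds by exp(b1) = 1.90,
# i.e. a 90% increase in the odds of a post-release defect.
ratio = odds(defect_prob(3, b0, b1)) / odds(defect_prob(2, b0, b1))
```

Note that the odds ratio is the same at every value of x, which is what makes it a convenient release-to-release summary of a metric's impact.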

Page 13: Esem2010 shihab

13

But … what about predictive power? (Eclipse 3.0)

[Bar chart: precision, recall and accuracy (%) of the Simple model vs. the All-metrics model]

Simple models achieve comparable results to more complex models

Page 14: Esem2010 shihab

14

Comparing to PCA (Eclipse 3.0)

                     Simple   95% PCA   99% PCA   100% PCA
Deviance explained   21.2%    16.3%     21.7%     22.0%
No. of metrics       4        33        33        33
No. of PCs           -        8         15        33

Can outperform 95% PCA, using much simpler models
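For reference, "95% PCA" keeps just enough principal components to cover 95% of the cumulative variance. A minimal numpy sketch of that component count; the function name and synthetic data are illustrative, not the study's setup:

```python
import numpy as np

def n_components_for(X, threshold):
    """Smallest number of principal components whose cumulative
    variance reaches the given fraction (e.g. 0.95 for 95% PCA)."""
    # Singular values of the centred data give the component variances.
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    cum = np.cumsum(s ** 2) / (s ** 2).sum()
    return int(np.searchsorted(cum, threshold) + 1)
```

On the correlated metrics above, 8 components suffice for 95% of the variance while all 33 are needed for 100%, which is why the 4-metric Simple model is so much easier to interpret.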

Page 15: Esem2010 shihab

15

Comparing to PCA

[Bar chart: deviance explained (%) of the Simple, 95% PCA and 100% PCA models for Eclipse 2.0, 2.1 and 3.0]

Outperform 95% PCA, slightly below 100% PCA

Use at most 4 metrics vs. 34 metrics used in PCA

Page 16: Esem2010 shihab

16

Conclusion

Page 17: Esem2010 shihab

17


Page 18: Esem2010 shihab

18

But … what about predictive power?

                Eclipse 2.0          Eclipse 2.1          Eclipse 3.0
                Ours   All metrics   Ours   All metrics   Ours   All metrics
Precision (%)   66.3   63.6          60.0   58.6          64.1   64.7
Recall (%)      28.5   32.4          15.8   17.2          25.7   26.5
Accuracy (%)    87.5   87.5          89.8   89.8          86.9   87.0

Our simple models achieve comparable results to more complex models
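The three measures in the table are standard confusion-matrix quantities; a small sketch (the counts below are made up for illustration, not Eclipse data):

```python
def classification_measures(tp, fp, tn, fn):
    """Precision, recall and accuracy (in %) from confusion-matrix counts:
    tp/fp = true/false positives, tn/fn = true/false negatives."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

# Illustrative counts: defective files are the minority class, so accuracy
# can stay high (87.5%) even when recall is modest, as in the table above.
p, r, a = classification_measures(tp=25, fp=25, tn=850, fn=100)
```

This class imbalance is why the table reports all three measures rather than accuracy alone.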

Page 19: Esem2010 shihab

19

Approach overview

Initial model w/ all metrics -> Statistical significance check (P < 0.1) -> Co-linearity check (VIF < 2.5) -> Simple model

1. Build Logistic Regression model using all metrics
2. Remove statistically insignificant metrics (P < 0.1)
3. Remove highly co-linear metrics (VIF < 2.5)
4. Narrow down to a much smaller set of metrics

A VIF of 2.5 means the standard error of a metric's coefficient is ~1.6 times as large as it would be if the metrics were uncorrelated.

Page 20: Esem2010 shihab

20

A plethora of metrics on defect prediction

Page 21: Esem2010 shihab

21

Our findings

1. Size and process metrics are most important

2. Odds ratios used to study impact

3. Simple models obtain comparable prediction power

4. Simple models outperform 95% PCA

Page 22: Esem2010 shihab

22

Motivation

Software always has bugs and managers always have limited resources.

Page 23: Esem2010 shihab

23

Motivation

Question: How to allocate quality assurance resources?

Answer: Defect prediction!

Code Metrics vs. Process Metrics
Open Source vs. Commercial
Pre-release vs. Post-release vs. Both

Page 24: Esem2010 shihab

24

And the work continues …

Page 25: Esem2010 shihab

25

RQ1: which metrics? Do they change?

Using the p-value and the VIF measures, we are able to determine a small set of code and process metrics that impact post-release defects. These metrics change for different releases of Eclipse.

Page 26: Esem2010 shihab

26

RQ1: which metrics? Do they change?

Important and stable for all releases: pre-release defects and total lines of code.
Code metrics are specific to each release.

Metric                         Eclipse 2.0     Eclipse 2.1     Eclipse 3.0
                               P-value  VIF    P-value  VIF    P-value  VIF
Anonymous Type Declarations                    *        1.2
No. of Static Methods          ***      1.1
No. of Parameters                                              ***      1.2
No. Pre-release Defects        ***      1.1    ***      1.1    ***      1.2
Total Prior Changes            ***      1.1                    **       1.1
Total Lines of Code            ***      1.3    ***      1.4    ***      1.3

Page 27: Esem2010 shihab

27

RQ2: How much do they impact? Does impact change?

Page 28: Esem2010 shihab

28

But … what about prediction accuracy?

                Eclipse 2.0          Eclipse 2.1          Eclipse 3.0
                Ours   All metrics   Ours   All metrics   Ours   All metrics
Precision (%)   66.3   63.6          60.0   58.6          64.1   64.7
Recall (%)      28.5   32.4          15.8   17.2          25.7   26.5
Accuracy (%)    87.5   87.5          89.8   89.8          86.9   87.0

Page 29: Esem2010 shihab

29

What about using PCA?

                     Ours    95% PCA   99% PCA   100% PCA
Deviance explained   21.2%   16.3%     21.7%     22.0%
No. of metrics       4       33        33        33
No. of PCs           -       8         15        33

Page 30: Esem2010 shihab

30

Prediction accuracy?

Using a much smaller set of statistically significant and minimally collinear metrics does not significantly affect the prediction results of the logistic regression model.

Page 31: Esem2010 shihab

31

RQ2: How much do they impact? Does impact change?

We are able to quantify the impact produced by the code and process metrics on post-release defects using odds ratios. The impact on post-release defects changes for different releases of Eclipse.

Page 32: Esem2010 shihab

32

Comparing to PCA

Our models, built using a much smaller set of metrics, can achieve better explanative power than PCA-based models that explain 95% cumulative variation.


Page 34: Esem2010 shihab

34

Example

              No. metrics eliminated   No. metrics eliminated   No. metrics left
              (P > 0.1)                (VIF > 2.5)
Iteration 0   0                        0                        34
Iteration 1   23                       0                        11
Iteration 2   2                        0                        9
…             …                        …                        …
Iteration 7   0                        1                        5
Iteration 8   0                        1                        4

Page 35: Esem2010 shihab

35

RQ2: How much do they impact? Does impact change?

Deviance explained: a measure of the explanative power of the model, i.e., how well it fits the observed phenomena.

Odds ratio: a measure of the change in the odds of a post-release defect for a one-unit increase in a metric.
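Both quantities fall directly out of a fitted logistic regression: deviance explained compares the model's deviance (-2 x log-likelihood) against the null model's, and the odds ratio is the exponential of a coefficient. A sketch with hypothetical log-likelihood values:

```python
import math

def deviance_explained(loglik_model, loglik_null):
    """Share of the null deviance removed by the model:
    1 - (-2 * loglik_model) / (-2 * loglik_null)."""
    return 1.0 - loglik_model / loglik_null

def odds_ratio(coef):
    """Multiplicative change in the odds of a post-release defect
    per one-unit increase in the metric."""
    return math.exp(coef)

# Hypothetical log-likelihoods: a fitted model at -400 vs a null
# model at -500 explains 20% of the deviance.
d = deviance_explained(-400.0, -500.0)
```

Reporting deviance explained (fit) alongside odds ratios (impact) is what lets the deck separate "how well the model explains defects" from "how strongly each metric matters".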