Towards Building a Universal Defect Prediction Model

Feng Zhang

Audris Mockus

Iman Keivanloo

Ying Zou

2

ONE ring that rules the other rings of power.

3

A universal model that predicts defects for all the projects.

4

Most successful prediction models are within-project models

5

How about cross-project models?

6

Deriving a universal model with cross-project models?

7

Select the training set of projects like this?

8

Or select the training set of projects like this?

9

Is it still possible to build a universal model? If so, then how?

10

What context factors should we consider?

11

Steps towards building a universal model

1. Partition projects by context factors:
   - Programming language: C++, Java
   - System size: small (S), large (L)

[Figure: the projects split into four partitions: C++/S, C++/L, Java/S, Java/L.]

12

Steps towards building a universal model

1. Partition: C++/S, C++/L, Java/S, Java/L
2. Cluster: C++/S, C++/L, Java (the figure shows Java/S and Java/L merged)
3. Obtain ranking functions: one per cluster (labeled R1(x), R1(x), R3(x) in the figure)
4. Rank, using quantiles of metric values:
   (-∞, 10%] => level 1
   (10%, 20%] => level 2
   …
   (90%, +∞) => level 10
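The quantile rule in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the name `fit_ranking_function` is hypothetical, and the use of `statistics.quantiles` with the "inclusive" method is an assumption, since the slides do not say how the quantile cut points are computed.

```python
from bisect import bisect_left
from statistics import quantiles


def fit_ranking_function(values):
    """Return a function R(x) mapping a raw metric value to a level in 1..10.

    Hypothetical helper: `values` are the metric values observed in one
    cluster of projects (at least two values are required).
    """
    # Nine cut points at the 10%, 20%, ..., 90% quantiles of the values.
    # Assumption: the "inclusive" interpolation method; the slides do not
    # specify one.
    cuts = quantiles(values, n=10, method="inclusive")

    def rank(x):
        # bisect_left counts the cut points strictly below x, so a value
        # equal to a cut point stays in the lower level, matching the
        # half-open intervals (a, b] on the slide.
        return bisect_left(cuts, x) + 1

    return rank


# Usage: fit on one cluster's values, then rank any value from that cluster.
lines_of_code = [12, 40, 55, 80, 120, 150, 300, 450, 900, 2500]
R = fit_ranking_function(lines_of_code)
levels = [R(v) for v in lines_of_code]
```

Because each cluster gets its own cut points, the same raw value can map to different levels in different clusters, which is what makes the ranked values comparable across projects.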

13

Build a universal model

1. Partition: C++/S, C++/L, Java/S, Java/L
2. Cluster: C++/S, C++/L, Java
3. Obtain ranking functions: one per cluster (labeled R1(x), R1(x), R3(x) in the figure)
4. Rank

Build a universal defect prediction model using rank-transformed values.
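To illustrate how rank-transformed values from different contexts can be pooled into one training set, here is a rough Python sketch. It makes several assumptions that do not come from the slides: the row layout (`language`, `size`, and a numeric metric field), the function names, and the quantile method are all hypothetical, and the clustering step is skipped, so each partition is ranked on its own.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles


def to_levels(values):
    """Map each value to a level 1..10 using the deciles of `values`."""
    # Assumption: "inclusive" quantile interpolation; needs >= 2 values.
    cuts = quantiles(values, n=10, method="inclusive")
    return [bisect_left(cuts, v) + 1 for v in values]


def rank_transform_by_context(rows, metric):
    """Return copies of `rows` with `metric` replaced by its level (1..10),
    computed separately within each (language, size) partition.

    Hypothetical layout: each row is a dict with 'language', 'size',
    and a numeric value under `metric`.
    """
    # Step 1 (partition): group row indices by context factors.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[(row["language"], row["size"])].append(i)

    # Steps 3-4 (ranking function, rank), applied per partition.
    # Step 2 (cluster) is omitted in this sketch.
    out = [dict(row) for row in rows]  # leave the input rows untouched
    for indices in groups.values():
        levels = to_levels([rows[i][metric] for i in indices])
        for i, level in zip(indices, levels):
            out[i][metric] = level
    return out


# Usage: two partitions whose raw values differ by two orders of magnitude
# end up on the same 1..10 scale and can be pooled for training.
rows = [{"language": "C++", "size": "L", "loc": 1000 * k} for k in range(1, 21)]
rows += [{"language": "Java", "size": "S", "loc": 10 * k} for k in range(1, 21)]
pooled = rank_transform_by_context(rows, "loc")
```

The pooled rows could then be fed to any classifier; the slides do not name the learner here, so none is shown.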

14

Case study setup

[Figure: three bar charts describing the studied projects by version control system, issue tracking system, and programming language; one chart uses the categories "Using" and "Not Using". Data labels as extracted: 937, 461.]

15

Research Questions

16

RQ1. Is our rank transformation good?

[Bar chart: Precision, Recall, and AUC for Rank Transformation vs. Log Transformation; y-axis from 0 to 0.7. Data labels as extracted: 0.48, 0.48, 0.57, 0.58, 0.62, 0.61.]

17

RQ2. How good is the universal model?

[Bar chart: Precision, Recall, and AUC for Universal Model vs. Within-project Model; y-axis from 0 to 0.7. Data labels as extracted: 0.45, 0.48, 0.58, 0.63, 0.64, 0.62.]

18

RQ3. Does the universal model work for external projects?

[Figure: the universal model is used to predict defects in external projects.]

19

RQ3. Precision comparison

[Bar chart: precision of Universal Model vs. Within-project Model on Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0 to 0.7. Data labels as extracted: 0.31, 0.47, 0.63, 0.66, 0.21, 0.13, 0.23, 0.28, 0.23, 0.28.]

20

RQ3. Recall comparison

[Bar chart: recall; y-axis from 0 to 0.9; recovered legend entry: Universal Model. Data labels as extracted: 0.57, 0.79, 0.54, 0.61, 0.61, 0.34, 0.47, 0.72, 0.42, 0.60.]

21

RQ3. AUC comparison

[Bar chart: AUC; y-axis from 0.6 to 0.8; recovered legend entry: Universal Model. Data labels as extracted: 0.76, 0.77, 0.78, 0.79, 0.69, 0.67, 0.70, 0.70, 0.68, 0.69.]

22

Summary
