Personalized Defect Prediction

Tian Jiang, Lin Tan, Sunghun Kim
University of Waterloo; Hong Kong University of Science and Technology

Upload: sung-kim

Post on 25-Jun-2015

DESCRIPTION

Tian's ASE 2013 Presentation

TRANSCRIPT

Page 1: Personalized Defect Prediction

Personalized Defect Prediction

Tian Jiang, Lin Tan, Sunghun Kim
University of Waterloo; Hong Kong University of Science and Technology

Page 2: Personalized Defect Prediction

How to Find Bugs?

• Code Review

• Testing

• Static Analysis

• Dynamic Analysis

• Verification

• Defect Prediction

Page 3: Personalized Defect Prediction

Defect Prediction

Software History → Predictor → Future Defect

Page 7: Personalized Defect Prediction

Developers are Different

[Figure: bar chart. Y-axis: % of Buggy Changes (0 to 80). X-axis: developers A, B, C, D, and Average. Legend: Modulo %, FOR, Bitwise OR, CONTINUE. Data: Linux Kernel, 2005-2010.]

Personalized models can improve performance.

Page 10: Personalized Defect Prediction

Successes in Other Fields

• Google personalized search

• Facebook personalized ad placement

Page 14: Personalized Defect Prediction

Contributions

• Personalized Change Classification (PCC)

✦ One model for each developer

• Confidence-based Hybrid PCC (PCC+)

✦ Picks predictions with highest confidence

• Evaluate on six C and Java projects

✦ Find up to 155 more bugs by inspecting 20% LOC

✦ Improve F1 by up to 0.08

Page 18: Personalized Defect Prediction

What is a Change?

Commit: 09a02f...
Author: John Smith
Message: I submitted some code.

file1.c  +++-
file2.c  +-----
file3.c  ++---

Commit → Change 1, Change 2, Change 3 (one change per modified file)

Change-Level: Inspect less code to locate a bug.
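The commit-to-change split on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `Change` record, the `split_commit` helper, and the use of the `+`/`-` summary strings as input are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One file's worth of modifications inside a commit (hypothetical record)."""
    commit_id: str
    author: str
    path: str
    added: int
    deleted: int


def split_commit(commit_id, author, file_stats):
    """Turn one commit into one Change per modified file.

    file_stats maps a file path to a '+'/'-' summary string as shown on the slide.
    """
    return [
        Change(commit_id, author, path, stat.count("+"), stat.count("-"))
        for path, stat in file_stats.items()
    ]


# The commit from the slide; the commit ID is kept elided as in the source.
changes = split_commit(
    "09a02f...",
    "John Smith",
    {"file1.c": "+++-", "file2.c": "+-----", "file3.c": "++---"},
)
for change in changes:
    print(change)
```

Predicting at this granularity means a flagged change points at a single file's modifications instead of the whole commit.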

Page 24: Personalized Defect Prediction

Change Classification (CC)

Training Phase: Software History → Training Instances → Features → Classification Algorithm → Model

Prediction Phase: Future Instances → Model

1. Label changes as clean or buggy

2. Extract features

3. Build prediction model

4. Predict

Page 28: Personalized Defect Prediction

Label Clean or Buggy

Revision History [Sliwerski et al. ’05]

Bug-Fixing Change

Commit: 1da57...
Message: I fixed a bug
fileA.c
- if (i < 128)
+ if (i <= 128)

Contains keyword “fix”, or ID of manually verified bug report [Herzig et al. ’13]

git blame on the line removed by the fix identifies the change that introduced it:

Buggy Change

Commit: 7a3bc...
Message: new feature
fileA.c
+ ...
+ if (i < 128)
+ ...

Fixed by a later change
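The labeling idea on this slide can be sketched on toy data. This is only an illustration under stated assumptions: `is_fix`, `label_changes`, the commit dictionaries, and the `blame` lookup table are hypothetical, and the table stands in for what `git blame` would report on the revision just before the fix. A plain substring match on "fix" is also cruder than a real implementation would want.

```python
def is_fix(message, verified_bug_ids=()):
    """Treat a commit as bug-fixing if its message contains 'fix' or a verified bug ID."""
    text = message.lower()
    return "fix" in text or any(bug_id.lower() in text for bug_id in verified_bug_ids)


def label_changes(commits, blame):
    """Return {commit id: 'buggy' or 'clean'}.

    commits: dicts with 'id', 'message', 'file', and 'removed' (lines the commit deleted).
    blame:   hypothetical lookup {(file, line): id of the commit that introduced the line}.
    """
    labels = {c["id"]: "clean" for c in commits}
    for c in commits:
        if not is_fix(c["message"]):
            continue
        for line in c["removed"]:
            origin = blame.get((c["file"], line))
            if origin in labels:
                labels[origin] = "buggy"  # this change was fixed by a later change
    return labels


# The two commits from the slide; IDs are kept elided as in the source.
commits = [
    {"id": "7a3bc...", "message": "new feature", "file": "fileA.c", "removed": []},
    {"id": "1da57...", "message": "I fixed a bug", "file": "fileA.c",
     "removed": ["if (i < 128)"]},
]
blame = {("fileA.c", "if (i < 128)"): "7a3bc..."}
labels = label_changes(commits, blame)
print(labels)
```

A change stays labeled clean until some later bug-fixing change touches a line it introduced.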

Page 30: Personalized Defect Prediction

Three Types of Features

• Metadata

• Bag-of-Words

• Characteristic Vector
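As a rough illustration of the bag-of-words feature type, the sketch below counts word-like tokens in the text of a change. The tokenizer regular expression and the `bag_of_words` name are assumptions for this example; the slides do not say how tokens are extracted or which text is included.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count identifier-like tokens in the text of a change."""
    return Counter(re.findall(r"[A-Za-z_]\w*", text))


bow = bag_of_words("+ if (i <= 128) return i;")
print(bow)
```

Each distinct token becomes one feature whose value is its count in the change.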

Page 33: Personalized Defect Prediction

Characteristic Vector

for (...; ...; ...) {
    for (...; ...; ...) {
        if (...) ...;
    }
}

Count Abstract Syntax Tree (AST) nodes
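Counting AST nodes can be sketched with Python's standard `ast` module. Note the substitution: the slide's snippet is C, while this example parses Python source as a stand-in, and the `characteristic_vector` name is an assumption. The structure mirrors the slide: two nested loops around one conditional.

```python
import ast
from collections import Counter


def characteristic_vector(source):
    """Count how often each AST node type occurs in a piece of Python source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


snippet = """
for i in range(3):
    for j in range(3):
        if i < j:
            pass
"""
vec = characteristic_vector(snippet)
print(vec["For"], vec["If"])
```

For this snippet the vector records two `For` nodes and one `If` node, alongside counts for every other node type in the tree.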

Page 38: Personalized Defect Prediction

CC: Training

Training Instances → Model

Page 40: Personalized Defect Prediction

CC: Prediction

Unlabeled Changes → Model → Predicted Changes

Page 44: Personalized Defect Prediction

PCC: Training

Training Instances → Group Changes by Developer → Training

Dev 1 → Model 1

Dev 2 → Model 2

Dev 3 → Model 3

Page 47: Personalized Defect Prediction

PCC: Prediction

Change (Dev 2) → Choose a Model by Developer (Model 1, Model 2, Model 3) → Prediction
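The PCC training and prediction steps can be sketched as below. This is a minimal illustration, not the authors' implementation: `train_pcc`, `predict_pcc`, and the toy history are hypothetical, and `fit_majority` is a deliberately trivial stand-in learner (it ignores the features) so the example stays self-contained. Any real classification algorithm could be passed in its place.

```python
from collections import Counter, defaultdict


def fit_majority(instances):
    """Toy learner (assumption for illustration): always predict the most common label."""
    majority = Counter(label for _, label in instances).most_common(1)[0][0]
    return lambda features: majority


def train_pcc(changes, fit=fit_majority):
    """Group labeled changes by developer and train one model per developer."""
    by_dev = defaultdict(list)
    for dev, features, label in changes:
        by_dev[dev].append((features, label))
    return {dev: fit(instances) for dev, instances in by_dev.items()}


def predict_pcc(models, dev, features):
    """Choose the model that belongs to the change's author and apply it."""
    return models[dev](features)


# Hypothetical labeled history: (developer, features, label).
history = [
    ("Dev 1", {"for": 2}, "buggy"),
    ("Dev 1", {"for": 1}, "buggy"),
    ("Dev 2", {"for": 2}, "clean"),
    ("Dev 2", {"if": 1}, "clean"),
    ("Dev 2", {"if": 3}, "buggy"),
]
models = train_pcc(history)
print(predict_pcc(models, "Dev 2", {"for": 2}))
```

The sketch raises `KeyError` for a developer without a model; the slides do not say how such changes are handled.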

Page 49: Personalized Defect Prediction

PCC+: Prediction

Feed Changes to All Models → CC, PCC → Combiner → Prediction

Page 53: Personalized Defect Prediction

Confidence Measure

• Bugginess

✦ Probability of a change being buggy

• Confidence Measure

✦ Comparable measure of confidence

• Select the prediction with the highest confidence.
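A combiner in the spirit of PCC+ might look like the sketch below. The slides do not define the confidence measure, so the formula here is an assumption: confidence is taken as max(p, 1 - p) for a bugginess p, with 0.5 as the decision threshold. The `confidence` and `combine` names and the example scores are likewise hypothetical.

```python
def confidence(bugginess):
    """Assumed confidence measure: how far the bugginess leans to either side."""
    return max(bugginess, 1.0 - bugginess)


def combine(predictions):
    """Pick the most confident model.

    predictions maps a model name to its bugginess (probability of being buggy).
    Returns (model name, predicted label, confidence).
    """
    name, p = max(predictions.items(), key=lambda item: confidence(item[1]))
    label = "buggy" if p > 0.5 else "clean"
    return name, label, confidence(p)


# Hypothetical scores for one change from the two models.
result = combine({"CC": 0.60, "PCC": 0.20})
print(result)
```

Here CC leans buggy with confidence 0.6 while PCC leans clean with confidence 0.8, so the combiner returns PCC's prediction.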

Page 56: Personalized Defect Prediction

Research Questions

• RQ1: Do PCC and PCC+ outperform CC?

• RQ2: Does PCC outperform CC in other setups?

✦ Classification algorithms

✦ Sizes of training sets

Page 59: Personalized Defect Prediction

Two Metrics

• F1-Score

✦ Harmonic mean of precision and recall

• Cost Effectiveness

✦ Relevant in cost-sensitive scenarios

✦ NofB20: Number of Bugs discovered by inspecting the top 20% of lines of code
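For the first metric, F1 is the harmonic mean of precision P and recall R, that is 2PR / (P + R). A small sketch follows; the `f1_score` helper and the counts in the example are hypothetical and are not results from the talk.

```python
def f1_score(tp, fp, fn):
    """F1 for the buggy class from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 buggy changes caught, 1 false alarm, 3 buggy changes missed.
score = f1_score(tp=3, fp=1, fn=3)
print(round(score, 2))
```

With these counts precision is 0.75 and recall is 0.5, which gives an F1 of 0.6.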

Page 67: Personalized Defect Prediction

Cost Effectiveness

Cumulative LOC   Changes    LOC
10%              Buggy #1   10    True Bug
15%              Buggy #2   5     True Bug
19%              Buggy #3   4     True Bug
27%              Buggy #4   8
                 Buggy #5   12
                 ...        ...
                 Total      100

NofB20=3

Page 68: Personalized Defect Prediction

Test Subjects

Projects       Language   LOC    # of Changes
Linux kernel   C          7.3M   429K
PostgreSQL     C          289K   89K
Xorg           C          1.1M   46K
Eclipse        Java       1.5M   73K
Lucene*        Java       828K   76K
Jackrabbit*    Java       589K   61K

* With manually labelled bug report data [Herzig et al. ’13]

Page 70: Personalized Defect Prediction

PCC/PCC+ vs. CC

Decision Tree, NofB20

Projects     CC    PCC   Delta   PCC+   Delta
Linux        160   179   +19     172    +12
PostgreSQL   55    210   +155    175    +120
Xorg         96    159   +63     161    +65
Eclipse      116   207   +91     200    +84
Lucene       177   254   +77     257    +80
Jackrabbit   411   449   +38     459    +48
Average      -     -     +74     -      +68

Statistically significant deltas are in bold.

Page 71: Personalized Defect Prediction

PCC and PCC+ outperform CC.

Page 73: Personalized Defect Prediction

Different Classification Algorithms, NofB20

             Naive Bayes          Logistic Regression
Projects     CC    PCC   Delta    CC    PCC   Delta
Linux        138   147   +9       102   137   +35
PostgreSQL   89    113   +24      46    56    +10
Xorg         84    101   +17      52    29    -23
Eclipse      65    108   +43      54    55    +1
Lucene       152   139   -13      30    200   +170
Jackrabbit   420   414   -6       261   370   +109
Average      -     -     +12      -     -     +59

Statistically significant deltas are in bold.

Page 75: Personalized Defect Prediction

Different Training Set Sizes

[Figure: chart comparing PCC and CC. Y-axis: NofB20 (100 to 300). X-axis: Training Set Size Per Developer (10 to 90).]

Page 76: Personalized Defect Prediction

The improvement holds in other setups.

Page 77: Personalized Defect Prediction

Related Work

• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08

• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12

Page 78: Personalized Defect Prediction

Conclusions & Future Work

• PCC and PCC+ improve prediction performance.

• The improvement holds in other setups.

• The personalized approach can be applied to other fields.

✦ Recommendation systems

✦ Vulnerability prediction

✦ Top-crash prediction