TRANSCRIPT
Cross-Project Build Co-change Prediction
Shane McIntosh
Ahmed E. Hassan
[email protected]@shane_mcintoshshanemcintosh.org
Emad Shihab
David Lo
Xin Xia
[Figure: a build dependency graph translating sources (.tex, .c, .cc) through intermediates (.o, .dvi, .a) into deliverables (.exe, .deb)]
Build systems describe how sources are translated into deliverables
The build system is at the heart of techniques like Continuous Integration (CI)
[Figure: Commit 9719cf0, touching .c and .mk files, flows through Build and Test; the report states that commit 9719cf0 was successfully integrated]
“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin
The Build “Tax”
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011]
Up to 27% of source changes require build changes, too!
Neglected build maintenance is a frequent cause of build breakage
[Figure: Commit aedd38 touches a .c file but not the .mk file; Build and Test run, and the report states that commit aedd38 broke the build!]
Neglected build maintenance can even impact end users
Not working due to linking of incorrect SQLite library version
When are build changes necessary?
Grouping related changes according to the work items that they address
[Figure: changes to .c files and a .mk file are grouped into transactions by work item, e.g., #1234 “Fix for bug #1234” and #2121 “Add feature #2121” plus missed code in #2121]
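The grouping step above can be sketched in a few lines. This is an illustrative sketch only: the change records, work-item IDs, and file names are invented, and the real study mines them from version control and issue trackers.

```python
# Hypothetical sketch: group individual file changes into work-item
# transactions, then flag transactions that co-change the build system.
# All records below are invented for illustration.
from collections import defaultdict

# Each change record: (work_item_id, changed_file)
changes = [
    ("1234", "parser.c"),
    ("2121", "feature.c"),
    ("2121", "feature_extra.c"),   # the "missed code in #2121"
    ("2121", "Makefile.mk"),
]

def group_by_work_item(changes):
    """Collect all files touched by the same work item into one transaction."""
    transactions = defaultdict(list)
    for work_item, path in changes:
        transactions[work_item].append(path)
    return dict(transactions)

transactions = group_by_work_item(changes)

# A transaction "co-changes" the build if it touches a build file (.mk here).
build_co_change = {wi: any(f.endswith(".mk") for f in files)
                   for wi, files in transactions.items()}
```

The boolean labels produced this way are what a build co-change classifier would later learn to predict from code-change features alone.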
We train classifiers to identify code changes that require build co-changes
[Figure: work items containing .c changes (and sometimes .mk changes) feed a classification model that outputs “Build change necessary” or “No build change necessary”]
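To make the classification step concrete, here is a minimal stand-in classifier, not the paper's actual pipeline: a one-nearest-neighbour rule over hand-made per-transaction features. The feature set and training data are assumptions for illustration.

```python
# Toy build co-change classifier: 1-nearest-neighbour over invented
# per-transaction features (#files changed, #lines added,
# touches a new source file: 0/1). Label 1 = build change necessary.
import math

train = [
    ((1, 10, 0), 0),   # small edit, no build change needed
    ((2, 30, 0), 0),
    ((5, 200, 1), 1),  # adds a new source file -> Makefile must change
    ((8, 500, 1), 1),
]

def predict(x):
    """Return the label of the nearest training transaction (Euclidean)."""
    _, label = min(train, key=lambda t: math.dist(t[0], x))
    return label

pred = predict((6, 300, 1))  # a change that adds a new source file
```

Any off-the-shelf classifier (the prior work used several) would slot in behind the same `predict` interface.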
Prior work shows that within-project build co-change prediction can be accurate
Mining Co-Change Information to Understand when Build Changes are Necessary
S. McIntosh, B. Adams, M. Nagappan, A. E. Hassan [ICSME 2014]
Build co-change classifiers can achieve an AUC of 0.60-0.88
However, a large amount of historical data was used to train the classifiers
What about new projects? …or projects with poorly-recorded historical data?
Can we leverage these large corpora for the small ones?
How well do build co-change prediction models perform on sparse data?
[Figure: precision, recall, F1-score, and AUC (0 to 1) when training on 5%, 50%, and 90% of the data]
Challenge 1: Very small datasets tend to yield models that under-perform
How well do build co-change prediction models perform on other datasets?
[Figure: precision, recall, F1-score, and AUC (0 to 1) for Eclipse => Mozilla, Jazz => Mozilla, and Lucene => Mozilla]
Challenge 2: Cross-project build co-change models tend to under-perform
15
Domain-specific project characteristics may limit the applicability of cross-project models
Training corpus
Testing corpus
Training corpus
16
Classification model
Testing corpus
Domain-specific project characteristics may limit the applicability of cross-project models
Training corpus
16
Classification model
Testing corpus
?
Domain-specific project characteristics may limit the applicability of cross-project models
Using transfer learning to provide some domain knowledge to the training corpus
[Figure: move some training data from the target system into the training corpus, train a classification model on the mixed corpus, then apply it to the testing corpus]
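The data-mixing step can be sketched as follows, under assumed record shapes: a small labelled slice of the target project is moved into the cross-project training corpus, and the remainder is held out for testing. The fraction, seed, and corpus contents are all illustrative.

```python
# Sketch of instance-transfer data mixing: seed the cross-project
# training corpus with a fraction of labelled target-project data.
import random

def mix_corpora(source, target, fraction=0.1, seed=0):
    """Return (training corpus, held-out testing corpus).

    `fraction` of the target project's labelled transactions is moved
    into the training corpus; the rest is kept aside for testing.
    """
    rng = random.Random(seed)
    shuffled = target[:]
    rng.shuffle(shuffled)
    n = max(1, int(len(shuffled) * fraction))
    training = source + shuffled[:n]
    testing = shuffled[n:]
    return training, testing

source = [("eclipse", i) for i in range(100)]  # cross-project data
target = [("mozilla", i) for i in range(50)]   # target-project data
train, test = mix_corpora(source, target, fraction=0.1)
```

Keeping the held-out testing slice untouched is what makes the later evaluation honest: the model only ever sees the small transferred slice of the target project.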
Set aside the testing corpus and use the training corpus to find an appropriate threshold
[Figure: a classification model is fit on the training corpus; incorrectly classified examples drive the training of further models (classification models 1, 2, …, N), and an ensemble of the models is used on the testing corpus]
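A hedged sketch of the threshold-plus-ensemble idea above: train N models, classify by vote, and pick the vote threshold that works best on the training corpus, never on the held-out testing corpus. The "models" here are trivial one-rule stand-ins, and the data is invented.

```python
# Toy ensemble with a training-corpus-tuned vote threshold.
# Features per transaction: (#files changed, #lines added).

def make_stump(feature_index, cutoff):
    """A one-rule model: predict 1 if the feature exceeds the cutoff."""
    return lambda x: 1 if x[feature_index] > cutoff else 0

def ensemble_predict(models, x, vote_threshold):
    """Predict 1 when at least vote_threshold models agree."""
    votes = sum(m(x) for m in models)
    return 1 if votes >= vote_threshold else 0

def pick_vote_threshold(models, labelled_train):
    """Choose the vote threshold that maximises training-set accuracy."""
    best_t, best_acc = 1, -1.0
    for t in range(1, len(models) + 1):
        acc = sum(ensemble_predict(models, x, t) == y
                  for x, y in labelled_train) / len(labelled_train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

train = [((1, 10), 0), ((2, 30), 0), ((5, 200), 1), ((8, 500), 1)]
models = [make_stump(0, 3), make_stump(1, 100), make_stump(0, 6)]
t = pick_vote_threshold(models, train)       # tuned on training data only
pred = ensemble_predict(models, (7, 400), t)  # applied to unseen data
```

Tuning the threshold on the training corpus keeps the testing corpus untouched, which mirrors the "set aside the testing corpus" step in the slides.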
Our approach outperforms baseline cross-project approaches
[Figure: worst measured F-score (0 to 1) for Eclipse, Jazz, Lucene, Mozilla, and the average; our approach vs. ordinary cross-project, AdaBoost, and TrAdaBoost]
37%-42% improvement
Our approach achieves similar results to within-project models
[Figure: worst measured F-score (0 to 1) for Eclipse, Jazz, Lucene, Mozilla, and the average; our approach vs. within-project]
Only a 7% drop in performance
Evaluating our approach
Relative performance: 37%-42% improvement over the baseline; only a 7% drop of the within-project F-measure
Training configuration sensitivity (source/target)
Additional data from the target system slowly improves classifier performance
[Figure: F-score as more target system data is mixed into the source training data]
Evaluating our approach
Relative performance: 37%-42% improvement over the baseline; only a 7% drop of the within-project F-measure
Training configuration sensitivity: F-score tends to improve as more target system data becomes available