collecting and leveraging a benchmark of build system clones to aid in quality assessments
DESCRIPTION
Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
TRANSCRIPT
Collecting and Leveraging a Benchmark of Build System Clones
to Aid in Quality Assessments
Shane McIntosh
Bram Adams
Ahmed E. Hassan
Martin Poehlmann
Elmar Juergens
Audris Mockus
Brigitte Haupt
Christian Wagner
What is a build system?
[Figure: files flowing from Source code to Deliverable: .tex, .c, .cc, .o, .o, .dvi, .a, .exe, .deb]
Build systems describe how sources are translated into deliverables
Step 1 - Configuration
Step 2 - Construction
Step 3 - Certification
Step 4 - Packaging
Step 5 - Deployment
Continuous Integration: Enabled by the build system
Commit → Build → Test → Report
Commit 9719cf0
Report: Commit 9719cf0 broke the build!
“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin
The Build “Tax”
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan
[ICSE 2011]
Up to 27% of source changes require build changes, too!
How do practitioners cope with build maintenance?
Cloning!
Useful build logic
Copy & paste
Useful build logic
Excessive cloning makes build maintenance painful
30 custom business applications
One monolithic build system
1.1 million lines of build logic
Clone coverage of 94%-99%
Inflation due to cloning of 10x-23x
Build changes manually duplicated 30 times
How much cloning is typical? (Measured vs. Typical)
What type of logic is being cloned? (Configuration details, Packaging specifications)
What can be done to mitigate cloning?
Large collection of open source
3,872 projects
2,597 C/C++ projects (Autotools, CMake)
1,275 Java projects (Ant, Maven)
Lots of small clones in Java build systems
[Figure: box plots of clone coverage (0.00 to 1.00) for Java build systems; panels: Apache, Github, GNU, Sourceforge; x-axis: Ant, Maven; y-axis: Clone Coverage; legend: MinimumLength 5, 10, 15, 20]
Build logic clones tend to be small
Half of Maven systems have clone coverage of at least 47%-50%
1/4 of Ant systems have clone coverage of >50%
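The clone coverage values on these slides can be read as the share of build specification lines that belong to at least one clone. A minimal Python sketch of that reading follows. It is not the detector used in the study: the function name, the dictionary input format, the line-based normalization, and the sliding-window matching are all assumptions made for illustration, and it only finds exact (Type I) duplicates of at least `min_length` consecutive non-blank lines.

```python
from collections import defaultdict


def clone_coverage(files, min_length=5):
    """Return the fraction of non-blank lines that lie inside an exact clone.

    files: dict mapping a file name to its text (hypothetical input format).
    min_length: minimum number of consecutive lines that counts as a clone.
    """
    # Normalize each file: trim whitespace and drop blank lines.
    normalized = {
        name: [line.strip() for line in text.splitlines() if line.strip()]
        for name, text in files.items()
    }

    # Index every window of min_length consecutive lines by its content.
    windows = defaultdict(list)
    for name, lines in normalized.items():
        for start in range(len(lines) - min_length + 1):
            key = tuple(lines[start:start + min_length])
            windows[key].append((name, start))

    # A window that occurs more than once is a clone; mark all of its lines.
    covered = set()
    for locations in windows.values():
        if len(locations) > 1:
            for name, start in locations:
                covered.update((name, start + k) for k in range(min_length))

    total = sum(len(lines) for lines in normalized.values())
    return len(covered) / total if total else 0.0


if __name__ == "__main__":
    # Two made-up specifications that share six lines.
    shared = "\n".join(f"<step{i}/>" for i in range(6))
    demo = {
        "a/build.xml": shared + "\n<only-in-a/>\n<also-only-in-a/>",
        "b/build.xml": "<only-in-b/>\n<also-only-in-b/>\n" + shared,
    }
    for length in (5, 10, 15, 20):
        print(length, clone_coverage(demo, min_length=length))
```

In this sketch, raising `min_length` can only shrink the set of covered lines, so a coverage value that drops sharply between 5 and 20 lines indicates that most of the detected clones are short.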
Less cloning in C/C++ build systems
[Figure: box plots of clone coverage (0.0 to 0.6) for C/C++ build systems; panels: Apache, Github, GNU, Sourceforge; x-axis: Autotools, CMake; y-axis: Clone Coverage; legend: MinimumLength 5, 10, 15, 20]
Clone coverage rarely exceeds 30% in C/C++ build systems
Yet there are still some C/C++ build systems with plenty of clones! (annotated ranges: 27%-66%, 21%-48%)
How much cloning is typical? (Measured vs. Typical)
>50% clone coverage is common (Java build systems)
<30% clone coverage is common (C/C++ build systems)
What type of logic is being cloned? (Configuration details, Packaging specifications)
What can be done to mitigate cloning?
Manual analysis of a statistically representative sample of clones

                   Ant      Maven    Autotools  CMake    Total
All clones         56,521   71,543   23,723     3,746    155,533
Sample (95%±5%)    382      382      378        349      1,491
Config.            32%      79%      22%        40%
Const.             64%      17%      56%        66%
Cert.              12%      4%       13%        11%
Pkg.               25%      21%      21%        2%
Depl.              11%      1%       9%         7%

Cloning shifts from construction to configuration
Construction is the most heavily cloned C/C++ build phase
CMake packaging is rarely cloned due to CPack
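The sample sizes on this slide are consistent with a standard sample-size calculation for a 95% confidence level and a ±5% margin of error. The sketch below uses Cochran's formula with a finite population correction. The slide only states "95%±5%", so the choice of formula, the z-value of 1.96, the worst-case proportion p = 0.5, and rounding to the nearest integer are all assumptions; the function name is hypothetical.

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumed method: Cochran's formula plus finite population correction.
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite number of clones.
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


if __name__ == "__main__":
    # Clone populations as listed in the "All clones" row of the slide.
    for population in (56521, 71543, 23723, 3746):
        print(population, sample_size(population))
```

Under these assumptions the function returns 382, 382, 378, and 349 for the four clone populations, which matches the sample row and its total of 1,491.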
How much cloning is typical? (Measured vs. Typical)
>50% clone coverage is common (Java build systems)
<30% clone coverage is common (C/C++ build systems)
What type of logic is being cloned? (Configuration details, Packaging specifications)
Cloning shifts from const. to config. (Java build systems)
Mostly construction clones (C/C++ build systems)
What can be done to mitigate cloning?
Teams often migrate from one technology to another
Ant to Maven, Autotools to CMake
Could technology migration help to reduce cloning?
Migration is not a silver bullet
[Figure: clone coverage (0.00 to 1.00) plotted against proportion of systems (0.00 to 1.00) for Ant, Autotools, CMake, and Maven; abnormality bands: Very high, High, Moderately high, Normal, Moderately low, Low, Very low. Four series of threshold values are shown: 0.39, 0.47, 0.52, 0.70, 0.75, 0.82; 0.00, 0.00, 0.04, 0.22, 0.30, 0.47; 0.15, 0.25, 0.36, 0.68, 0.77, 0.84; 0.00, 0.00, 0.00, 0.18, 0.26, 0.39]
More cloning in Maven than Ant
High thresholds are very similar
Q: How are they avoiding build cloning?
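The abnormality bands on the slide suggest a simple way to compare a measured clone coverage against what is typical for a technology. The sketch below is one possible reading, not the authors' procedure: it assumes that the six values of a series are the boundaries between the seven bands, in ascending order, and that a value lying exactly on a boundary falls into the higher band. The example series is the first one shown on the slide; which technology it belongs to is not recoverable from this transcript.

```python
from bisect import bisect_right

# Bands from the slide legend, ordered from lowest to highest coverage.
LABELS = [
    "Very low", "Low", "Moderately low", "Normal",
    "Moderately high", "High", "Very high",
]


def abnormality(clone_coverage, thresholds):
    """Map a clone coverage value to a band, given six ascending boundaries."""
    if len(thresholds) != len(LABELS) - 1:
        raise ValueError("expected six threshold values")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Count how many boundaries the value has reached or passed.
    return LABELS[bisect_right(thresholds, clone_coverage)]


if __name__ == "__main__":
    # First series on the slide (technology unknown in this transcript).
    series = (0.39, 0.47, 0.52, 0.70, 0.75, 0.82)
    for measured in (0.10, 0.60, 0.94):
        print(measured, abnormality(measured, series))
```

Under this reading, the industrial build system from earlier in the talk, with clone coverage of 94%-99%, would land in the "Very high" band for every series on the slide, since no threshold exceeds 0.84.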
Using abstraction mechanisms not provided by the build technologies
Java files as inputs using wildcards. Hence, the compile target is generic, and often cloned in several Ant specifications.
While construction accounts for many of the clones in Ant build systems (64%), most Maven clones impact configuration (79%). The next most frequently cloned phase (packaging) appears in less than a third as many clones (21%).
We observed that many of the Maven clones replicate third-party dependency lists or plugin configuration among subsystems. While this ensures that each subsystem can be built independently of the others, it imposes a heavy load on maintainers, who will need to update several pom.xml files in order to modify third-party dependency lists or update plugin configurations.
Construction is the most heavily cloned build phase in C/C++ build systems. Table 3 shows that the build subcategory represents 48% and 65% of Autotools and CMake clones respectively. The filesystem category is also detected in 19% of Autotools and 1% of CMake clones. All in all, the construction phase accounts for 56% of Autotools clones and 66% of CMake clones.
While the Autotools construction phase is most frequently cloned (56%), Table 3 shows that packaging details are also cloned often (21%). Yet packaging details are rarely cloned in CMake (2%). Many of these Autotools packaging clones have to do with repetition of data file lists among subsystems. Since Autotools build specifications generate recursive make build systems, variables are not shared among subsystems [23]. Although Autotools offers developers an include directive, we have observed that rather than place shared variables in a header-like file, developers often clone variables that have a shared scope. Similar to Maven, where dependency lists and plugin configuration were replicated, developers likely duplicate shared variables to facilitate subsystem independence. However, this makes system-wide changes more difficult.
Conversely, we observe that CMake packaging details are rarely cloned. We observe that developers leverage built-in CPack functionality of CMake [20], where packaging details are typically specified in a single location: CPackConfig.cmake. This eliminates the need to replicate packaging details in subsystem specifications.
There are more configuration clones in the more recent build technologies. Table 3 shows that there are more than twice as many configuration clones in Maven build systems (79%) than Ant ones (32%). Similarly, there are almost twice as many configuration clones in CMake (40%) than there are in Autotools (22%). In Section 5, we report that these more recent technologies are more prone to cloning (RQ2). The shift of cloning tendencies towards configuration likely contributes to the inflated cloning values we observe in the more recent technologies.
Configuration details are cloned more often in the more recent CMake and Maven build technologies. For Java build systems, Maven clones favour the configuration phase, while Ant clones (and clones in C/C++ builds) favour construction. CMake packaging support (CPack) helps to reduce cloning in the packaging phase.
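The replication of third-party dependency lists among pom.xml files described above can be surfaced with a small script. The sketch below is illustrative only and is not part of the study's tooling: the function name and the dictionary input format are hypothetical, it assumes each POM declares the standard Maven 4.0.0 namespace, and it looks only at direct `<dependencies>` entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard namespace of Maven POM files (assumed to be declared in each POM).
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def duplicated_dependencies(poms):
    """Find dependency coordinates declared in more than one POM.

    poms: dict mapping a file path to the POM text (hypothetical format).
    Returns a dict from (groupId, artifactId, version) to the sorted paths.
    """
    declared_in = defaultdict(set)
    for path, text in poms.items():
        root = ET.fromstring(text)
        # Only direct dependencies; dependencyManagement is ignored here.
        for dep in root.findall("./m:dependencies/m:dependency", NS):
            coords = tuple(
                dep.findtext(f"m:{tag}", default="", namespaces=NS)
                for tag in ("groupId", "artifactId", "version")
            )
            declared_in[coords].add(path)
    return {
        coords: sorted(paths)
        for coords, paths in declared_in.items()
        if len(paths) > 1
    }
```

Every entry in the result is a declaration that a maintainer would have to update in several places when a third-party dependency changes.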
(RQ5) How do build systems with few clones achieve low clone rates?
To address RQ5, we analyze clones in the systems with the lowest and highest cloning rates for each studied technology.
<!-- Define references to files containing common targets -->
<!DOCTYPE project [
  <!ENTITY modules-common SYSTEM "../modules-common.ent">
]>
...
<project name="bea" default="all">
  <!-- Include the file containing common targets. -->
  &modules-common;
</project>
Listing 1: Using XML entity expansion to import common build code in the Keel system.
Much Java build cloning can be avoided by exploiting the underlying XML representation. We observe that entire files are duplicated in the Ant and Maven systems with the highest clone coverage. In these cases, development teams duplicate existing build specifications to rapidly develop new subsystems. However, developers have referred to maintaining such build systems fraught with clones as a “nightmare” (cf. Section 3). Defect fixes or updates to dependency lists, tool configuration, and packaging details must be carefully replicated among the clones to ensure that builds continue to assemble deliverables correctly.
On the other hand, we have observed that in addition to abstraction mechanisms provided by Ant and Maven (e.g., the include and import tasks), XML-based build systems avoid cloning by leveraging the underlying XML representation. In prior work, we note that the JBoss build system leverages XML entity expansion in Ant to implement a framework-driven build system referred to as “buildmagic” [21]. Indeed, Listing 1 shows how one can use XML entity expansion to avoid duplicating shared build code in subsystem build specifications. We also find that the Ant test suite includes regression tests to ensure that entity expansion continues to work, suggesting that it is not a workaround, but instead is intentionally supported functionality.
Many C/C++ build logic clones can be avoided by duplicating templates automatically when building. Many of the studied C/C++ systems provide development APIs. As such, they ship examples of how to use various API functionality with their deliverables. These examples include accompanying build specifications. However, in the C/C++ systems with the highest clone coverage, we find that many of these example build specifications are file clones of each other, which poses maintainability problems.
One of the studied systems with a low clone coverage avoids these clones by duplicating and specializing template build specification automatically using shell scripts during the construction phase. Using this approach, cloning shared build code in example build specifications can be avoided.
XML entity expansion can be used to avoid cloning shared build code in Java build systems. Cloning of build logic shipped with API usage examples can be avoided by automatically deriving specifications at build-time.
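The idea of deriving example build specifications from a single template at build time can be sketched as follows. The studied system is reported to use shell scripts; this illustration swaps in Python's `string.Template` instead, and the template contents, the `examples/<name>/Makefile` layout, and the function name are all hypothetical.

```python
from string import Template

# One shared template instead of one hand-copied Makefile per example.
# "$$" is string.Template's escape for a literal "$" in the output.
TEMPLATE = Template(
    "# Generated from a shared template; do not edit by hand.\n"
    "EXAMPLE = $name\n"
    "SOURCES = $sources\n"
    "$name: $$(SOURCES)\n"
    "\t$$(CC) -o $$@ $$(SOURCES) $libs\n"
)


def render_examples(examples):
    """Specialize the template for each example.

    examples: dict mapping an example name to a dict with a "sources" list
    and an optional "libs" string (hypothetical input format).
    Returns a dict from the output path to the generated Makefile text.
    """
    return {
        f"examples/{name}/Makefile": TEMPLATE.substitute(
            name=name,
            sources=" ".join(spec["sources"]),
            libs=spec.get("libs", ""),
        )
        for name, spec in examples.items()
    }


if __name__ == "__main__":
    generated = render_examples({
        "hello": {"sources": ["hello.c"], "libs": "-lfoo"},
        "client": {"sources": ["client.c", "util.c"]},
    })
    for path, text in generated.items():
        print(path)
        print(text)
```

Because the shared logic lives only in the template, a fix to the compile rule is made once and every generated example specification picks it up on the next build.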
7. THREATS TO VALIDITY
We now discuss the threats to the validity of our analysis.
Construct validity. Our clone detection tool is configured to only detect Type I (exact) clones. Since we do not detect Type II, III, or IV clones, our cloning results should be interpreted as lower bounds rather than exact values.
Internal validity. We assume that large values of cloning metrics suggest maintenance problems illustrated in our Mu-
26
Using abstraction mechanisms not provided by the build technologies
Java files as inputs using wildcards. Hence, the compile target is generic, and often cloned in several Ant specifications.
While construction accounts for many of the clones in Ant build systems (64%), most Maven clones impact configuration (79%). The next most frequently cloned phase (packaging) appears in less than a third as many clones (21%).
We observed that many of the Maven clones replicate third-party dependency lists or plugin configuration among subsystems. While this ensures that each subsystem can be built independently of the others, it imposes a heavy load on maintainers, who will need to update several pom.xml files in order to modify third-party dependency lists or update plugin configurations.
Construction is the most heavily cloned build phase in C/C++ build systems. Table 3 shows that the build subcategory represents 48% and 65% of Autotools and CMake clones, respectively. The filesystem category is also detected in 19% of Autotools and 1% of CMake clones. All in all, the construction phase accounts for 56% of Autotools clones and 66% of CMake clones.
While the Autotools construction phase is most frequently cloned (56%), Table 3 shows that packaging details are also cloned often (21%). Yet packaging details are rarely cloned in CMake (2%). Many of these Autotools packaging clones have to do with repetition of data file lists among subsystems. Since Autotools build specifications generate recursive make build systems, variables are not shared among subsystems [23]. Although Autotools offers developers an include directive, we have observed that rather than place shared variables in a header-like file, developers often clone variables that have a shared scope. Similar to Maven, where dependency lists and plugin configuration were replicated, developers likely duplicate shared variables to facilitate subsystem independence. However, this makes system-wide changes more difficult.
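To make the maintenance cost of replicated dependency lists concrete, the following is a minimal Python sketch, not the clone detection tool used in the study. It reports which dependency declarations appear in more than one subsystem specification. The POM snippets, file paths, and the function name `replicated_dependencies` are hypothetical, and the XML namespace that real pom.xml files declare is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical subsystem POMs (sample data; real POMs declare an XML namespace).
POMS = {
    "core/pom.xml": """<project><dependencies>
        <dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.11</version></dependency>
        <dependency><groupId>log4j</groupId><artifactId>log4j</artifactId><version>1.2.17</version></dependency>
    </dependencies></project>""",
    "web/pom.xml": """<project><dependencies>
        <dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.11</version></dependency>
        <dependency><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId><version>2.5</version></dependency>
    </dependencies></project>""",
    "cli/pom.xml": """<project><dependencies>
        <dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.11</version></dependency>
        <dependency><groupId>log4j</groupId><artifactId>log4j</artifactId><version>1.2.17</version></dependency>
    </dependencies></project>""",
}


def replicated_dependencies(poms):
    """Map each (groupId, artifactId, version) declared in more than one
    POM to the sorted list of files that declare it."""
    declared_in = defaultdict(set)
    for path, text in poms.items():
        root = ET.fromstring(text)
        for dep in root.findall(".//dependency"):
            key = tuple(
                dep.findtext(tag, default="")
                for tag in ("groupId", "artifactId", "version")
            )
            declared_in[key].add(path)
    return {
        key: sorted(paths)
        for key, paths in declared_in.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    for key, paths in replicated_dependencies(POMS).items():
        print(":".join(key), "is declared in", ", ".join(paths))
```

Each entry in the result is a declaration that a maintainer would have to update in several files at once, which is the system-wide change cost described above.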
Conversely, we observe that CMake packaging details are rarely cloned. We observe that developers leverage built-in CPack functionality of CMake [20], where packaging details are typically specified in a single location: CPackConfig.cmake. This eliminates the need to replicate packaging details in subsystem specifications.
There are more configuration clones in the more recent build technologies. Table 3 shows that there are more than twice as many configuration clones in Maven build systems (79%) than Ant ones (32%). Similarly, there are almost twice as many configuration clones in CMake (40%) than there are in Autotools (22%). In Section 5, we report that these more recent technologies are more prone to cloning (RQ2). The shift of cloning tendencies towards configuration likely contributes to the inflated cloning values we observe in the more recent technologies.
Configuration details are cloned more often in the more recent CMake and Maven build technologies. For Java build systems, Maven clones favour the configuration phase, while Ant clones (and clones in C/C++ builds) favour construction. CMake packaging support (CPack) helps to reduce cloning in the packaging phase.
(RQ5) How do build systems with few clones achieve low clone rates?
To address RQ5, we analyze clones in the systems with the lowest and highest cloning rates for each studied technology.
<!-- Define references to files containing common targets -->
<!DOCTYPE project [
  <!ENTITY modules-common SYSTEM "../modules-common.ent">
]>
...
<project name="bea" default="all">
  <!-- Include the file containing common targets. -->
  &modules-common;
</project>
Listing 1: Using XML entity expansion to import common build code in the Keel system.
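The expansion mechanism in Listing 1 can be observed with a small Python sketch. This is an illustration under stated assumptions rather than a reproduction of the Keel build: Listing 1 declares an external entity (SYSTEM), whereas the sketch declares an internal entity so that it is self-contained, because Python's xml.etree.ElementTree does not expand external entities. The entity name `common`, the target names, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared targets, written once and referenced via an entity.
COMMON_TARGETS = "<target name='clean'/><target name='compile'/>"


def subsystem_spec(name):
    """Build a subsystem specification that pulls in the shared targets
    through an entity reference instead of a copy of the target elements."""
    return (
        "<!DOCTYPE project [\n"
        f'  <!ENTITY common "{COMMON_TARGETS}">\n'
        "]>\n"
        f'<project name="{name}" default="compile">\n'
        "  &common;\n"
        "</project>"
    )


def target_names(spec):
    """Parse a specification and list the targets visible after the XML
    parser has expanded the entity reference."""
    root = ET.fromstring(spec)
    return [target.get("name") for target in root.findall("target")]


if __name__ == "__main__":
    print(target_names(subsystem_spec("bea")))
```

The project body contains only the reference `&common;`, yet the parsed tree contains both target elements. With an external entity, as in Listing 1, the shared text would live in a single file referenced by every subsystem.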
Much Java build cloning can be avoided by exploiting the underlying XML representation. We observe that entire files are duplicated in the Ant and Maven systems with the highest clone coverage. In these cases, development teams duplicate existing build specifications to rapidly develop new subsystems. However, developers have referred to maintaining such build systems fraught with clones as a “nightmare” (cf. Section 3). Defect fixes or updates to dependency lists, tool configuration, and packaging details must be carefully replicated among the clones to ensure that builds continue to assemble deliverables correctly.
On the other hand, we have observed that in addition to abstraction mechanisms provided by Ant and Maven (e.g., the include and import tasks), XML-based build systems avoid cloning by leveraging the underlying XML representation. In prior work, we note that the JBoss build system leverages XML entity expansion in Ant to implement a framework-driven build system referred to as “buildmagic” [21]. Indeed, Listing 1 shows how one can use XML entity expansion to avoid duplicating shared build code in subsystem build specifications. We also find that the Ant test suite includes regression tests to ensure that entity expansion continues to work, suggesting that it is not a workaround, but instead is intentionally supported functionality.
Many C/C++ build logic clones can be avoided by duplicating templates automatically when building. Many of the studied C/C++ systems provide development APIs. As such, they ship examples of how to use various API functionality with their deliverables. These examples include accompanying build specifications. However, in the C/C++ systems with the highest clone coverage, we find that many of these example build specifications are file clones of each other, which poses maintainability problems.
One of the studied systems with a low clone coverage avoids these clones by duplicating and specializing template build specifications automatically using shell scripts during the construction phase. Using this approach, cloning shared build code in example build specifications can be avoided.
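The template approach can be sketched as follows. The studied system uses shell scripts. This sketch uses Python instead, and the template text, the `examples/<name>/Makefile.am` layout, and the function name `generate_specs` are assumptions made for illustration only.

```python
from string import Template

# Hypothetical template for the build specification of one API usage example.
# Only this single copy is kept under version control.
SPEC_TEMPLATE = Template(
    "# Generated at build time; do not edit.\n"
    "bin_PROGRAMS = $name\n"
    "${name}_SOURCES = $sources\n"
)


def generate_specs(examples):
    """Specialize the template once per example.

    `examples` maps an example name to its list of source files. The result
    maps each (assumed) specification path to its generated text.
    """
    return {
        f"examples/{name}/Makefile.am": SPEC_TEMPLATE.substitute(
            name=name, sources=" ".join(sources)
        )
        for name, sources in examples.items()
    }


if __name__ == "__main__":
    specs = generate_specs({"hello": ["hello.c"], "client": ["client.c", "net.c"]})
    for path, text in specs.items():
        print(path)
        print(text)
```

Because the per-example specifications are derived during the build, a fix to the shared logic is made once in the template rather than replicated across file clones.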
XML entity expansion can be used to avoid cloning shared build code in Java build systems. Cloning of build logic shipped with API usage examples can be avoided by automatically deriving specifications at build-time.
7. THREATS TO VALIDITY
We now discuss the threats to the validity of our analysis.
Construct validity. Our clone detection tool is configured to only detect Type I (exact) clones. Since we do not detect Type II, III, or IV clones, our cloning results should be interpreted as lower bounds rather than exact values.
Internal validity. We assume that large values of cloning metrics suggest maintenance problems illustrated in our Mu-
Store an external block of XML in a macro
Expand the macro
27
How much cloning is typical?
What type of logic is being cloned?
Configuration details
Packaging specifications
What can be done to mitigate cloning?
Measured Typical
Cloning shifts from const. to config.
Mostly construction clones
Ant Maven may reduce cloning
Use of “creative” abstraction
>50% clone coverage is common
<30% clone coverage is common