TRANSCRIPT
An Empirical Study on the Adequacy of Testing in Open Source Projects
Pavneet S. Kochhar1, Ferdian Thung1, David Lo1, and Julia Lawall2
1 Singapore Management University   2 Inria/Lip6, France
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg, [email protected]
Asia-Pacific Software Engineering Conference (APSEC’14)
2
Open-Source Software, Why Bother?
• A plethora of open-source software is used by many commercial applications
• Large organizations invest time, effort, and money in open-source development
3
Software Testing, Why Bother?
Functionality -- does the software meet its requirements?
Bugs -- defects undermine software reliability
Costs -- bugs found late cost more to fix
4
Software Testing, Why Bother?
• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually

[1] J. R. Horgan and A. P. Mathur, "Software testing and reliability," McGraw-Hill, Inc., 1996.
[2] G. Tassey, "The economic impacts of inadequate infrastructure for software testing," National Institute of Standards and Technology, 2002.
5
Study Goals
• Understand the state of the practice of testing among open source projects
• Make recommendations to improve the state of the practice
Are open-source projects adequately tested?
6
Understanding State-of-Practice
• Study a large number of projects
• Check adequacy of testing
  – Execute test cases
  – Assess test adequacy
• Characterize cases of inadequate testing
  – Correlate project metrics with test adequacy
  – At various levels of granularity
7
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
8
Test Adequacy
• Test Adequacy Criterion
  – Property that must be satisfied for a test suite to be thorough
  – Often measured by code coverage
• Code Coverage
  – Percentage of the code executed by test cases
    • Line coverage
    • Branch coverage
Test Adequacy
9
CT = number of branches that evaluate to true
CF = number of branches that evaluate to false
B  = total number of branches
LC = total number of lines that are executed
EL = total number of lines that are executable
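In terms of these quantities, the two coverage measures are the standard ones (reconstructed from the definitions above; the branch-coverage form assumes each branch is counted for both its true and false outcomes, as coverage tools such as Cobertura/Sonar do):

```latex
\[
\text{Line coverage} = \frac{LC}{EL} \times 100\%,
\qquad
\text{Branch coverage} = \frac{CT + CF}{2B} \times 100\%
\]
```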
10
Why Code Coverage?
• Mockus et al. [1]
  – Higher coverage leads to fewer post-release defects
• Berner et al. [2]
  – Judicious use of coverage helps in finding new defects
• Shamasunder [3]
  – Branch and block coverage correlate with fault detection

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Berner, R. Weber, and R. K. Keller, "Enhancing software testing by judicious use of code coverage information," in ICSE, 2007.
[3] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
11
Source Code Metrics
• Number of lines of code (LOC)
• Cyclomatic complexity (CC)
  – Number of linearly independent paths through the source code
• Number of developers
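Cyclomatic complexity can be approximated as 1 plus the number of decision points in the code. A rough sketch using Python's ast module (counting If/For/While/except/conditional-expression nodes is an approximation of McCabe's metric, not the exact computation Sonar performs):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        # each of these nodes introduces an extra path through the code
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            decisions += 1
    return 1 + decisions

SRC = '''
def classify(x):
    if x < 0:
        return "neg"
    elif x == 0:
        return "zero"
    return "pos"
'''

print(cyclomatic_complexity(SRC))  # 3: two if-decisions + 1
```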
12
Tool Support
• Computes the source code metrics
• Runs test cases
• Computes the overall coverage
• Relies on the Maven directory structure
13
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
14
Data Collection
• GitHub -- the largest site for open source project development
  – >3,000,000 users & 5,000,000 repositories
• Debian -- one of the most popular Linux distributions
15
Data Collection
• Find projects that use Maven
  – Needed to run Sonar

757 projects + 228 projects → 945 projects (after removing duplicates)
16
Data Collection
• mvn clean install
  – Compiles the project
• mvn sonar:sonar
  – Runs test cases and gets statistics
945 projects
→ 872 projects contain test suites
→ 327 projects successfully compile, run test cases & produce coverage
17
Data Collection
Number of Lines of Code
Number of Test Cases
18
Data Collection
Cyclomatic Complexity
Number of Developers
19
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
20
Research Questions
RQ1: What are the coverage levels and test success densities exhibited by different projects?
RQ2: What are the correlations between various software metrics and code coverage at the project level?
RQ3: What are the correlations between various software metrics and code coverage at the source code file level?
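The rho values reported for RQ2 and RQ3 are Spearman rank correlations. A minimal pure-Python sketch of how such a correlation is computed (the data below is made up for illustration, not from the study):

```python
def rank(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

loc      = [1200, 5400, 300, 9800, 150]    # hypothetical project sizes
coverage = [45.0, 20.0, 80.0, 10.0, 90.0]  # hypothetical coverage (%)
print(spearman_rho(loc, coverage))  # approximately -1.0: perfectly inverse rank order
```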
21
Research Questions
RQ1: Coverage Levels & Test Success Densities
22
RQ1: Coverage
Coverage Level (%)   Number of Projects
0-25                 105
25-50                90
50-75                92
75-100               40
• 40 projects have coverage between 75% and 100%
• Average coverage: 41.96%
• Median coverage: 40.30%
Coverage Level Distribution
23
RQ1: Success Density
• Test Success Density = Passing Tests / Total Tests
• 254 projects have test success density >= 98%
24
Research Questions
RQ2: Metrics vs. Coverage at Project Level
25
RQ2: Metrics vs. Coverage (Project)
Lines of Code vs. Coverage
• Spearman's rho = -0.306 (Negative Correlation)
• p-value = 1.566e-08
26
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = -0.276 (Negative Correlation)
• p-value = 3.665e-07
Cyclomatic Complexity vs. Coverage
27
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = 0.016 (Insignificant Correlation)
• p-value = 0.763
Number of Developers vs. Coverage
28
Research Questions
RQ3: Metrics vs. Coverage at File Level
29
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.180 (Small Positive Correlation)
• p-value < 2.2e-16
Lines of Code vs. Coverage
30
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.221 (Small Positive Correlation)
• p-value < 2.2e-16
Cyclomatic Complexity vs. Coverage
31
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.050 (No Correlation)
• p-value < 2.2e-16
Number of Developers vs. Coverage
32
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
33
Recommendations
• Practitioners:
  – Need to improve testing efforts, especially for large or complex software projects
  – Need to look into automated test case generation tools
• Researchers:
  – Need to promote new tools that can be easily used by developers
  – Need to develop test case generation tools that scale to large projects
34
Threats to Validity
• Internal validity:
  – Sonar might produce incorrect metrics or coverage values
  – Projects might not conform to the Maven directory structure
  – We have performed some manual checks
• External validity:
  – Only analyze 300+ projects from GitHub and Debian
35
Threats to Validity
• Construct validity:
  – Make use of a standard adequacy criterion
    • Code coverage
  – Make use of standard code metrics
    • Lines of code (LOC)
    • Cyclomatic complexity (CC)
  – Few threats to construct validity
36
Related Work
• Empirical studies on testing and coverage
  – Mockus et al. study the impact of coverage on the number of post-release defects [1]
  – Shamasunder analyzes the impact of different kinds of coverage on fault detection [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite's effectiveness in killing mutants [3]

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
[3] R. Gopinath, C. Jensen, and A. Groce, "Code coverage for suite evaluation by developers," in ICSE, 2014.
37
Related Work
• Test case generation techniques
  – Thummalapenta et al. automatically generate a series of method invocations to produce a target object state [1]
  – Pandita et al. produce test inputs to achieve logical and boundary-value coverage [2]
  – Park et al. combine random testing with static program analysis and concolic execution [3]

[1] S. Thummalapenta et al., "Synthesizing method sequences for high-coverage testing," in OOPSLA, 2011.
[2] R. Pandita et al., "Guided test generation for coverage criteria," in ICSM, 2010.
[3] S. Park et al., "CarFast: Achieving higher statement coverage faster," in FSE, 2012.
Conclusion
38
• Many open-source projects are poorly tested
  – Only 40/327 projects have high coverage
  – Average coverage: 41.96%
• Coverage is poorer when projects get larger and more complex
• Coverage is better for larger and more complex source code files
• The number of developers is not significantly correlated with coverage
39
Future Work
• Expand the study to include more projects
  – Address the threats to external validity
• Investigate other software metrics
  – Common cases of poor coverage
• Investigate the amount of effort required to attain a particular level of coverage
  – Cost-effectiveness analysis: effort vs. benefit
Thank you!
Questions? Comments? Advice?
{kochharps.2012,ferdiant.2013}@smu.edu.sg
[email protected]
[email protected]