TRANSCRIPT
An Empirical Study on the Adequacy of Testing in Open Source Projects
Pavneet S. Kochhar1, Ferdian Thung1, David Lo1, and Julia Lawall2
1 Singapore Management University   2 Inria/Lip6, France
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg, [email protected]
Asia-Pacific Software Engineering Conference (APSEC’14)
2
Open-Source Software, Why Bother?
• A plethora of open-source software is used by many commercial applications
• Large organizations invest time, effort, and money in open-source development
3
Software Testing, Why Bother?
Functionality -- does the software meet its requirements?
Bugs -- defects undermine software reliability
Costs -- bugs found late cost more to fix
4
Software Testing, Why Bother?
• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually

[1] J. R. Horgan and A. P. Mathur, "Software testing and reliability," McGraw-Hill, Inc., 1996.
[2] G. Tassey, "The economic impacts of inadequate infrastructure for software testing," National Institute of Standards and Technology, 2002.
5
Study Goals
• Understand the state of the practice of testing among open source projects
• Make recommendations to improve the state of the practice
Are open-source projects adequately tested?
6
Understanding State-of-Practice
• Study a large number of projects
• Check adequacy of testing
  – Execute test cases
  – Assess test adequacy
• Characterize cases of inadequate testing
  – Correlate project metrics with test adequacy
  – At various levels of granularity
7
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
8
Test Adequacy
• Test Adequacy Criterion
  – Property that must be satisfied for a test suite to be thorough
  – Often measured by code coverage
• Code Coverage
  – Percentage of the code executed by test cases
    • Line coverage
    • Branch coverage
Test Adequacy
9
CT = number of branches that evaluate to true
CF = number of branches that evaluate to false
B  = total number of branches
LC = total number of lines that are executed
EL = total number of lines that are executable
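In terms of these quantities, the two coverage measures are the standard ones (reconstructed from the definitions above; the branch-coverage form assumes each branch is counted for both its true and false outcomes, as coverage tools such as Cobertura/Sonar do):

```latex
\[
\text{Line coverage} = \frac{LC}{EL} \times 100\%,
\qquad
\text{Branch coverage} = \frac{CT + CF}{2B} \times 100\%
\]
```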
10
Why Code Coverage?
• Mockus et al. [1]
  – Higher coverage leads to fewer post-release defects
• Berner et al. [2]
  – Judicious use of coverage helps in finding new defects
• Shamasunder [3]
  – Branch and block coverage correlate with fault detection

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Berner, R. Weber, and R. K. Keller, "Enhancing software testing by judicious use of code coverage information," in ICSE, 2007.
[3] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
11
Source Code Metrics
• Number of lines of code (LOC)
• Cyclomatic complexity (CC)
  – Number of linearly independent paths through the source code
• Number of developers
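Cyclomatic complexity can be approximated as 1 plus the number of decision points in the code. A rough sketch using Python's ast module (counting If/For/While/except/conditional-expression nodes is an approximation of McCabe's metric, not the exact computation Sonar performs):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        # each of these nodes introduces an extra path through the code
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            decisions += 1
    return 1 + decisions

SRC = '''
def classify(x):
    if x < 0:
        return "neg"
    elif x == 0:
        return "zero"
    return "pos"
'''

print(cyclomatic_complexity(SRC))  # 3: two if-decisions + 1
```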
12
Tool Support
• Computes the source code metrics
• Runs test cases
• Computes the overall coverage
• Relies on the Maven directory structure
13
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
14
Data Collection
• GitHub -- the largest site for open source project development
  – >3,000,000 users & 5,000,000 repositories
• Debian -- one of the most popular Linux distributions
15
Data Collection
• Find projects that use Maven
  – Needed to run Sonar

757 projects + 228 projects → 945 projects (after removing duplicates)
16
Data Collection
• mvn clean install
  – Compiles the project
• mvn sonar:sonar
  – Runs test cases and gets statistics
945 projects
→ 872 projects contain test suites
→ 327 projects successfully compile, run test cases & produce coverage
17
Data Collection
Number of Lines of Code
Number of Test Cases
18
Data Collection
Cyclomatic Complexity
Number of Developers
19
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
20
Research Questions
RQ1: What are the coverage levels and test success densities exhibited by different projects?
RQ2: What are the correlations between various software metrics and code coverage at the project level?
RQ3: What are the correlations between various software metrics and code coverage at the source code file level?
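The rho values reported for RQ2 and RQ3 are Spearman rank correlations. A minimal pure-Python sketch of how such a correlation is computed (the data below is made up for illustration, not from the study):

```python
def rank(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

loc      = [1200, 5400, 300, 9800, 150]    # hypothetical project sizes
coverage = [45.0, 20.0, 80.0, 10.0, 90.0]  # hypothetical coverage (%)
print(spearman_rho(loc, coverage))  # approximately -1.0: perfectly inverse rank order
```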
21
Research Questions
RQ1: Coverage Levels & Test Success Densities
22
RQ1: Coverage
Coverage Level (%)   Number of Projects
0-25                 105
25-50                90
50-75                92
75-100               40
• 40 projects have coverage between 75% and 100%
• Average coverage: 41.96%
• Median coverage: 40.30%
Coverage Level Distribution
23
RQ1: Success Density
• Test Success Density = Passing Tests / Total Tests
• 254 projects have test success density >= 98%
24
Research Questions
RQ2: Metrics vs. Coverage at Project Level
25
RQ2: Metrics vs. Coverage (Project)
Lines of Code vs. Coverage
• Spearman's rho = -0.306 (Negative Correlation)
• p-value = 1.566e-08
26
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = -0.276 (Negative Correlation)
• p-value = 3.665e-07
Cyclomatic Complexity vs. Coverage
27
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = 0.016 (Insignificant Correlation)
• p-value = 0.763
Number of Developers vs. Coverage
28
Research Questions
RQ3: Metrics vs. Coverage at File Level
29
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.180 (Small Positive Correlation)
• p-value < 2.2e-16
Lines of Code vs. Coverage
30
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.221 (Small Positive Correlation)
• p-value < 2.2e-16
Cyclomatic Complexity vs. Coverage
31
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.050 (No Correlation)
• p-value < 2.2e-16
Number of Developers vs. Coverage
32
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
33
Recommendations
• Practitioners:
  – Need to improve testing efforts, especially for large or complex software projects
  – Need to look into automated test case generation tools
• Researchers:
  – Need to promote new tools that can be easily used by developers
  – Need to develop test case generation tools that scale to large projects
34
Threats to Validity
• Internal validity:
  – Sonar might produce incorrect metrics or coverage values
  – Projects might not conform to the Maven directory structure
  – We have performed some manual checks
• External validity:
  – Only analyze 300+ projects from GitHub and Debian
35
Threats to Validity
• Construct validity:
  – Make use of a standard adequacy criterion
    • Code coverage
  – Make use of standard code metrics
    • Lines of code (LOC)
    • Cyclomatic complexity (CC)
  – Few threats to construct validity
36
Related Work
• Empirical studies on testing and coverage
  – Mockus et al. study the impact of coverage on the number of post-release defects [1]
  – Shamasunder analyzes the impact of different kinds of coverage on fault detection [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite's effectiveness in killing mutants [3]

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
[3] R. Gopinath, C. Jensen, and A. Groce, "Code coverage for suite evaluation by developers," in ICSE, 2014.
37
Related Work
• Test case generation techniques
  – Thummalapenta et al. automatically generate a series of method invocations to produce a target object state [1]
  – Pandita et al. produce test inputs to achieve logical and boundary-value coverage [2]
  – Park et al. combine random testing with static program analysis and concolic execution [3]

[1] S. Thummalapenta et al., "Synthesizing method sequences for high-coverage testing," in OOPSLA, 2011.
[2] R. Pandita et al., "Guided test generation for coverage criteria," in ICSM, 2010.
[3] S. Park et al., "CarFast: Achieving higher statement coverage faster," in FSE, 2012.
Conclusion
38
• Many open-source projects are poorly tested
  – Only 40/327 projects have high coverage
  – Average coverage: 41.96%
• Coverage is poorer when projects get larger and more complex
• Coverage is better for larger and more complex source code files
• The number of developers is not significantly correlated with coverage
39
Future Work
• Expand the study to include more projects
  – Address the threats to external validity
• Investigate other software metrics
  – Common cases of poor coverage
• Investigate the amount of effort required to attain a particular level of coverage
  – Cost-effectiveness analysis: effort vs. benefit
Thank you!
Questions? Comments? Advice?
{kochharps.2012,ferdiant.2013}@smu.edu.sg
[email protected]
[email protected]