
Page 1: An Empirical Study on the Adequacy of Testing in Open Source Projects

An Empirical Study on the Adequacy of Testing in Open Source Projects

Pavneet S. Kochhar¹, Ferdian Thung¹, David Lo¹, and Julia Lawall²

¹ Singapore Management University   ² Inria/Lip6, France

{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg, [email protected]

Asia-Pacific Software Engineering Conference (APSEC’14)

Page 2:

Open-Source Software, Why Bother?

• A plethora of open source software is used by many commercial applications

• Large organizations investing time, effort and money in open source development

Page 3:

Software Testing, Why Bother?

Functionality -- verify that the requirements are met

Bugs -- find defects before they hurt software reliability

Costs -- bugs found late cost more to fix

Page 4:

Software Testing, Why Bother?

• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually

[1] J. R. Horgan and A. P. Mathur, “Software testing and reliability,” McGraw-Hill, Inc., 1996.
[2] G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, 2002.

Page 5:

Study Goals

• Understand the state of the practice of testing among open source projects

• Make recommendations to improve the state of the practice

Are open-source projects adequately tested?

Page 6:

Understanding State-of-Practice

• Study a large number of projects
• Check adequacy of testing
  – Execute test cases
  – Assess test adequacy
• Characterize cases of inadequate testing
  – Correlate project metrics with test adequacy
  – At various levels of granularity

Page 7:

Outline

• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work

Page 8:

Test Adequacy

• Test Adequacy Criterion
  – Property that must be satisfied for a test suite to be thorough
  – Often measured by code coverage
• Code Coverage
  – Percentage of the code executed by test cases
    • Line coverage
    • Branch coverage

Page 9:

Test Adequacy


Line coverage = LC / EL
Branch coverage = (CT + CF) / (2 × B)

CT = number of branches that evaluate to true
CF = number of branches that evaluate to false
B = total number of branches
LC = total number of lines that are executed
EL = total number of lines that are executable
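The definitions above translate directly into a small sketch. The overall measure that combines lines and branches follows Sonar's convention of pooling the counts; treating that combination as Sonar's exact formula is an assumption on my part:

```python
def line_coverage(lc, el):
    """Fraction of executable lines (EL) that the tests executed (LC)."""
    return lc / el

def branch_coverage(ct, cf, b):
    """Each branch counts twice: once for its true and once for its false outcome."""
    return (ct + cf) / (2 * b)

def overall_coverage(ct, cf, b, lc, el):
    """Pooled line + branch coverage (assumed to match Sonar's combined metric)."""
    return (ct + cf + lc) / (2 * b + el)

# Example: 80 of 100 executable lines run; 30 true and 25 false
# outcomes covered out of 40 branches.
print(round(line_coverage(80, 100), 2))       # 0.8
print(round(branch_coverage(30, 25, 40), 4))  # 0.6875
```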

Page 10:

Why Code Coverage?

• Mockus et al. [1]
  – Higher coverage leads to fewer post-release defects
• Berner et al. [2]
  – Judicious use of coverage helps in finding new defects
• Shamasunder [3]
  – Branch & block coverage correlate with fault detection

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, “Test coverage and post-verification defects: A multiple case study,” in ESEM, 2009.
[2] S. Berner, R. Weber, and R. K. Keller, “Enhancing software testing by judicious use of code coverage information,” in ICSE, 2007.
[3] S. Shamasunder, “Empirical study - pairwise prediction of fault based on coverage,” Master’s thesis, 2012.

Page 11:

Source Code Metrics

• Number of lines of code (LOC)
• Cyclomatic complexity (CC)
  – Number of linearly independent paths through the source code
• Number of developers
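As an illustration of the cyclomatic complexity metric, here is my own simplified counter (not the study's tooling): it approximates McCabe's metric as 1 plus the number of decision points found in a Python snippet.

```python
import ast

# Node types that add a decision point (simplified; real tools also
# count constructs like comprehension conditions and match arms).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.And, ast.Or, ast.IfExp)

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(x):
        pass
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 4: two ifs + one for, plus 1
```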

Page 12:

Tool Support

• Computes the source code metrics
• Runs test cases
• Computes the overall coverage
• Relies on the Maven directory structure

Page 13:

Outline

• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work

Page 14:

Data Collection

• GitHub – the largest site for open source project development
  – >3,000,000 users & 5,000,000 repositories

• Debian – one of the most popular Linux distributions

Page 15:

Data Collection

• Find projects that use Maven
  – Needed to run Sonar

757 projects + 228 projects → 945 projects (after removing duplicates)

Page 16:

Data Collection

• mvn clean install – compiles the project
• mvn sonar:sonar – runs test cases and gathers statistics

945 projects → 872 projects contain test suites → 327 projects successfully compile, run test cases & produce coverage

Page 17:

Data Collection

Number of Lines of Code

Number of Test Cases

Page 18:

Data Collection

Cyclomatic Complexity

Number of Developers

Page 19:

Outline

• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work

Page 20:

Research Questions

RQ1: What are the coverage levels and test success densities exhibited by different projects?

RQ2: What are the correlations between various software metrics and code coverage at the project level?

RQ3: What are the correlations between various software metrics and code coverage at the source code file level?

Page 21:

Research Questions

RQ1: Coverage Levels & Test Success Densities

Page 22:

RQ1: Coverage

Coverage Level (%)   Number of Projects
0-25                 105
25-50                90
50-75                92
75-100               40

• 40 projects have coverage between 75% and 100%
• Average coverage – 41.96%
• Median coverage – 40.30%

Coverage Level Distribution

Page 23:

RQ1: Success Density

• 254 projects have test success density >= 98%

Test Success Density = Passing Tests / Total Tests

Page 24:

Research Questions

RQ2: Metrics vs. Coverage at Project Level

Page 25:

RQ2: Metrics vs. Coverage (Project)

Lines of Code vs. Coverage

• Spearman’s rho = -0.306 (Negative Correlation)
• p-value = 1.566e-08

Page 26:

RQ2: Metrics vs. Coverage (Project)

• Spearman’s rho = -0.276 (Negative Correlation)
• p-value = 3.665e-07

Cyclomatic Complexity vs. Coverage

Page 27:

RQ2: Metrics vs. Coverage (Project)

• Spearman’s rho = 0.016 (Insignificant Correlation)
• p-value = 0.763

Number of Developers vs. Coverage

Page 28:

Research Questions

RQ3: Metrics vs. Coverage at File Level

Page 29:

RQ3: Metrics vs. Coverage (File)

• Spearman’s rho = 0.180 (Small Positive Correlation)
• p-value < 2.2e-16

Lines of Code vs. Coverage

Page 30:

RQ3: Metrics vs. Coverage (File)

• Spearman’s rho = 0.221 (Small Positive Correlation)
• p-value < 2.2e-16

Cyclomatic Complexity vs. Coverage

Page 31:

RQ3: Metrics vs. Coverage (File)

• Spearman’s rho = 0.050 (Negligible Correlation)
• p-value < 2.2e-16

Number of Developers vs. Coverage

Page 32:

Outline

• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work

Page 33:

Recommendations

• Practitioners:
  ‒ Need to improve testing efforts, especially for large or complex software projects
  ‒ Need to look into automated test case generation tools
• Researchers:
  ‒ Need to promote new tools that can be easily used by developers
  ‒ Need to develop test case generation tools that can scale to large projects

Page 34:

Threats to Validity

• Internal validity:
  – Sonar might produce incorrect metrics or coverage values
  – Projects may not conform to the Maven directory structure
  – We have performed some manual checks
• External validity:
  – Only analyze 300+ projects from GitHub and Debian

Page 35:

Threats to Validity

• Construct validity:
  – Make use of standard adequacy criterion
    • Code coverage
  – Make use of standard code metrics
    • Lines of code (LOC)
    • Cyclomatic complexity (CC)
  – Little threat to construct validity

Page 36:

Related Work

• Empirical studies on testing and coverage
  – Mockus et al. study the impact of coverage on the number of post-release defects [1]
  – Shamasunder analyzes the impact of different kinds of coverage on fault detection [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite’s effectiveness in killing mutants [3]

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, “Test coverage and post-verification defects: A multiple case study,” in ESEM, 2009.
[2] S. Shamasunder, “Empirical study - pairwise prediction of fault based on coverage,” Master’s thesis, 2012.
[3] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation for developers,” in ICSE, 2014.

Page 37:

Related Work

• Test case generation techniques
  – Thummalapenta et al. automatically generate a series of method invocations to produce a target object state [1]
  – Pandita et al. produce test inputs to achieve logical and boundary-value coverage [2]
  – Park et al. combine random testing with static program analysis and concolic execution [3]

[1] S. Thummalapenta et al., “Synthesizing method sequences for high-coverage testing,” in OOPSLA, 2011.
[2] R. Pandita et al., “Guided test generation for coverage criteria,” in ICSM, 2010.
[3] S. Park et al., “CarFast: Achieving higher statement coverage faster,” in FSE, 2012.

Page 38:

Conclusion


• Many open-source projects are poorly tested
  ‒ Only 40/327 projects have high coverage
  ‒ Average coverage: 41.96%

• Coverage is poorer when projects get larger and more complex.

• Coverage is better for larger and more complex source code files.

• The number of developers is not significantly correlated with coverage.

Page 39:

Future Work

• Expand the study to include more projects
  – Address the threats to external validity

• Investigate other software metrics
  – Common cases of poor coverage

• Investigate the amount of effort required to attain a particular level of coverage
  – Cost-effectiveness analysis: effort vs. benefit

Page 40:

Thank you!

Questions? Comments? Advice?

{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg
Julia.Lawall@lip6.fr