Seeding Bugs to Find Bugs

Post on 15-Nov-2014

Category:

Technology

DESCRIPTION

Seeding Bugs to Find Bugs: Mutation Testing Revisited

How do you know your test suite is "good enough"? One of the best ways to tell is _mutation testing_. Mutation testing seeds artificial defects (mutations) into a program and checks whether your test suite finds them. If it does not, your test suite is not yet adequate.

Despite its effectiveness, mutation testing has two issues. First, it requires large computing resources to re-run the test suite again and again. Second, and this is worse, a mutation can leave the program's semantics unchanged, and thus cannot be detected by any test. Such _equivalent mutants_ act as false positives; they have to be assessed and isolated manually, which is an extremely tedious task.

In this talk, I present the JAVALANCHE framework for mutation testing of Java programs, which addresses both the problems of efficiency and equivalent mutants. First, JAVALANCHE is built for efficiency from the ground up, manipulating byte code directly and allowing mutation testing of programs that are several orders of magnitude larger than earlier research subjects. Second, JAVALANCHE addresses the problem of equivalent mutants by assessing the _impact_ of mutations on dynamic invariants: the more invariants impacted by a mutation, the more likely it is to be useful for improving test suites.

We have evaluated JAVALANCHE on seven industrial-size programs, confirming its effectiveness. With less than 3% of equivalent mutants, our approach provides a precise and fully automatic measure of the adequacy of a test suite, making mutation testing, finally, applicable in practice.

Joint work with David Schuler and Valentin Dallmeier.

Andreas Zeller is a computer science professor at Saarland University; he researches large programs and their history, and has developed a number of methods to determine the causes of program failures, in open-source programs as well as in industrial contexts at IBM, Microsoft, SAP and others. His book "Why Programs Fail" received the Software Development Magazine productivity award in 2006.
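
The idea of seeding mutations and of equivalent mutants can be illustrated with a minimal sketch. Everything in it (the class `MutationDemo`, the `max` example, the two hand-written mutants and the two test suites) is hypothetical and chosen only for illustration; it is not taken from the talk or from JAVALANCHE, which mutates byte code automatically rather than by hand.

```java
import java.util.function.IntBinaryOperator;

final class MutationDemo {
    // Original program under test (hypothetical example).
    static int max(int a, int b) { return a > b ? a : b; }

    // Mutant 1: ">" replaced by "<". The result changes whenever a != b.
    static int maxFlipped(int a, int b) { return a < b ? a : b; }

    // Mutant 2: ">" replaced by ">=". When a == b both branches yield the
    // same value, so no test can tell it apart: an equivalent mutant.
    static int maxEquivalent(int a, int b) { return a >= b ? a : b; }

    // Each row is one test case: {a, b, expected result}.
    static final int[][] WEAK_SUITE = { {3, 3, 3} };
    static final int[][] STRONGER_SUITE = { {3, 3, 3}, {1, 2, 2} };

    // A suite detects (kills) a program variant if at least one test fails on it.
    static boolean detects(int[][] suite, IntBinaryOperator program) {
        for (int[] t : suite) {
            if (program.applyAsInt(t[0], t[1]) != t[2]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("weak suite detects flipped mutant:        "
                + detects(WEAK_SUITE, MutationDemo::maxFlipped));
        System.out.println("stronger suite detects flipped mutant:    "
                + detects(STRONGER_SUITE, MutationDemo::maxFlipped));
        System.out.println("stronger suite detects equivalent mutant: "
                + detects(STRONGER_SUITE, MutationDemo::maxEquivalent));
    }
}
```

A mutant that survives the weak suite points at a missing test (here, a case with `a != b`), while the equivalent mutant survives every suite and has to be recognized as such by other means.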

TRANSCRIPT

1. Seeding Bugs to Find Bugs. Andreas Zeller, Saarland University, with David Schuler and Valentin Dallmeier

2. Saarbrücken
3. Saarbrücken
4. Saarbrücken
5. We built it!
6. Testing
7. Test Quality: You want your program to be well tested. How do we measure "well tested"?
8-35. [Slide text not recoverable; it ends with the tail of the signatureToStringInternal() listing that appears in full on slide 37.]
36. Impact on Invariants
    signatureToStringInternal(): mutated method
    UnitDeclaration.resolve() post: target field is now zero
    DelegatingOutputStream.write() pre: upper bound of argument changes
    WeaverAdapter.addingTypeMunger() pre: target field is now non-zero
    ReferenceContext.resolve() post: target field is now non-zero
37. Impact on Invariants
    public ResultHolder signatureToStringInternal(String signature) {
      switch(signature.charAt(0)) {
        ...
        case 'L': { // Full class name
          // Look for closing `;'
          int index = signature.indexOf(';');
          // Jump to the correct ';'
          if (index != -1 && signature.length() > index + 1 &&
              signature.charAt(index + 1) == '>')
            index = index + 2;
          ...
          return new ResultHolder(signature.substring(1, index));
        }
      }
    [The slide text also carries a lone "0" inside the if condition, just before signature.charAt(index + 1).]
    Impacts 40 invariants but is undetected by the AspectJ unit tests.
38. Javalanche: Mutation Testing Framework for Java. 12 man-months of implementation effort.
    Efficient Mutation Testing: manipulate byte code directly; focus on few mutation operators; use mutant schemata; use coverage data
    Impact Checks: ranks mutations by impact on dynamic invariants; uses efficient invariant learner and checker
39. Mutation Testing with Javalanche [diagram: Program]
40. Mutation Testing with Javalanche [diagram: Program]
    1. Learn invariants from test suite
    2. Insert invariant checkers into code
    3. Detect impact of mutations
    4. Select mutations with the most invariants violated (= the highest impact)
41. But does it work?
42. Evaluation: 1. Are mutations that violate invariants useful? 2. Are mutations with the highest impact most useful? 3.
Are mutants that violate invariants less likely to be equivalent?
43. Evaluation Subjects
    Name          Lines of Code  #Tests  Description
    AspectJ Core         94,902     321  AOP extension to Java
    Barbecue              4,837     137  Bar Code Reader
    Commons              18,782   1,590  Helper Utilities
    Jaxen                12,449     680  XPath Engine
    Joda-Time            25,861   3,447  Date and Time Library
    JTopas                2,031     128  Parser tools
    XStream              14,480     838  XML Object Serialization
44. Mutations
    Name          #Mutations  %detected
    AspectJ Core      47,146         53
    Barbecue          17,178         67
    Commons           15,125         83
    Jaxen              6,712         61
    Joda-Time         13,859         79
    JTopas             1,533         72
    XStream            5,186         92
45. Performance
    Learning invariants is very expensive: 22 CPU hours for AspectJ (one-time effort)
    Creating checkers is somewhat expensive: 10 CPU hours for AspectJ (one-time effort)
    Mutation testing is feasible in practice: 14 CPU hours for AspectJ, 6 CPU hours for XStream
46. Results
47. Evaluation: 1. Are mutations that violate invariants useful? 2. Are mutations with the highest impact most useful? 3. Are mutants that violate invariants less likely to be equivalent? What is a useful mutation?
48. Useful Mutations: A technique for generating mutants is useful if most of the generated mutants are detected:
    less likely to be equivalent, because detectable mutants = non-equivalent mutants
    close to real defects, because the test suite is designed to catch real defects
49. Mutations we look for [2x2 grid: not violating invariants / violating invariants against not detected by test suite / detected by test suite; one cell is marked "?!"]
50. Are mutations that violate invariants useful? [Bar chart: percentage (0-100) of non-violating mutants detected vs. violating mutants detected, for AspectJ, Barbecue, Commons, Jaxen, Joda-Time, JTopas, XStream.] Mutations that violate invariants are more likely to be detected by actual tests and thus likely to be useful. All differences are statistically significant according to the χ² test.
51. Are mutations with the highest impact most useful?
[Bar chart: percentage (0-100) of non-violating mutants detected vs. top 5% violating mutants detected, for AspectJ, Barbecue, Commons, Jaxen, Joda-Time, JTopas, XStream.] Mutations that violate several invariants are most likely to be detected by actual tests and thus the most useful. All differences are statistically significant according to the χ² test.
52. Detection Rates [Plots for AspectJ, Barbecue, Commons, Jaxen, Joda-Time, JTopas and XStream: percentage of mutants killed (0-100) against percentage of mutants included (0-100); overview plot: detection rate (0%-100%) against top n% mutants.]
53. Are mutants that violate invariants less likely to be equivalent?
    Randomly selected non-detected Jaxen mutants: 12 violating, 12 non-violating
    Manual inspection: Are mutations equivalent?
    Mutation was proven non-equivalent iff we could create a detecting test case
    Assessment took 30 minutes per mutation
54. Are mutants that violate invariants less likely to be equivalent?
    Violating mutants: 2 equivalent, 10 non-equivalent
    Non-violating mutants: 8 equivalent, 4 non-equivalent
    In our sample, mutants that violated several invariants were significantly less likely to be equivalent.
    Difference is statistically significant according to Fisher test
    Mutations and tests made public to counter researcher bias
55. Evaluation: 1. Are mutations that violate invariants useful? 2. Are mutations with the highest impact most useful? 3. Are mutants that violate invariants less likely to be equivalent?
56. Efficiency Inspection
57.
Mid: 16 LOC; AspectJ: ~94,000 LOC
58. Future Work
    How effective is mutation testing? On a large scale; compared to traditional coverage
    Predicting defects: How does test quality impact product quality?
    Alternative impact measures: coverage; program spectra; method sequences
    Adaptive mutation testing: evolve mutations to have the fittest survive
59. Conclusion
60. http://www.st.cs.uni-sb.de/mutation/
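
The four steps on slide 40 (learn invariants, insert checkers, detect impact, select the highest-impact mutations) can be sketched in a few lines. This is a toy model under loud assumptions, not Javalanche's implementation: a "program" is a hypothetical function returning the values seen at two probe points, an "invariant" is just the value range observed at a probe while running the original, and the three mutants are written by hand. All names (`ImpactSketch`, `learnRanges`, `impact`, `rankByImpact`) are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

final class ImpactSketch {
    // Hypothetical programs: each returns the values observed at two probe points.
    static final IntFunction<int[]> ORIGINAL = x -> new int[] {x + 1, (x + 1) * 2};
    static final IntFunction<int[]> CONSTANT_MUTANT = x -> new int[] {x + 0, (x + 0) * 2};
    static final IntFunction<int[]> FACTOR_MUTANT = x -> new int[] {x + 1, (x + 1) * 3};
    static final IntFunction<int[]> EQUIVALENT_MUTANT = x -> new int[] {x + 1, (x + 1) + (x + 1)};

    // Step 1: learn one {min, max} range per probe from runs of the original.
    static int[][] learnRanges(IntFunction<int[]> original, int[] inputs) {
        if (inputs.length == 0) {
            throw new IllegalArgumentException("need at least one test input");
        }
        int[][] ranges = null;
        for (int x : inputs) {
            int[] values = original.apply(x);
            if (ranges == null) {
                ranges = new int[values.length][];
                for (int i = 0; i < values.length; i++) {
                    ranges[i] = new int[] {values[i], values[i]};
                }
            } else {
                for (int i = 0; i < values.length; i++) {
                    ranges[i][0] = Math.min(ranges[i][0], values[i]);
                    ranges[i][1] = Math.max(ranges[i][1], values[i]);
                }
            }
        }
        return ranges;
    }

    // Steps 2 and 3: check every probe value of the mutant against the learned
    // ranges; the impact is the number of distinct invariants violated.
    static int impact(IntFunction<int[]> mutant, int[] inputs, int[][] ranges) {
        boolean[] violated = new boolean[ranges.length];
        for (int x : inputs) {
            int[] values = mutant.apply(x);
            for (int i = 0; i < ranges.length; i++) {
                if (values[i] < ranges[i][0] || values[i] > ranges[i][1]) {
                    violated[i] = true;
                }
            }
        }
        int count = 0;
        for (boolean v : violated) {
            if (v) count++;
        }
        return count;
    }

    // Step 4: order mutant names by impact, highest first (ties keep input order).
    static List<String> rankByImpact(Map<String, IntFunction<int[]>> mutants,
                                     int[] inputs, int[][] ranges) {
        Map<String, Integer> impacts = new LinkedHashMap<>();
        for (Map.Entry<String, IntFunction<int[]>> e : mutants.entrySet()) {
            impacts.put(e.getKey(), impact(e.getValue(), inputs, ranges));
        }
        List<String> names = new ArrayList<>(mutants.keySet());
        names.sort(Comparator.comparingInt((String n) -> impacts.get(n)).reversed());
        return names;
    }

    static Map<String, IntFunction<int[]>> exampleMutants() {
        Map<String, IntFunction<int[]>> mutants = new LinkedHashMap<>();
        mutants.put("equivalent", EQUIVALENT_MUTANT);
        mutants.put("factor", FACTOR_MUTANT);
        mutants.put("constant", CONSTANT_MUTANT);
        return mutants;
    }

    public static void main(String[] args) {
        int[] inputs = {1, 2, 3};
        int[][] ranges = learnRanges(ORIGINAL, inputs);
        System.out.println(rankByImpact(exampleMutants(), inputs, ranges));
    }
}
```

With the inputs 1, 2, 3 the constant mutant violates both ranges, the factor mutant violates one, and the equivalent mutant violates none, so the equivalent mutant ends up last in the ranking. Real dynamic invariants are richer than value ranges; the ranges only stand in for them here.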