
Property-Based Testing: A Silver Bullet?

John Hughes, December 2009

Software testing: most famous quote

• “Program testing can be used to show the presence of bugs, but never to show their absence!”

– E.W. Dijkstra

[Figure: money spent on testing compared with the cost of remaining errors; amounts shown: $60 billion and $240 billion]

Testing in Practice?

• Human effort?

• Test automation

Large-Scale Test Automation

• Nightly runs provide rapid feedback
• New test cases added for each error found

[Diagram: a test server runs automated test cases against the software under test and produces a report of test case failures. Sizes shown: 1.5 MLOC Erlang, 2 MLOC C++; 700 KLOC Erlang]

Typical Large Projects

[Chart: bug detection rate, design team and test team]

Developer Testing

• Why wait until system testing to use test automation?
  – Why not automate developers’ own testing?

• Unit testing: one module in isolation
  – A key element of agile development methods such

Claims for Unit Testing

• Immediate discovery of errors
  – Bug fixing is cheap!

• Confidence in refactoring
  – Cleaner code!

• TDD: write tests first, then just enough code to make them pass
  – KISS! No wasted effort!

• Tests serve as a specification
  – So keep test code clean and elegant!
  – Not too many… one test for each thing!

TDD with HUnit in Haskell

• Problem: implement a key-value store

-- Type signatures
empty  :: Store k v
store  :: Ord k => k -> v -> Store k v -> Store k v
find   :: Ord k => k -> Store k v -> Maybe v
remove :: Ord k => k -> Store k v -> Store k v

Step 1: Tests for find

testFindEmpty = "find empty" ~: find 1 empty @?= (Nothing :: Maybe Int)

A test case is a definition

(~:) attaches a name to a test case

(@?=) is an assertion: equality where the left side is the unknown (actual) value and the right side is the “expected” value

testFind1 = "find with one element" ~: find 1 (store 1 2 empty) @?= Just 2

testFind2 = "find with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  find 1 s @?= Just 2
  find 3 s @?= Just 4
  find 5 s @?= Nothing

Can combine several assertions and IO actions in one test case

HUnit Glue

import Test.HUnit

main = runTestTT findTests

findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]

Step 2: Run the tests

import Test.HUnit

main = runTestTT findTests

findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]

data Store k v = Store
find   = undefined
store  = undefined
remove = undefined
empty  = undefined

*Main> main
### Error in: find tests:0:find empty
Prelude.undefined
### Error in: find tests:1:find with one element
Prelude.undefined
### Error in: find tests:2:find with two elements
Prelude.undefined
Cases: 3 Tried: 3 Errors: 3 Failures: 0
Counts {cases = 3, tried = 3, errors = 3, failures = 0}
*Main>

A message from each failing test

A summary of the test results

Step 3: Write just enough code

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

find k Nil = Nothing
find k (Node k' v l r)
  | k == k' = Just v
  | k <  k' = find k l
  | k >  k' = find k r

store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k' <= k = Node k' v' (store k v l) r
  | k' >  k = Node k' v' l (store k v r)

empty = Nil

remove = undefined

Ordered binary trees

Don’t write remove yet

Step 4: Repeat the tests

*Main> main
### Failure in: find tests:2:find with two elements
expected: Just 2
 but got: Nothing
Cases: 3 Tried: 3 Errors: 0 Failures: 1
Counts {cases = 3, tried = 3, errors = 0, failures = 1}

testFind2 = "find with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  find 1 s @?= Just 2
  find 3 s @?= Just 4
  find 5 s @?= Nothing

Step 5: Debug the code

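store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k' <= k = Node k' v' (store k v l) r   -- should be: k <= k'
  | k' >  k = Node k' v' l (store k v r)   -- should be: k > k'

The guards are reversed: smaller keys must go into the left subtree, because that is where find looks for them.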
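A minimal sketch of the store with the Step 5 correction applied, so the fix can be tried in isolation. The explicit type signatures and the `main` driver are additions for illustration; the guards follow the corrected comparisons from the slide.

```haskell
-- Ordered binary tree from the slides, with the corrected guards in store.
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

empty :: Store k v
empty = Nil

find :: Ord k => k -> Store k v -> Maybe v
find _ Nil = Nothing
find k (Node k' v l r)
  | k == k'   = Just v
  | k <  k'   = find k l
  | otherwise = find k r

store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r   -- smaller or equal keys go left
  | otherwise = Node k' v' l (store k v r)   -- larger keys go right

-- Illustrative driver: the three lookups from testFind2.
main :: IO ()
main = do
  let s = store 1 2 (store 3 4 empty) :: Store Int Int
  print (find 1 s, find 3 s, find 5 s)
```

Note that equal keys still go left rather than replacing the existing entry, exactly as on the slide.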

Step 6: Rerun the tests

• All the tests pass—now we write more tests!

*Main> main
Cases: 3 Tried: 3 Errors: 0 Failures: 0
Counts {cases = 3, tried = 3, errors = 0, failures = 0}

Next Iteration: tests for remove

removeTests = "remove tests" ~: [testRemoveEmpty, testRemove1, testRemove2]

testRemoveEmpty = "remove empty" ~: remove 1 empty @?= (empty :: Store Int Int)

testRemove1 = "remove with one element" ~: remove 1 (store 1 2 empty) @?= empty

testRemove2 = "remove with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  remove 1 s @?= store 3 4 empty
  remove 3 s @?= store 1 2 empty
  remove 5 s @?= s

Run the tests

main = runTestTT allTests

allTests = "all tests" ~: [findTests, removeTests]

*Main> main
### Error in: all tests:1:remove tests:0:remove empty
Prelude.undefined
### Error in: all tests:1:remove tests:1:remove with one element
Prelude.undefined
### Error in: all tests:1:remove tests:2:remove with two elements
Prelude.undefined
Cases: 6 Tried: 6 Errors: 3 Failures: 0
Counts {cases = 6, tried = 6, errors = 3, failures = 0}

Implementation of remove

Code for remove

remove k Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      _   -> let (nk,nv) = leftmost r
             in Node nk nv l (remove nk r)
  | k < k' = Node k' v (remove k l) r
  | k > k' = Node k' v l (remove k r)

leftmost (Node k v Nil _) = (k,v)
leftmost (Node _ _ l _) = leftmost l

Last step: rerun the tests

• No failures, so we’re done!

*Main> main
Cases: 6 Tried: 6 Errors: 0 Failures: 0
Counts {cases = 6, tried = 6, errors = 0, failures = 0}

…or are we???

Test Coverage

• All tests pass—but how good are our tests?

• Source code coverage tools tell us how much code we tested

• When tests pass, check coverage!

Using Haskell Program Coverage

C:\Users\John Hughes\Desktop> ghc -fhpc Store.hs --make

C:\Users\John Hughes\Desktop> Store.exe
Cases: 6 Tried: 6 Errors: 0 Failures: 0

C:\Users\John Hughes\Desktop> hpc markup Store.exe
Writing: Main.hs.html…

Marked-up source code

Conditions which were always true

Code which was never executed!

Just… one… more… test…

testRemoveNonEmptyRightBranch = "remove with non-empty right branch" ~:
  remove 1 (store 3 4 (store 1 2 empty)) @?= store 3 4 empty

But…

• This last test has nothing to do with a specification

• It cannot be written “first”
• Test cases written just to get coverage are often bad test cases
• Many, many tests are needed: boring!

• Does TDD really cut the mustard?

Which Unit Tests to Write?

• “You should test things that might break” – Kent Beck

• Not too few, not too many

• Partition the cases into classes with similar behaviour

• Write one test per partition

Example: insertion into an ordered list

• Partitions:
  – Empty list/non-empty list
  – Insert at beginning/middle/end
    • Test boundary values and middle values
  – Element already present/not present

Partition tests

• insertNonEmpty covered by other cases
• insertAbsent covered by other cases
• Note: expected values play a major rôle!

insertEmpty   = "insert empty"   ~: insert 1 []    @?= [1]
insertStart   = "insert start"   ~: insert 1 [2,4] @?= [1,2,4]
insertMid     = "insert mid"     ~: insert 3 [2,4] @?= [2,3,4]
insertEnd     = "insert end"     ~: insert 5 [2,4] @?= [2,4,5]
insertPresent = "insert present" ~: insert 1 [1]   @?= [1,1]

Sum or Product of Partitions?

• Given several ways to partition inputs, should we
  – Write one test for each partition?
  – Write one test for each combination of partitions?
    • E.g. Non-empty/Beginning/Present, Non-empty/Beginning/Absent, …
  – (Can be smart and cover all pairs of partitions, or all triples…)
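To make the difference between the two choices concrete, here is a small sketch that counts the tests each one needs for the ordered-list example. The type and constructor names (`Size`, `Position`, `Presence` and their values) are hypothetical labels for the three partitionings above, not names from the slides.

```haskell
-- Hypothetical labels for the three ways of partitioning the inputs to insert.
data Size     = EmptyList | NonEmptyList deriving (Show, Eq, Enum, Bounded)
data Position = Beginning | Middle | End deriving (Show, Eq, Enum, Bounded)
data Presence = Present | Absent         deriving (Show, Eq, Enum, Bounded)

-- All values of a small enumeration type.
universe :: (Enum a, Bounded a) => [a]
universe = [minBound .. maxBound]

-- "Sum": one test per partition class.
sumCount :: Int
sumCount = length (universe :: [Size])
         + length (universe :: [Position])
         + length (universe :: [Presence])

-- "Product": one test per combination of classes.
-- Some combinations cannot occur, e.g. an empty list with the element present.
allCombinations :: [(Size, Position, Presence)]
allCombinations = [ (s, p, e) | s <- universe, p <- universe, e <- universe ]

main :: IO ()
main = do
  putStrLn ("sum of partitions:     " ++ show sumCount)
  putStrLn ("product of partitions: " ++ show (length allCombinations))
  mapM_ print allCombinations
```

With these three partitionings the sum is 7 tests and the product is 12 combinations, and the gap widens quickly as more partitionings are added.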

Property Based Testing

• Generate test cases instead of inventing them
  – Automate the boring bit!
  – Reduce size of test code

• Focus on properties true in all cases, not single tests
  – A true specification

• Minimize failing test cases to speed debugging
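As a first taste, the ordered-list insertion example from the partition slides can be restated as properties. This is a sketch assuming the QuickCheck library is installed; the property names and the helper `ordered` are illustrative and do not come from the slides.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- True when every element is <= its successor.
ordered :: [Int] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))

-- Inserting into an ordered list yields an ordered list.
-- OrderedList is QuickCheck's modifier that generates sorted lists.
prop_insertOrdered :: Int -> OrderedList Int -> Bool
prop_insertOrdered x (Ordered xs) = ordered (insert x xs)

-- Inserting agrees with a simple model: add the element, then sort.
prop_insertModel :: Int -> OrderedList Int -> Bool
prop_insertModel x (Ordered xs) = insert x xs == sort (x : xs)

main :: IO ()
main = do
  quickCheck prop_insertOrdered
  quickCheck prop_insertModel
```

Two short properties stand in for the hand-written partition tests; the generator, rather than the tester, decides whether the list is empty and where the element lands.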

Generating Stores

• How to generate, how to shrink

instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Store k v) where
  arbitrary = do
    (k,v,s) <- arbitrary
    elements [empty, store k v s, remove k s]

  shrink Nil = []
  shrink (Node k v l r) =
    [l,r] ++
    [Node k v l' r | l' <- shrink l] ++
    [Node k v l r' | r' <- shrink r] ++
    [Node k v' l r | v' <- shrink v]
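For comparison, a different and simpler way to generate stores is to insert a random list of key-value pairs. This is not the generator from the slide: it never calls remove, so it only produces trees that store alone can build. The name `genStoreFromPairs` is hypothetical, and the sketch assumes QuickCheck is installed.

```haskell
import Test.QuickCheck

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- store with the corrected guards from Step 5.
store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r
  | otherwise = Node k' v' l (store k v r)

-- Hypothetical alternative generator: fold a random list of pairs into a store.
genStoreFromPairs :: (Ord k, Arbitrary k, Arbitrary v) => Gen (Store k v)
genStoreFromPairs = foldr (uncurry store) Nil <$> arbitrary

-- Print a few sample stores.
main :: IO ()
main = sample (genStoreFromPairs :: Gen (Store Int Int))
```

The trade-off is in what gets exercised: the slide's generator also runs remove while building test data, which this simpler version does not.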

Model-based testing

• What does a store represent?– A set of key-value pairs!

model s = List.sort (contents s)

contents Nil = []
contents (Node k v l r) = (k,v) : contents l ++ contents r

Sorted so we can compare them with ==
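A small illustration of why the model is useful: two trees with different shapes can represent the same set of pairs, and the model makes them comparable. The example trees `t1` and `t2` and the type signatures are additions for illustration.

```haskell
import qualified Data.List as List

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- All key-value pairs in the tree, in no particular order.
contents :: Store k v -> [(k, v)]
contents Nil = []
contents (Node k v l r) = (k, v) : contents l ++ contents r

-- The abstract value of a store: its pairs in sorted order.
model :: (Ord k, Ord v) => Store k v -> [(k, v)]
model s = List.sort (contents s)

-- Two differently shaped trees holding the same pairs (illustrative values).
t1, t2 :: Store Int Int
t1 = Node 3 4 (Node 1 2 Nil Nil) Nil
t2 = Node 1 2 Nil (Node 3 4 Nil Nil)

main :: IO ()
main = do
  print (t1 == t2)              -- structural equality: the shapes differ
  print (model t1 == model t2)  -- equality of the models
```

The first comparison is False and the second is True, which is why the properties compare models rather than trees wherever the shape is not part of the specification.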

Properties: Agreement with the model

prop_find k s =
  find k s == lookup k (model s)
  where types = s :: Store Int Int

prop_store k v s =
  model (store k v s) == List.insert (k,v) (model s)
  where types = s :: Store Int Int

prop_remove k s =
  case find k s of
    Just v  -> model (remove k s) == model s List.\\ [(k,v)]
    Nothing -> remove k s == s
  where types = s :: Store Int Int

Testing the Properties

• We forgot to consider duplicate keys!

*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_find
*** Failed! Falsifiable (after 95 tests and 1 shrink):
1
Node 1 1 (Node 1 (-1) Nil Nil) Nil

prop_find k s =
  case [v | (k',v) <- model s, k==k'] of
    [] -> find k s == Nothing
    vs -> find k s `elem` map Just vs
  where types = s :: Store Int Int

Testing remove

• We’re not removing the duplicate key…???

*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
+++ OK, passed 10000 tests.
*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
*** Failed! Falsifiable (after 2 tests):
0
Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)

*Main> let s = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
*Main> find 0 s
Just 1
*Main> remove 0 s
Node 1 0 Nil (Node 1 0 Nil Nil)

The Bug

remove k Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      _   -> let (nk,nv) = leftmost r
             in Node nk nv l (remove nk r)
  | k < k' = Node k' v (remove k l) r
  | k > k' = Node k' v l (remove k r)

Removes nk with the wrong value
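The slides do not show the repair, so the following is only one possible fix, sketched under that assumption: instead of looking the leftmost key up again with remove, detach the leftmost node itself and reuse what is left of the right subtree. The helper name `popLeftmost` is hypothetical.

```haskell
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- Detach the leftmost node: returns its pair and the remaining tree,
-- or Nothing for an empty tree. (Hypothetical helper, not from the slides.)
popLeftmost :: Store k v -> Maybe ((k, v), Store k v)
popLeftmost Nil = Nothing
popLeftmost (Node k v Nil r) = Just ((k, v), r)
popLeftmost (Node k v l r) = do
  (kv, l') <- popLeftmost l
  Just (kv, Node k v l' r)

remove :: Ord k => k -> Store k v -> Store k v
remove _ Nil = Nil
remove k (Node k' v l r)
  | k < k'    = Node k' v (remove k l) r
  | k > k'    = Node k' v l (remove k r)
  | otherwise = case popLeftmost r of
      Nothing             -> l                -- no right subtree: keep the left
      Just ((nk, nv), r') -> Node nk nv l r'  -- promote exactly the detached node

-- The counterexample reported by QuickCheck on the slide.
main :: IO ()
main = do
  let s = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil) :: Store Int Int
  print (remove 0 s)
```

On the counterexample this keeps both entries for key 1, the pair (1,0) promoted to the root and (1,1) left in the right subtree. It has only been checked here against that one case, so rerunning prop_remove is still the real test.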

How are we doing for coverage?

HUnit vs QuickCheck

                        HUnit   QuickCheck
Bugs found after TDD    0       1
Source code coverage    95%     100%
Lines of test code      35      26

• QuickCheck
  – Finds more bugs
  – With better coverage
  – In less time
  – With less test code
  – And a clearer specification

3G Radio Base Station

Reject

Media Proxy

• Multimedia IP-telephony (IMS)
• Connects calls across a firewall
• Test adding and removing callers from a call

[Diagram: operation sequence Add, Add, Sub, Add, Sub, Add, Sub; label: Call Full]

Property Based Testing is Great!

• Improves quality!
  – Finds more bugs, achieves better coverage

• Reduces cost!
  – Less test code, shrinking speeds diagnosis

• And it’s actually fun!
  – “Please can I write some tests today?”

How do we know?

• Case studies in industry
  + Real software development
  + Professional software developers
  – Unrepeatable
  – Difficult to control

• Experiments in universities
  + Focus on a single question
  + Carefully controlled
  – Student volunteers
  – Unrealistically small

Test Driven Development

• Case studies
  – YES, quality is improved
  – NO, cost is not reduced (costs rise about 20%)

• Experiments
  – YES, code is developed faster
  – NO, quality is not improved (it drops)

Property Based Testing

• Case studies
  + Property-based testing does increase quality
  + Property-based testing does reduce cost
  – PBT during system testing actually reduces the quality of conventional unit testing done earlier

Our Experiment

• Hypothesis:
  – Property-based testing is more effective than conventional unit testing

• Effective?
  – Quality (better quality in the same time)
    • number of bugs, number of tests failed, subjective judgement
  – Test quality
    • code coverage, subjective judgement
  – Design quality
    • size of code, size of test code, subjective judgement

Isn’t it obvious?

• Well…
  – Property-based testing requires writing generators… additional work!
  – Property-based testing requires formalizing the specification, unit testing only needs examples
  – Property-based tests are selected at random, unit tests are carefully crafted… better at provoking errors?

Next week, in a lab room near you…

• A comparison of HUnit and QuickCheck
  – Solve two problems, one with HUnit, one with QuickCheck
• Time: 13:15–16:00, Wednesday 8 December
• Place: lab rooms 3507, 3354 and 3358
• Info: sign up at http://groups.google.com/group/quickcheck-experiment

Exercises

• Get the HUnit tests for insert running

• Add test cases to cover all combinations of partitions

• Write and test functions to extract the base name and extension from a file name:
  – baseName "Foo.hs" == "Foo"
  – extension "Foo.hs" == "hs"
