property-based testing a silver bullet? john hughes december 2009
Post on 30-Mar-2015
Property-Based Testing: A Silver Bullet?
John Hughes, December 2009
Software testing: most famous quote
• "Program testing can be used to show the presence of bugs, but never to show their absence!"
  – E. W. Dijkstra
[Figure: money spent on testing vs. cost of remaining errors; figures shown: $60 billion, $240 billion, ≈ 50%]
Testing in Practice?
• Human effort?
• Test automation
Large-Scale Test Automation
• Nightly runs provide rapid feedback
• New test cases added for each error found
[Figure: a test server runs automated test cases against the software under test and produces a report of test case failures]
1.5 MLOC Erlang, 2 MLOC C++
700 KLOC Erlang
Typical Large Projects
[Figure: bug detection rate, design team vs. test team]
Developer Testing
• Why wait until system testing to use test automation?
  – Why not automate developers’ own testing?
• Unit testing: one module in isolation
  – A key element of agile development methods such as XP
Claims for Unit Testing
• Immediate discovery of errors
  – Bug fixing is cheap!
• Confidence in refactoring
  – Cleaner code!
• TDD: write tests first, then just enough code to make them pass
  – KISS! No wasted effort!
• Tests serve as a specification
  – So keep test code clean and elegant!
  – Not too many… one test for each thing!
TDD with HUnit in Haskell
• Problem: implement a key-value store
-- Type signatures
empty :: Store k v
store :: Ord k => k -> v -> Store k v -> Store k v
find :: Ord k => k -> Store k v -> Maybe v
remove :: Ord k => k -> Store k v -> Store k v
Step 1: Tests for find
testFindEmpty = "find empty" ~: find 1 empty @?= (Nothing :: Maybe Int)
• A test case is a definition
• ~: attaches a name to a test case
• @?= is an assertion: equality where the left side is the value being tested and the right side is the "expected" value
testFind1 = "find with one element" ~: find 1 (store 1 2 empty) @?= Just 2
testFind2 = "find with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  find 1 s @?= Just 2
  find 3 s @?= Just 4
  find 5 s @?= Nothing
A do block can combine several assertions and IO actions in one test case
HUnit Glue
import Test.HUnit
main = runTestTT findTests
findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]
Step 2: Run the tests
data Store k v = Store
find = undefined
store = undefined
remove = undefined
empty = undefined
*Main> main
### Error in: find tests:0:find empty
Prelude.undefined
### Error in: find tests:1:find with one element
Prelude.undefined
### Error in: find tests:2:find with two elements
Prelude.undefined
Cases: 3 Tried: 3 Errors: 3 Failures: 0
Counts {cases = 3, tried = 3, errors = 3, failures = 0}
*Main>
A message from each failing test
A summary of the test results
Step 3: Write just enough code
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)
find k Nil = Nothing
find k (Node k' v l r)
  | k == k' = Just v
  | k < k' = find k l
  | k > k' = find k r
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k' <= k = Node k' v' (store k v l) r
  | k' > k = Node k' v' l (store k v r)
empty = Nil
remove = undefined
Ordered binary trees
Don’t write remove yet
Step 4: Repeat the tests
*Main> main
### Failure in: find tests:2:find with two elements
expected: Just 2
 but got: Nothing
Cases: 3 Tried: 3 Errors: 0 Failures: 1
Counts {cases = 3, tried = 3, errors = 0, failures = 1}
Step 5: Debug the code
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k' = Node k' v' (store k v l) r   -- was: k' <= k
  | k > k' = Node k' v' l (store k v r)    -- was: k' > k
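Collecting Steps 1 to 5 into one file gives something like the sketch below: the Step 3 tree with the corrected guards from Step 5, `remove` still undefined, and the find tests from Step 1. The explicit signatures follow the slide's type-signature list; writing the last guard as `otherwise` and giving `main` the type `IO Counts` are our own choices, not the slides'.

```haskell
import Test.HUnit

-- Ordered binary tree of key-value pairs (Step 3).
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

empty :: Store k v
empty = Nil

-- Insertion with the corrected guards from Step 5:
-- keys less than or equal to the node's key go left, larger keys go right.
store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r
  | otherwise = Node k' v' l (store k v r)

-- Lookup follows the same ordering as store.
find :: Ord k => k -> Store k v -> Maybe v
find _ Nil = Nothing
find k (Node k' v l r)
  | k == k'   = Just v
  | k < k'    = find k l
  | otherwise = find k r

-- Not written yet, as on the slides.
remove :: Ord k => k -> Store k v -> Store k v
remove = undefined

testFindEmpty, testFind1, testFind2 :: Test
testFindEmpty = "find empty" ~: find 1 empty @?= (Nothing :: Maybe Int)
testFind1 = "find with one element" ~: find 1 (store 1 2 empty) @?= Just 2
testFind2 = "find with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  find 1 s @?= Just 2
  find 3 s @?= Just 4
  find 5 s @?= Nothing

findTests :: Test
findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]

main :: IO Counts
main = runTestTT findTests
```

With the Step 5 guards, `store 1 2 (store 3 4 empty)` places key 1 in the left subtree of key 3, which is where `find` looks for it.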
Step 6: Rerun the tests
• All the tests pass—now we write more tests!
*Main> main
Cases: 3 Tried: 3 Errors: 0 Failures: 0
Counts {cases = 3, tried = 3, errors = 0, failures = 0}
Next Iteration: tests for remove
removeTests = "remove tests" ~: [testRemoveEmpty, testRemove1, testRemove2]
testRemoveEmpty = "remove empty" ~: remove 1 empty @?= (empty :: Store Int Int)
testRemove1 = "remove with one element" ~: remove 1 (store 1 2 empty) @?= empty
testRemove2 = "remove with two elements" ~: do
  let s = store 1 2 (store 3 4 empty)
  remove 1 s @?= store 3 4 empty
  remove 3 s @?= store 1 2 empty
  remove 5 s @?= s
Run the tests
main = runTestTT allTests
allTests = "all tests" ~: [findTests, removeTests]
*Main> main
### Error in: all tests:1:remove tests:0:remove empty
Prelude.undefined
### Error in: all tests:1:remove tests:1:remove with one element
Prelude.undefined
### Error in: all tests:1:remove tests:2:remove with two elements
Prelude.undefined
Cases: 6 Tried: 6 Errors: 3 Failures: 0
Counts {cases = 6, tried = 6, errors = 3, failures = 0}
Implementation of remove
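[Figure: removing the node holding k,v; when its right subtree is non-empty, the node is replaced by the leftmost entry nk,nv of that subtree]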
Code for remove
remove k Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      _ -> let (nk,nv) = leftmost r in Node nk nv l (remove nk r)
  | k < k' = Node k' v (remove k l) r
  | k > k' = Node k' v l (remove k r)
leftmost (Node k v Nil _) = (k,v)
leftmost (Node _ _ l _) = leftmost l
Last step: rerun the tests
• No failures, so we’re done!
*Main> main
Cases: 6 Tried: 6 Errors: 0 Failures: 0
Counts {cases = 6, tried = 6, errors = 0, failures = 0}
…or are we???
Test Coverage
• All tests pass—but how good are our tests?
• Source code coverage tools tell us how much code we tested
• When tests pass, check coverage!
Using Haskell Program Coverage
C:\Users\John Hughes\Desktop> ghc -fhpc Store.hs --make
C:\Users\John Hughes\Desktop> Store.exe
Cases: 6 Tried: 6 Errors: 0 Failures: 0
C:\Users\John Hughes\Desktop> hpc markup Store.exe
Writing: Main.hs.html…
[Figure: marked-up source code, highlighting conditions which were always true and code which was never executed]
Just… one… more… test…
testRemoveNonEmptyRightBranch = "remove with non-empty right branch" ~: remove 1 (store 3 4 (store 1 2 empty)) @?= store 3 4 empty
But…
• This last test has nothing to do with a specification
• It cannot be written "first"
• Test cases written just to get coverage are often bad test cases
• Many, many tests are needed: boring!
• Does TDD really cut the mustard?
Which Unit Tests to Write?
• "You should test things that might break" (Kent Beck)
• Not too few, not too many
• Partition the cases into classes with similar behaviour
• Write one test per partition
Example: insertion into an ordered list
• Partitions:
  – Empty list/non-empty list
  – Insert at beginning/middle/end
  – Element already present/not present
• Test boundary values and middle values
Partition tests
• insertNonEmpty covered by other cases
• insertAbsent covered by other cases
• Note: expected values play a major rôle!
insertEmpty = "insert empty" ~: insert 1 [] @?= [1]
insertStart = "insert start" ~: insert 1 [2,4] @?= [1,2,4]
insertMid = "insert mid" ~: insert 3 [2,4] @?= [2,3,4]
insertEnd = "insert end" ~: insert 5 [2,4] @?= [2,4,5]
insertPresent = "insert present" ~: insert 1 [1] @?= [1,1]
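A self-contained version of these five tests might look like the sketch below. It assumes that `insert` is `Data.List.insert` (the slides do not say which `insert` is meant); the `:: Int` annotations and the suite name `insertTests` are our additions, so the tests do not depend on numeric defaulting.

```haskell
import Data.List (insert)
import Test.HUnit

-- One test per partition, as on the slide.
insertEmpty, insertStart, insertMid, insertEnd, insertPresent :: Test
insertEmpty   = "insert empty"   ~: insert 1 [] @?= [1 :: Int]
insertStart   = "insert start"   ~: insert 1 [2,4] @?= [1,2,4 :: Int]
insertMid     = "insert mid"     ~: insert 3 [2,4] @?= [2,3,4 :: Int]
insertEnd     = "insert end"     ~: insert 5 [2,4] @?= [2,4,5 :: Int]
insertPresent = "insert present" ~: insert 1 [1] @?= [1,1 :: Int]

insertTests :: Test
insertTests = "insert tests" ~:
  [insertEmpty, insertStart, insertMid, insertEnd, insertPresent]

main :: IO ()
main = do
  result <- runTestTT insertTests
  -- runTestTT prints its own summary; we also report whether everything passed.
  putStrLn (if errors result + failures result == 0
              then "all passed"
              else "some tests failed")
```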
Sum or Product of Partitions?
• Given several ways to partition inputs, should we
  – Write one test for each partition?
  – Write one test for each combination of partitions?
    • E.g. Non-empty/Beginning/Present, Non-empty/Beginning/Absent, …
  – (Can be smart and cover all pairs of partitions, or all triples…)
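To see how quickly the "one test per combination" option grows, here is a hypothetical encoding of the three partitionings from the insert example as Haskell enumerations. The type names and the feasibility rule are our own reading, not the slides': an empty list can only receive an element that is absent, and we count that case as insertion at the beginning.

```haskell
-- Hypothetical encoding of the three partitionings of the insert example.
data ListShape = EmptyList | NonEmptyList deriving (Show, Eq, Enum, Bounded)
data Position  = Beginning | Middle | End deriving (Show, Eq, Enum, Bounded)
data Presence  = Present | Absent         deriving (Show, Eq, Enum, Bounded)

-- Every value of a small enumeration.
allValues :: (Enum a, Bounded a) => [a]
allValues = [minBound .. maxBound]

-- The full product: one candidate test per combination of partition classes.
rawProduct :: [(ListShape, Position, Presence)]
rawProduct = [ (s, p, e) | s <- allValues, p <- allValues, e <- allValues ]

-- Assumed rule: an empty list only admits inserting an absent element,
-- which we count as insertion at the beginning.
feasible :: (ListShape, Position, Presence) -> Bool
feasible (EmptyList, p, e) = p == Beginning && e == Absent
feasible _                 = True

-- Combinations that actually need a test case.
combinations :: [(ListShape, Position, Presence)]
combinations = filter feasible rawProduct

main :: IO ()
main = mapM_ print combinations
```

Here the raw product has 2 × 3 × 2 = 12 cells, 7 of them feasible; every additional partitioning multiplies the count again, which is what makes pairwise or triple-wise coverage attractive.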
Property Based Testing
• Generate test cases instead of inventing them
  – Automate the boring bit!
  – Reduce size of test code
• Focus on properties true in all cases, not single tests
  – A true specification
• Minimize failing test cases to speed debugging
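As a first taste of the idea, here is a sketch of two QuickCheck properties for the ordered-list `insert` from the partition example, again assuming `Data.List.insert`. Rather than one hand-picked case per partition, random inputs are expected to fall into the empty/non-empty, beginning/middle/end and present/absent classes on their own. The property names and the `ordered` helper are ours.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- Helper: is a list in non-decreasing order?
ordered :: Ord a => [a] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))

-- Inserting into an ordered list keeps it ordered. We sort the random
-- input instead of filtering, so no generated case is discarded.
prop_insertOrdered :: Int -> [Int] -> Bool
prop_insertOrdered x xs = ordered (insert x (sort xs))

-- Inserting adds exactly x: the result is a permutation of x : xs.
prop_insertElems :: Int -> [Int] -> Bool
prop_insertElems x xs = sort (insert x (sort xs)) == sort (x : xs)

main :: IO ()
main = do
  quickCheck prop_insertOrdered
  quickCheck prop_insertElems
```

Together the two properties pin down `insert` on ordered lists, whereas each unit test only checks one expected value.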
Generating Stores
• How to generate, how to shrink
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Store k v) where
  arbitrary = do
    (k,v,s) <- arbitrary
    elements [empty, store k v s, remove k s]
  shrink Nil = []
  shrink (Node k v l r) =
    [l,r] ++
    [Node k v l' r | l' <- shrink l] ++
    [Node k v l r' | r' <- shrink r] ++
    [Node k v' l r | v' <- shrink v]
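As we read it, the generator above leans on laziness: the recursive `s` is only evaluated when `store` or `remove` is picked, so each level stops with probability 1/3, but the depth of the tree is not tied to QuickCheck's size parameter. A commonly used alternative, sketched here and not taken from the slides, generates a random list of operations with `listOf` (whose length is bounded by the size parameter) and runs them from `empty`. `Op`, `OpStore`, `OpRemove`, `runOps` and `genStore` are hypothetical names; `store` uses the Step 5 guards, and `remove`/`leftmost` are the slides' versions.

```haskell
import Test.QuickCheck

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

empty :: Store k v
empty = Nil

-- store with the corrected guards from Step 5.
store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r
  | otherwise = Node k' v' l (store k v r)

-- remove and leftmost as on the slides (duplicate-key bug included).
remove :: Ord k => k -> Store k v -> Store k v
remove _ Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      _   -> let (nk, nv) = leftmost r in Node nk nv l (remove nk r)
  | k < k'    = Node k' v (remove k l) r
  | otherwise = Node k' v l (remove k r)

leftmost :: Store k v -> (k, v)
leftmost (Node k v Nil _) = (k, v)
leftmost (Node _ _ l _)   = leftmost l
leftmost Nil              = error "leftmost: empty store"

-- Hypothetical operation type: one step of building a store.
data Op k v = OpStore k v | OpRemove k
  deriving Show

instance (Arbitrary k, Arbitrary v) => Arbitrary (Op k v) where
  arbitrary = oneof [OpStore <$> arbitrary <*> arbitrary, OpRemove <$> arbitrary]

-- Run the operations left to right, starting from the empty store.
runOps :: Ord k => [Op k v] -> Store k v
runOps = foldl step empty
  where
    step s (OpStore k v) = store k v s
    step s (OpRemove k)  = remove k s

-- listOf picks a length between 0 and the current size, so early tests see small stores.
genStore :: (Ord k, Arbitrary k, Arbitrary v) => Gen (Store k v)
genStore = runOps <$> listOf arbitrary

instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Store k v) where
  arbitrary = genStore
  -- Shrinking exactly as on the slides.
  shrink Nil = []
  shrink (Node k v l r) =
    [l, r] ++
    [Node k v l' r | l' <- shrink l] ++
    [Node k v l r' | r' <- shrink r] ++
    [Node k v' l r | v' <- shrink v]

main :: IO ()
main = sample (arbitrary :: Gen (Store Int Int))
```

A design trade-off worth noting: because every store is built only through `store` and `remove`, generated trees always satisfy whatever invariant those functions maintain, just as with the slides' generator.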
Model-based testing
• What does a store represent?
  – A set of key-value pairs!
model s = List.sort (contents s)
contents Nil = []
contents (Node k v l r) = (k,v) : contents l ++ contents r
Sorted so we can compare them with ==
Properties: Agreement with the model
prop_find k s = find k s == lookup k (model s)
  where types = s :: Store Int Int
prop_store k v s = model (store k v s) == List.insert (k,v) (model s)
  where types = s :: Store Int Int
prop_remove k s =
  case find k s of
    Just v -> model (remove k s) == model s List.\\ [(k,v)]
    Nothing -> remove k s == s
  where types = s :: Store Int Int
Testing the Properties
• We forgot to consider duplicate keys!
*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_find
*** Failed! Falsifiable (after 95 tests and 1 shrink):
1
Node 1 1 (Node 1 (-1) Nil Nil) Nil
prop_find k s =
  case [v | (k',v) <- model s, k == k'] of
    [] -> find k s == Nothing
    vs -> find k s `elem` map Just vs
  where types = s :: Store Int Int
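For readers who want to reproduce these runs, here is a sketch of one self-contained file combining the slides' pieces: the Step 3 tree with the Step 5 fix to `store`, the slides' `remove`, `model`, the generator and shrinker, and the three properties with `prop_find` in its corrected form. Explicit type signatures replace the `where types = …` idiom, and the `main` wrapper is our own. Since inputs are random, how many tests it takes to hit a `prop_remove` failure will vary between runs.

```haskell
import qualified Data.List as List
import Test.QuickCheck

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

empty :: Store k v
empty = Nil

store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r
  | otherwise = Node k' v' l (store k v r)

find :: Ord k => k -> Store k v -> Maybe v
find _ Nil = Nothing
find k (Node k' v l r)
  | k == k'   = Just v
  | k < k'    = find k l
  | otherwise = find k r

-- remove exactly as on the slides, so prop_remove can still fail.
remove :: Ord k => k -> Store k v -> Store k v
remove _ Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      _   -> let (nk, nv) = leftmost r in Node nk nv l (remove nk r)
  | k < k'    = Node k' v (remove k l) r
  | otherwise = Node k' v l (remove k r)

leftmost :: Store k v -> (k, v)
leftmost (Node k v Nil _) = (k, v)
leftmost (Node _ _ l _)   = leftmost l
leftmost Nil              = error "leftmost: empty store"

-- The model: the sorted list of key-value pairs held in the store.
model :: (Ord k, Ord v) => Store k v -> [(k, v)]
model s = List.sort (contents s)

contents :: Store k v -> [(k, v)]
contents Nil = []
contents (Node k v l r) = (k, v) : contents l ++ contents r

instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Store k v) where
  -- The slides' generator: s is only evaluated if store or remove is chosen.
  arbitrary = do
    (k, v, s) <- arbitrary
    elements [empty, store k v s, remove k s]
  shrink Nil = []
  shrink (Node k v l r) =
    [l, r] ++
    [Node k v l' r | l' <- shrink l] ++
    [Node k v l r' | r' <- shrink r] ++
    [Node k v' l r | v' <- shrink v]

prop_find :: Int -> Store Int Int -> Bool
prop_find k s =
  case [v | (k', v) <- model s, k == k'] of
    [] -> find k s == Nothing
    vs -> find k s `elem` map Just vs

prop_store :: Int -> Int -> Store Int Int -> Bool
prop_store k v s = model (store k v s) == List.insert (k, v) (model s)

prop_remove :: Int -> Store Int Int -> Bool
prop_remove k s =
  case find k s of
    Just v  -> model (remove k s) == model s List.\\ [(k, v)]
    Nothing -> remove k s == s

main :: IO ()
main = do
  let args = stdArgs { maxSuccess = 10000 }
  quickCheckWith args prop_find
  quickCheckWith args prop_store
  quickCheckWith args prop_remove
```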
Testing remove
• We’re not removing the duplicate key…???
*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
+++ OK, passed 10000 tests.
*Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
*** Failed! Falsifiable (after 2 tests):
0
Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
*Main> let s = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
*Main> find 0 s
Just 1
*Main> remove 0 s
Node 1 0 Nil (Node 1 0 Nil Nil)
The Bug
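remove k Nil = Nil
remove k (Node k' v l r)
  | k == k' = case r of
      Nil -> l
      -- Bug: (remove nk r) deletes the first nk on its search path, which need not be
      -- the leftmost node, so with duplicate keys it removes nk with the wrong value.
      _ -> let (nk,nv) = leftmost r in Node nk nv l (remove nk r)
  | k < k' = Node k' v (remove k l) r
  | k > k' = Node k' v l (remove k r)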
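The slides stop at diagnosing the bug. One possible repair, which is our sketch rather than code from the talk: instead of copying the leftmost entry and then calling `remove nk r` (which may delete a different node with the same key), detach exactly the leftmost node in a single pass. `popLeftmost` and `exampleStore` are hypothetical names introduced for this illustration.

```haskell
import Data.List (sort)

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- Hypothetical helper: detach the leftmost node of a tree, returning its
-- entry together with the rest of the tree (Nothing for an empty tree).
popLeftmost :: Store k v -> Maybe ((k, v), Store k v)
popLeftmost Nil              = Nothing
popLeftmost (Node k v Nil r) = Just ((k, v), r)
popLeftmost (Node k v l r)   = do
  (kv, l') <- popLeftmost l
  return (kv, Node k v l' r)

-- remove with the bug fixed: the entry (nk, nv) that replaces the deleted
-- node and the node removed from r are now the same node.
remove :: Ord k => k -> Store k v -> Store k v
remove _ Nil = Nil
remove k (Node k' v l r)
  | k == k' = case popLeftmost r of
      Nothing             -> l
      Just ((nk, nv), r') -> Node nk nv l r'
  | k < k'    = Node k' v (remove k l) r
  | otherwise = Node k' v l (remove k r)

model :: (Ord k, Ord v) => Store k v -> [(k, v)]
model s = sort (contents s)

contents :: Store k v -> [(k, v)]
contents Nil = []
contents (Node k v l r) = (k, v) : contents l ++ contents r

-- The counterexample QuickCheck reported on the slides.
exampleStore :: Store Int Int
exampleStore = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)

main :: IO ()
main = print (remove 0 exampleStore)
-- prints: Node 1 0 Nil (Node 1 1 Nil Nil)
```

On the counterexample, the result's model is `[(1,0),(1,1)]`, i.e. the original model minus `(0,1)`, which is what `prop_remove` demands. Because `find` and `remove` follow the same search path, the entry that `remove` deletes is the one `find` returns.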
How are we doing for coverage?
HUnit vs QuickCheck
                        HUnit   QuickCheck
Bugs found after TDD    0       1
Source code coverage    95%     100%
Lines of test code      35      26
• QuickCheck
  – Finds more bugs
  – With better coverage
  – In less time
  – With less test code
  – And a clearer specification
3G Radio Base Station
[Figure: message sequence diagram with Setup, OK and Reject messages]
Media Proxy
• Multimedia IP-telephony (IMS)
• Connects calls across a firewall
• Test adding and removing callers from a call
[Figure: test sequence Add, Add, Sub, Add, Sub, Add, Sub, with the label "Call Full"]
Property Based Testing is Great!
• Improves quality!
  – Finds more bugs, achieves better coverage
• Reduces cost!
  – Less test code, shrinking speeds diagnosis
• And it’s actually fun!
  – "Please can I write some tests today?"
How do we know?
• Case studies in industry
  + Real software development
  + Professional software developers
  - Unrepeatable
  - Difficult to control
• Experiments in universities
  + Focus on a single question
  + Carefully controlled
  - Student volunteers
  - Unrealistically small
Test Driven Development
• Case Studies
  – YES, quality is improved
  – NO, cost is not reduced (costs rise about 20%)
• Experiments
  – YES, code is developed faster
  – NO, quality is not improved (it drops)
Property Based Testing
• Case studies
  + Property-based testing does increase quality
  + Property-based testing does reduce cost
  - PBT during system testing actually reduces the quality of conventional unit testing done earlier
Our Experiment
• Hypothesis:
  – Property-based testing is more effective than conventional unit testing
• Effective?
  – Quality (better quality in the same time)
    • number of bugs, number of tests failed, subjective judgement
  – Test quality
    • code coverage, subjective judgement
  – Design quality
    • size of code, size of test code, subjective judgement
Isn’t it obvious?
• Well…
  – Property-based testing requires writing generators… additional work!
  – Property-based testing requires formalizing the specification; unit testing only needs examples
  – Property-based tests are selected at random; unit tests are carefully crafted… better at provoking errors?
Next week, in a lab room near you…
• A comparison of HUnit and QuickCheck
  – Solve two problems, one with HUnit, one with QuickCheck
• Time: 13:15–16:00, Wednesday 8 December
• Place: lab rooms 3507, 3354 and 3358
• Info: sign up at http://groups.google.com/group/quickcheck-experiment
Exercises
• Get the HUnit tests for insert running
• Add test cases to cover all combinations of partitions
• Write and test functions to extract the base name and extension from a file name:
  – baseName "Foo.hs" == "Foo"
  – extension "Foo.hs" == "hs"
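One possible starting point for the last exercise, offered as a sketch rather than a model answer. The exercise fixes only the `Foo.hs` example, so the convention here is an assumption: the extension is whatever follows the last dot, and a name with no dot has an empty extension. `splitExt`, `examplesHold` and `prop_splitJoin` are hypothetical names. The block pairs the two unit-test examples with one property in the style of this talk.

```haskell
import Test.QuickCheck

-- Assumed convention (not fixed by the exercise): the extension is whatever
-- follows the last dot; a name with no dot has an empty extension.
splitExt :: String -> (String, String)
splitExt name =
  case break (== '.') (reverse name) of
    (revExt, '.' : revBase) -> (reverse revBase, reverse revExt)
    _                       -> (name, "")

baseName :: String -> String
baseName = fst . splitExt

extension :: String -> String
extension = snd . splitExt

-- Unit-test style: the two examples given in the exercise.
examplesHold :: Bool
examplesHold = baseName "Foo.hs" == "Foo" && extension "Foo.hs" == "hs"

-- Property style: joining any base to a dot-free extension with a dot,
-- then splitting again, gives back the same two parts.
prop_splitJoin :: String -> String -> Property
prop_splitJoin base ext =
  '.' `notElem` ext ==> splitExt (base ++ "." ++ ext) == (base, ext)

main :: IO ()
main = do
  print examplesHold
  quickCheck prop_splitJoin
```

The property forces a decision the examples leave open: because `base` may itself contain dots, it only holds if the split happens at the last dot.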