18 october 2013. why do we care? therac-25 (1985) 6 massive radiation overdoses multiple space...
![Page 1: 18 October 2013. Why do we care? Therac-25 (1985) 6 massive radiation overdoses Multiple space fiascos (1990s) Ariane V exploded after 40 seconds](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649f355503460f94c53787/html5/thumbnails/1.jpg)
TESTING
18 October 2013
Why do we care?
- Therac-25 (1985)
  - 6 massive radiation overdoses
- Multiple space fiascos (1990s)
  - Ariane V exploded after 40 seconds (conversion)
  - Mars Pathfinder computer kept turning itself off (system timing)
  - Patriot missile misguided (floating point accuracy)
- Millennium bug (2000)
- Microsoft attacks (ongoing)
- Healthcare.gov
- NIST: cost to US, $59 billion annual (2002)
Healthcare.gov
- As large as Windows XP
- Last minute change to require log in for any viewing
- Failed test with 200 users
  - Still failing with 20-30 thousand
  - Capacity was supposed to be twice that
- Project manager didn’t know there were problems
- Bypassed their own deploy rules for security
- Ignored the subcontractor that said it was broken
- Stray code that was never cleared
- Caching techniques not used
Quality and testing
- “Errors should be found and fixed as close to their place of origin as possible.” Fagan
- “Trying to improve quality by increasing testing is like trying to lose weight by weighing yourself more often.” McConnell
Life Testing
- Used regularly in hardware
  - Addresses “normal use”
- n specimens put to test
- Test until r failures have been observed
- Choose n and r to obtain the desired statistical errors
- As r and n increase, statistical errors decrease
- Expected time in test = $\mu_0 (r / n)$
  - where $\mu_0$ = mean failure time
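The expected-time formula can be sketched in a few lines. This is only an illustration: it assumes exponentially distributed failure times and that failed specimens are replaced so that n units are always on test, which is the setting where $\mu_0 (r/n)$ holds. The function name `expected_test_time` is hypothetical.

```python
def expected_test_time(mu0: float, r: int, n: int) -> float:
    """Expected duration of a life test, mu0 * (r / n).

    Assumes exponential failure times and replacement of failed specimens,
    so failures arrive at a constant rate of n / mu0.
    """
    if mu0 <= 0 or r <= 0 or n <= 0:
        raise ValueError("mu0, r and n must all be positive")
    return mu0 * r / n


# 50 specimens, stop after 5 failures, mean failure time 1000 hours.
print(expected_test_time(1000.0, 5, 50))  # 100.0
```

Doubling n halves the expected time on test, which is why hardware life tests use many specimens in parallel.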
Butler and Finelli
- “The Infeasibility of Experimental Quantification of Life-Critical Software Reliability”
- To establish that the probability of failure of software is less than $10^{-9}$ in 10 hours, the testing required with one computer (1990s technology) is greater than 1 million years
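A back-of-the-envelope check of the "1 million years" figure, as a sketch rather than the paper's own calculation. It assumes a small failure probability p over a mission of t hours implies a mean failure time of roughly t / p, and that the test runs until a single failure is seen. The function name and the 365.25-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumed calendar convention


def years_of_testing(p_fail: float, mission_hours: float,
                     computers: int = 1, failures: int = 1) -> float:
    """Rough expected test duration in years.

    Approximates the mean failure time as mission_hours / p_fail
    (valid for small p_fail) and applies mu0 * (r / n).
    """
    mu0_hours = mission_hours / p_fail
    return mu0_hours * failures / computers / HOURS_PER_YEAR


years = years_of_testing(p_fail=1e-9, mission_hours=10.0)
print(f"about {years / 1e6:.2f} million years on one computer")
```

Even with a thousand computers running in parallel, the same arithmetic still gives more than a thousand years of testing.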
Testing Classification
- Purpose
- Scope
- Access
- Risk-based
- Structured vs Free Form
How to identify what to test
- New features
- New technology
- Overworked developers
- Regression
- Dependencies
- Complexity
- Bug history
- Language-specific bugs
- Environment changes
- Late changes
- Slipped in “pet” features
- Ambiguity
- Changing requirements
- Bad publicity
- Liability
- Learning curve
- Criticality
- Popularity
Working backwards
- Here’s the case I’m worried about
- How could I have gotten here?
  - Different order of entry
  - Skipping initialization
  - Reverse state traversal
Best (and Worst) Testing Practices (Boris Beizer)
- Unit testing to 100% coverage: necessary but not sufficient for new or changed code
- Integration testing: at every step; not once
- System testing: AFTER unit and integration testing
- Testing to requirements: test to end users AND internal users
- Test execution automation: not all tests can be automated
- Test design automation: implies building a model. Use only if you can manage the many tests
- Stress testing: only need to do it at the start of testing. Runs itself out
- Regression testing: needs to be automated and frequent
- Reliability testing: not always applicable. Statistics skills required
- Performance testing: need to consider payoff
- Independent test groups: not for unit and integration testing
- Usability testing: only useful if done early
- Beta testing: not instead of in-house testing
Usability Testing
Usability Testing
- Frequency with which the problem occurs
  - Common or rare?
- Impact of the problem if it occurs
  - Easy or difficult to overcome?
- Persistence of the problem
  - One-time problem or repeated?
Wizard of Oz Testing
- Inputs and outputs are as expected
- How you get between the two is “anything that works”
- Particularly useful when you have
  - An internal interface
  - UI choices to make
Children’s Intuitive Gestures in Vision-Based Action Games, CACM Jan 2005, vol. 48, no. 1, p. 47
Number of Usability Test Users Needed
- Usability problems found = $N(1-(1-L)^n)$
  - N = total number of usability problems in the design
  - L = proportion of usability problems discovered by a single user
  - n = number of users
- L = 31%
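To see how quickly the curve flattens, the formula can be evaluated for a few group sizes. This is a minimal sketch; the function name `fraction_found` is hypothetical, and it reports the fraction of N found, so N itself is not needed.

```python
def fraction_found(n_users: int, L: float = 0.31) -> float:
    """Fraction of all usability problems found by n_users, 1 - (1 - L)^n."""
    if n_users < 0 or not 0.0 <= L <= 1.0:
        raise ValueError("need n_users >= 0 and 0 <= L <= 1")
    return 1 - (1 - L) ** n_users


for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users: {fraction_found(n):.0%} of problems found")
```

With L = 31%, five users already uncover about 84% of the problems, and each additional user adds less than the one before.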
Using as an Estimator
- Found 100 problems with 10 users
- Assumption: each user finds 10% of problems
- How many are left?
  - found = $N(1-(1-L)^n)$
  - $100 = N(1-(1-0.1)^{10})$
  - $N = 100/(1-0.9^{10}) \approx 154$
- 54 left
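The same arithmetic as a short sketch, solving the formula for N. The function name `estimate_total` is hypothetical, and the result is only as good as the assumed per-user detection rate L.

```python
def estimate_total(found: int, n_users: int, L: float) -> float:
    """Estimate N from found = N * (1 - (1 - L)^n)."""
    detected_fraction = 1 - (1 - L) ** n_users
    if detected_fraction <= 0:
        raise ValueError("no problems can be detected with these inputs")
    return found / detected_fraction


total = estimate_total(found=100, n_users=10, L=0.1)
remaining = total - 100
print(round(total), round(remaining))  # 154 54
```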
Am I Done Yet?
Historical Data
- Lots of variants based on statistical modeling
- What data should be kept?
- When are releases comparable?
- Dangers with a good release
  - Test forever
  - Adversarial relation between developers and testers
- Dangers with a bad release
  - Stop too soon
Capture-recapture model
- Estimate animal populations: How many deer in the forest?
  ○ Tag and recount
  ○ If all tagged, assume you’ve seen them all
- Applied to software by Basin in 1973
- Number of errors = $|e_1| \cdot |e_2| / |e_1 \cap e_2|$
  - where $e_n$ = errors found by tester n
  - 2 testers: 25, 27, 12 overlap: 56 total errors
- What’s wrong with this model (aside from the fact the denominator can be 0)?
  - Assumptions about independence of testers
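The two-tester estimate can be sketched with sets of bug identifiers. This is illustrative only: the function name and the made-up bug IDs are assumptions, chosen so the counts match the slide (25, 27, overlap 12). Returning `None` for an empty overlap is one possible way to handle the zero denominator.

```python
from typing import Optional, Set


def capture_recapture(e1: Set[int], e2: Set[int]) -> Optional[float]:
    """Estimate total errors as |e1| * |e2| / |e1 & e2|.

    Returns None when the testers found no errors in common,
    because the estimate is undefined in that case.
    """
    overlap = len(e1 & e2)
    if overlap == 0:
        return None
    return len(e1) * len(e2) / overlap


tester1 = set(range(0, 25))    # 25 hypothetical bug IDs: 0..24
tester2 = set(range(13, 40))   # 27 hypothetical bug IDs: 13..39, sharing 13..24

print(capture_recapture(tester1, tester2))  # 56.25
```

The estimate treats the two testers as independent samplers. If both gravitate to the same easy bugs, the overlap is inflated and the total is underestimated.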
Test Tools: Beyond Unit and Regression
Performance Test Tools
- What do they do?
  - Orchestrate test scripts
  - Simulate heavy loads
- What is available?
  - JMeter (Apache project)
  - Grinder (Java framework)
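The idea of simulating load can be illustrated without either tool. The sketch below is not JMeter or Grinder: it uses Python's standard thread pool to run a stand-in "request" from many simulated users at once and collects latencies. `fake_request` and `run_load` are hypothetical names, and the 10 ms sleep stands in for a real call to the system under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id: int) -> float:
    """Hypothetical stand-in for one scripted request; returns latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # pretend the system under test takes about 10 ms
    return time.perf_counter() - start


def run_load(users: int, requests_per_user: int) -> list:
    """Run users * requests_per_user requests with `users` concurrent workers."""
    total_requests = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(fake_request, range(total_requests)))


latencies = run_load(users=20, requests_per_user=5)
print(f"{len(latencies)} requests, worst latency {max(latencies) * 1000:.1f} ms")
```

A real performance tool adds what this sketch lacks: scripted scenarios, ramp-up schedules, distributed load generators, and reporting.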
Other Test Tools
- Tons of test tools
- One starting point
  - http://www.softwareqatest.com/qatweb1.html
Other Quality Improvers
Other Ways of Improving Quality
- Reviews and inspections
- Formal specification
- Program verification and validation
- Self-checking (paranoid) code
- Deploy with capabilities to repair
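As one possible reading of "self-checking (paranoid) code", the sketch below validates its input and then re-verifies its own result before returning it. The function `checked_isqrt` is a hypothetical example, not taken from the slides; it wraps the standard library's `math.isqrt`.

```python
import math


def checked_isqrt(n: int) -> int:
    """Integer square root that checks its precondition and postcondition."""
    # Precondition: refuse inputs the routine was never designed for.
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")

    root = math.isqrt(n)

    # Postcondition: verify the answer instead of trusting the computation.
    if not (root * root <= n < (root + 1) * (root + 1)):
        raise AssertionError(f"isqrt postcondition violated for n={n}")
    return root


print(checked_isqrt(17))  # 4
```

The postcondition check costs a couple of multiplications per call. Whether that overhead is acceptable depends on how critical the routine is.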
Formal Methods and Specifications
- Mathematically-based techniques for describing system properties
- Used in inference systems
  - Do not require executing the program
  - Proving something about the specification not already stated
  - Formal proofs
  - Mechanizable
  - Examples: theorem provers and proof checkers
Uses of Specifications
- Requirements analysis
  - Rigor
- System design
  - Decomposition, interfaces
- Verification
  - Specific sections
- Documentation
- System analysis and evaluation
  - Reference point, uncovering bugs
Examples
- Abstract data types
  - Algebras, theories, and programs
    ○ VDM (Praxis: UK Civil aviation display system CDIS)
    ○ Z (Oxford and IBM: CICS)
    ○ Larch (MIT)
- Concurrent and distributed systems
  - State or event sequences, transitions
    ○ Hoare’s CSP
    ○ Transition axioms
    ○ Lamport’s Temporal Logic
- Programming languages!
Practical Advice
- Reference: James Whittaker (now at Google)