
Cost / Benefits Arguments for Automation and Coverage

Jeff Offutt
Professor, Software Engineering
George Mason University
Fairfax, VA USA
www.cs.gmu.edu/~offutt/
[email protected]

NoVa TAIG, August 2011

Who Am I

• PhD, Georgia Institute of Technology, 1988
• Professor at George Mason University since 1992
  – BS, MS, PhD in Software Engineering (also CS)
• Lead the Software Engineering MS program
  – Oldest and largest in the USA
• Editor-in-Chief of Wiley's journal Software Testing, Verification and Reliability (STVR)
• Co-Founder of the IEEE International Conference on Software Testing, Verification and Validation (ICST)
• Co-Author of Introduction to Software Testing (Cambridge University Press)

Software is a Skin that Surrounds Our Civilization

Quote due to Dr. Mark Harman

Costly Software Failures

• NIST report, "The Economic Impacts of Inadequate Infrastructure for Software Testing" (2002)
  – Inadequate software testing costs the US alone between $22 and $59 billion annually
  – Better approaches could cut this amount in half
• Huge losses due to web application failures
  – Financial services: $6.5 million per hour (just in the USA!)
  – Credit card sales applications: $2.4 million per hour (in the USA)
• In Dec 2006, amazon.com's BOGO offer turned into a double discount
• 2007: Symantec says that most security vulnerabilities are due to faulty software
• The world-wide monetary loss due to poor software is staggering

Types of Test Activities

• Testing can be broken up into four general types of activities:
  1. Test Design
     a) Criteria-based
     b) Human-based
  2. Test Automation
  3. Test Execution
  4. Test Evaluation
• Each type of activity requires different skills, background knowledge, education, and training
• No reasonable software development organization uses the same people for requirements, design, implementation, integration, and configuration control
• Why do test organizations still use the same people for all four test activities? This clearly wastes resources.
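To make the split concrete, here is a minimal JUnit 4 sketch (the Account class and its values are invented for illustration) showing where each of the four activities appears in a single automated test:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AccountTest {

    // A toy class under test, included so the sketch is self-contained.
    static class Account {
        private int balance;
        Account(int opening) { balance = opening; }
        void deposit(int amount) { balance += amount; }
        int getBalance() { return balance; }
    }

    // 1. Test DESIGN chose the input values and the expected result:
    //    deposit 50 into an account holding 100, expect 150.
    // 2. Test AUTOMATION turned that design into this executable script.
    @Test
    public void depositIncreasesBalance() {
        Account account = new Account(100);
        account.deposit(50);
        // 3. Test EXECUTION happens when a runner invokes this method.
        // 4. Test EVALUATION is the assertion comparing actual to expected.
        assertEquals(150, account.getBalance());
    }
}
```

The design step is the skilled, intellectual part; the script is just its executable record, which is why the four activities call for different people.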

1. Test Design – (a) Criteria-Based

Design test values to satisfy coverage criteria or other engineering goals

• This is the most technical job in software testing
• Requires knowledge of:
  – Discrete math
  – Programming
  – Testing
• Requires much of a traditional CS degree
• This is intellectually stimulating, rewarding, and challenging
• Test design is analogous to software architecture on the development side
• Using people who are not qualified to design tests is a sure way to get ineffective tests
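As a hedged illustration of designing test values against a criterion (the method below is invented): edge coverage on a method with one decision yields two test requirements, one per branch, and the designer picks input values to satisfy each:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ShippingTest {

    // Toy method under test: its control-flow graph has two edges out of
    // the decision node, so edge (branch) coverage requires two tests.
    static int shippingCost(int orderTotal) {
        if (orderTotal >= 100) {   // decision node
            return 0;              // true edge: free shipping
        }
        return 10;                 // false edge: flat fee
    }

    @Test
    public void coversTrueEdge() {            // test requirement: true branch
        assertEquals(0, shippingCost(120));
    }

    @Test
    public void coversFalseEdge() {           // test requirement: false branch
        assertEquals(10, shippingCost(40));
    }
}
```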

1. Test Design – (b) Human-Based

Design test values based on domain knowledge of the program and human knowledge of testing

• This is much harder than it may seem to developers
• Criteria-based approaches can be blind to special situations
• Requires knowledge of the domain, testing, and user interfaces
• Requires almost no traditional CS
  – A background in the domain of the software is essential
  – An empirical background is very helpful (biology, psychology, …)
  – A logic background is very helpful (law, philosophy, math, …)
• This is intellectually stimulating, rewarding, and challenging
  – But not to typical CS majors – they want to solve problems and build things

Model-Driven Test Design – Steps

[Diagram: software artifact → (analysis) → model / structure → (criterion) → test requirements → (refine) → refined requirements / test specs → (generate) → input values → (prefix, postfix, expected) → test cases → (automate) → test scripts → (execute) → test results → (evaluate) → pass / fail, with feedback flowing back to earlier steps. Domain analysis of the software artifact also yields test requirements directly. The steps from model / structure through refined requirements sit at the DESIGN ABSTRACTION LEVEL; input values through pass / fail sit at the IMPLEMENTATION ABSTRACTION LEVEL.]
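A hedged walkthrough of these steps on a trivial artifact (all names invented), with each MDTD step marked in the comments:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class MinMdtdTest {

    // Software artifact: the implementation under test.
    static int min(int a, int b) {
        return (a < b) ? a : b;
    }

    // Model / structure (analysis): a two-branch control-flow graph.
    // Criterion: edge coverage.
    // Test requirements: cover the (a < b) true edge and false edge.
    // Refined requirements / test specs: a < b, and a >= b.
    // Input values (generate): (1, 2) and (5, 3).
    // Test cases (prefix / postfix / expected): add expected outputs 1 and 3.
    // Test scripts (automate): the JUnit methods below.
    // Execution and evaluation: the runner plus the assertions produce
    // the pass / fail verdict.

    @Test
    public void minTrueEdge()  { assertEquals(1, min(1, 2)); }

    @Test
    public void minFalseEdge() { assertEquals(3, min(5, 3)); }
}
```

Everything above the @Test methods happens at the design abstraction level; only the last few steps touch executable code.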

MDTD – Activities

[Diagram: the same flow partitioned by activity – Test Design spans the steps from software artifact through input values; Test Automation turns input values into test cases and test scripts; Test Execution runs the scripts to produce test results; Test Evaluation turns the results into the pass / fail verdict. The design and implementation abstraction levels are as in the previous slide.]

Raising our abstraction level makes test design MUCH easier

Example Coverage Criteria

• Statement coverage … more generally known as node coverage on graphs
• Branch coverage … more generally known as edge coverage on graphs
• Prime path coverage (graphs)
• Predicate coverage (logic)
• Modified condition / decision coverage (MC/DC) … also known as correlated active clause coverage
• Input space partitioning
• Mutation analysis coverage
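As a sketch of how the graph criteria differ in strength (the method is invented): for the loop below, node and edge coverage are both satisfied by a single one-iteration test, but prime path coverage also forces the zero-iteration and repeated-iteration cases:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PrimePathSketch {

    // Toy loop. CFG: 1 (init) -> 2 (loop test), 2 -> 3 (body), 3 -> 2,
    // 2 -> 4 (return).
    // Prime paths: [1,2,3], [1,2,4], [2,3,2], [3,2,3], [3,2,4].
    static int sumTo(int n) {
        int sum = 0;               // node 1
        int i = 1;
        while (i <= n) {           // node 2
            sum += i;              // node 3
            i++;
        }
        return sum;                // node 4
    }

    @Test
    public void zeroIterations() { assertEquals(0, sumTo(0)); } // tours [1,2,4]

    @Test
    public void oneIteration()   { assertEquals(1, sumTo(1)); } // tours [1,2,3],
                                                                // [2,3,2], [3,2,4]
                                                                // (and every edge)
    @Test
    public void twoIterations()  { assertEquals(3, sumTo(2)); } // tours [3,2,3]
}
```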

Test Coverage Criteria

• Test coverage criteria use classic engineering abstraction
  – Civil engineers use algebra and calculus to model parts of the real world
  – Then solve problems with those models
  – Instead of algebra and calculus, we use discrete math: logic, graphs, grammars, sets
• Why are test criteria growing in use now?
  – We need to use test automation before using criteria
  – Tool support is essential
  – Testers need to have more knowledge than in the past

Example Success Stories

• These slides introduce specific examples of how some of these ideas are being used in companies
• Some companies are mentioned by name
  – Some names cannot be mentioned
• I discuss some general process notes, then examples of the specific criteria being used

Google

• Programmers spend up to half of their time testing
  – Unit testing is measured as part of programmer productivity
  – Programmers must solve all problems found in system testing immediately
  – If quality is bad, system testers refuse to help
• Products are shipped daily
  – Release-and-iterate cycle
  – Focus on fast fixing instead of prevention
• All tests are fully automated
• Teams choose their own test criteria, but every team must use criteria
• They have saved tens of millions of dollars through:
  – Automation
  – Developer responsibility
  – Immediate feedback

Source – Patrick Copeland, Keynote Address, International Conference on Software Testing, Verification and Validation (ICST 2010)

Amazon

• All tests are automated and documented
• Developers are educated in testing
• Developers are measured by the quality of their unit tests
  – Developers are rewarded for finding unit faults
  – Developers are measured by the number of faults found during system testing that trace back to them
• They have lots of internal-use tools for automation and for measuring criteria

Source – visit to the company

Microsoft

• Software Development Engineer in Test (SDET)
  – Developers who specialize in testing (not SMEs)
• Goal is to automate all tests
• They use input space partitioning for many of their tests
• Many groups use graph-based criteria (branch or node coverage)

Source – How We Test Software at Microsoft, by Page, Johnston, and Rollison

Major US Government Contractor

• Last year a manager started applying these ideas in her project
  – Focused on unit / developer testing
  – Held monthly reviews of documentation quality, code structure, and unit tests
  – Required use of test automation tools
  – Required use of a simple graph criterion (all branches)
  – Established a test design expert and a test automation expert
• She received a commendation for saving tens of thousands of dollars in a few months
  – She is now teaching her approach to other managers on the project

Source – personal contact

Graph Criteria

• Web software company (in Northern Virginia)
  – Applying graph criteria to develop tests for new web applications
  – Automation with HttpUnit
  – Reduced deployment errors by 50% and reduced cost by 25%
  – Updating automated tests is a lot of work
• Government contractor that builds security assessment tools
  – Applying graph criteria to test their threat assessment engines
  – Automation with JUnit and an internal automation framework
  – Cut time to deploy new products by 20% and reduced development cost by 15%

Sources – consulting / part-time student employee
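As a rough sketch of what such automation can look like (the URL and link text are invented, and the HttpUnit calls are from its public API as I recall it, so treat the signatures as assumptions): one JUnit test per edge of the web application's navigation graph:

```java
import static org.junit.Assert.assertNotNull;
import org.junit.Test;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebLink;
import com.meterware.httpunit.WebResponse;

public class NavigationEdgeTest {

    // Covers one edge of the navigation graph: login page -> help page.
    // A full edge-coverage suite has one such test per navigation edge.
    @Test
    public void loginToHelpEdge() throws Exception {
        WebConversation wc = new WebConversation();
        WebResponse login = wc.getResponse("http://localhost:8080/app/login");
        WebLink helpLink = login.getLinkWith("Help");       // follow the edge
        assertNotNull("login page should link to help", helpLink);
        WebResponse help = helpLink.click();
        assertNotNull(help);                                // edge reached its target
    }
}
```

The deck's caveat applies here too: when pages change, every test touching the affected edges must be updated, which is why maintaining such suites is a lot of work.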

Logic Criteria

• Company that builds embedded, safety-critical, real-time software for trains
  – Applied CACC to post-deployment communication software
  – Found over a dozen faults: 3 safety-critical, 2 real-time
  – Fixed all problems before the software failed in the system
  – Logic testing is now mandated on all safety-critical software
• Aerospace company that manufactures planes
  – Applied CACC to flight guidance software (embedded, real-time, safety-critical)
  – Found numerous problems
  – Automation estimated to have saved 30% of testing cost

Sources – student industry project / consulting
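A minimal sketch of what a CACC requirement looks like, assuming an invented two-clause safety predicate: for each clause, CACC demands tests in which that clause alone determines the predicate's value.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CaccSketch {

    // Invented safety predicate with two clauses: p = a && b.
    static boolean applyBrakes(boolean overspeed, boolean autoMode) {
        return overspeed && autoMode;
    }

    // CACC (the criterion equivalent to MC/DC) requires, for each clause,
    // a pair of tests where flipping that clause flips the predicate:
    //   overspeed determines p when autoMode = true  -> tests (T,T) and (F,T)
    //   autoMode  determines p when overspeed = true -> tests (T,T) and (T,F)
    // Three tests suffice, versus four for all combinations.

    @Test public void bothTrue()       { assertEquals(true,  applyBrakes(true,  true));  }
    @Test public void overspeedFalse() { assertEquals(false, applyBrakes(false, true));  }
    @Test public void autoModeFalse()  { assertEquals(false, applyBrakes(true,  false)); }
}
```

The savings grow with the number of clauses: CACC needs roughly n+1 tests for an n-clause predicate, where exhaustive combination needs 2^n.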

Input Space Partitioning

• Freddie Mac (major financial services company)
  – System testing on calculation engines
    • Faults can cause losses of millions of dollars
  – A test manager tested two similar products, one with their traditional method and one using ISP
  – Special-purpose tools to support ISP
  – ISP tests found 3.5 times as many faults, with half the effort
    • ZERO defects reported in deployment (after 2 years)
  – ISP is now being disseminated throughout the company
• Dozens of companies in Northern Virginia have used ISP over the past 15 years
  – All saved money and found more faults

Sources – MS thesis at GMU / part-time student employees
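A small sketch of one common ISP strategy, base choice coverage, with invented characteristics and blocks for a toy calculation engine: build a base test from each characteristic's "most typical" block, then vary one characteristic at a time:

```java
import java.util.Locale;

// Hedged base choice coverage sketch; the characteristics and block
// values below are invented for illustration, not Freddie Mac's.
public class BaseChoiceSketch {
    public static void main(String[] args) {
        int[] principals = {1_000, 0, 500_000};  // typical / zero / large
        int[] terms      = {360, 1};             // typical / shortest (months)
        double[] rates   = {0.05, 0.0};          // typical / zero

        // Base test: the first block of each characteristic.
        emit(principals[0], terms[0], rates[0]);

        // Vary one characteristic at a time, holding the others at base.
        for (int i = 1; i < principals.length; i++) emit(principals[i], terms[0], rates[0]);
        for (int i = 1; i < terms.length; i++)      emit(principals[0], terms[i], rates[0]);
        for (int i = 1; i < rates.length; i++)      emit(principals[0], terms[0], rates[i]);
    }

    static void emit(int principal, int term, double rate) {
        System.out.printf(Locale.US, "test: principal=%d term=%d rate=%.2f%n",
                principal, term, rate);
    }
}
```

Base choice yields 1 + Σ(blocks − 1) tests: five here, versus the twelve that testing all block combinations would require, which is one source of the "half the effort" result.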

Mutation Testing

• A major network router manufacturer
  – One of my students applied mutation to an essential engine in a router – embedded, real-time software
    • It had already been in deployment for years
  – Found 3 major problems, one of which had cost the company over $70 million in downtime and lost revenue
  – My student got a bonus of $800,000 (1999)
• Telecommunications company
  – Real-time, embedded software, plus web applications
  – I helped apply mutation testing and graph criteria to 3 software components – past testing, ready for deployment
  – About 150 tests found over 50 separate issues – at 25% of the cost of their usual system testing

Sources – student / consulting
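A minimal sketch of the underlying idea (the method and mutant are invented): mutation analysis makes small syntactic changes to the program and asks whether the tests can tell each mutant from the original; tests strong enough to kill mutants are strong enough to catch real boundary faults.

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotEquals;
import org.junit.Test;

public class MutationSketch {

    // Original code under test (invented for this sketch).
    static boolean belowThreshold(int reading, int threshold) {
        return reading < threshold;
    }

    // A typical mutant: the relational operator < is mutated to <=.
    static boolean belowThresholdMutant(int reading, int threshold) {
        return reading <= threshold;
    }

    // A test kills the mutant only if it distinguishes mutant from
    // original; only the boundary input reading == threshold does.
    @Test
    public void boundaryInputKillsMutant() {
        assertFalse(belowThreshold(10, 10));            // original behavior
        assertNotEquals(belowThreshold(10, 10),
                        belowThresholdMutant(10, 10));  // mutant differs: killed
    }
}
```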

Advantages of Criteria-Based Test Design

• Criteria maximize the "bang for the buck"
  – Fewer tests that are more effective at finding faults
• Comprehensive test set with minimal overlap
• Traceability from software artifacts to tests
  – The "why" for each test is answered
  – Built-in support for regression testing
• A "stopping rule" for testing – advance knowledge of how many tests are needed
• Natural to automate

Criteria-Based Testing Summary

• Many companies still use "monkey testing"
  – A human sits at the keyboard, wiggles the mouse, and bangs the keyboard
  – No automation
  – Minimal training required
• Some companies automate human-designed tests
  – Reduces execution cost
  – Eases repeat testing
• But companies that use automation and criteria-based test design:
  – Save money
  – Find more faults
  – Build better software

Contact

Jeff Offutt
[email protected]
http://cs.gmu.edu/~offutt/

We are in the middle of a revolution in how software is tested

Research is finally meeting practice