cs5103 software engineering lecture 16 test coverage regression testing

CS5103 Software Engineering

Lecture 16: Test Coverage and Regression Testing

Today’s class

Test coverage
  Input combination coverage
  Mutation coverage

Regression testing
  Test prioritization
  Mocking

Input Combination Coverage

Basic idea
  Originates from the most straightforward idea: try every combination of input values
  In theory, 100% coverage is a proof of 100% correctness
  In practice, achievable only in very trivial cases

Main problems
  The number of combinations is exponential in the number of inputs
  The set of possible values may be infinite

An example: a simple automatic sales machine
  Accepts a single $1 bill per purchase; every beverage costs $1
  Beverage: Coke, Sprite, Juice, Water
  Temperature: icy or normal
  Receipt: wanted or not

All combinations = 4*2*2 = 16 combinations

Trying all 16 combinations ensures that the system works correctly

Sales Machine Example

Input 1 (beverage): Coke, Sprite, Juice, Water
Input 2 (temperature): Normal, Icy
Input 3 (receipt): Receipt, No-Receipt

Combination Explosion

The number of combinations is exponential in the number of inputs

Consider an annual tax report system that asks 50 yes/no questions to generate a customized form for you

$2^{50}$ combinations = about $10^{15}$ test cases

Running 1000 test cases per second would take more than 30,000 years
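The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch; the constant names are illustrative, and the rate of 1000 test cases per second is the one assumed in the example.

```python
# Back-of-the-envelope check of the combination explosion.
QUESTIONS = 50                         # yes/no questions in the tax report example
TESTS_PER_SECOND = 1000                # execution rate assumed in the example
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

combinations = 2 ** QUESTIONS          # one test case per input combination
years = combinations / TESTS_PER_SECOND / SECONDS_PER_YEAR

print(f"{combinations:.3e} combinations")
print(f"about {years:,.0f} years of test execution")
```

The exact figure depends on rounding ($2^{50}$ is slightly more than $10^{15}$), but the order of magnitude is the point: tens of thousands of years.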

Observation

When there are many inputs, a relationship among inputs usually involves only a small number of them

In the previous example: maybe only Coke and Sprite can be served icy, while the receipt option is independent of the other inputs

Example of Tax Report

Input 1: Family combined report or Single report

Input 2: Home loans or not

Input 3: Receive gift or not

Input 4: Age over 60 or not

…

Input 1 is related to all other inputs

Other inputs are independent of each other

Studies

A long-term study from NIST (National Institute of Standards and Technology)
  A combination width of 4 to 6 is enough to detect almost all errors

N-wise coverage

Coverage of the N-wise combinations of the possible values of all inputs

Example: 2-wise combinations
  (coke, icy), (sprite, icy), (water, icy), (juice, icy)
  (coke, normal), (sprite, normal), …
  (coke, receipt), (sprite, receipt), …
  (coke, no-receipt), (sprite, no-receipt), …
  (icy, receipt), (normal, receipt)
  (icy, no-receipt), (normal, no-receipt)

20 combinations in total

We had 16 3-wise combinations, and now we have 20. Did it get worse?
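The counts of 20 and 16 can be reproduced by enumerating the combinations. The sketch below is illustrative only: the dictionary layout and the helper name `n_wise_combinations` are assumptions, not part of the lecture.

```python
from itertools import combinations, product

# Input model of the sales machine example.
INPUTS = {
    "beverage": ["coke", "sprite", "juice", "water"],
    "temperature": ["icy", "normal"],
    "receipt": ["receipt", "no-receipt"],
}

def n_wise_combinations(inputs, n):
    """Return every combination of values for every choice of n inputs."""
    result = set()
    for names in combinations(sorted(inputs), n):
        for values in product(*(inputs[name] for name in names)):
            result.add(tuple(zip(names, values)))
    return result

pairs = n_wise_combinations(INPUTS, 2)
triples = n_wise_combinations(INPUTS, 3)
print(len(pairs), len(triples))
```

There are more 2-wise combinations than 3-wise ones here, but a single test case covers several 2-wise combinations at once, so fewer test cases are needed.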

N-wise coverage

Note: one test case may cover multiple N-wise combinations
  E.g., (Coke, Icy, Receipt) covers three 2-wise combinations: (Coke, Icy), (Coke, Receipt), (Icy, Receipt)

Does 100% N-wise coverage always imply 100% (N-1)-wise coverage?

For $k$ Boolean inputs
  Full combination coverage = $2^k$ combinations: exponential in $k$
  Full $n$-wise coverage = $2^n \cdot k(k-1)\cdots(k-n+1)/n!$ combinations: polynomial in $k$ for a fixed $n$
  For 2-wise coverage: $2k(k-1)$ combinations
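As a rough check of the growth rates, the number of n-wise combinations for k Boolean inputs can be computed by choosing which n inputs are involved and then one of two values for each. The function name below is illustrative.

```python
from math import comb

def n_wise_count(k, n):
    """Number of n-wise value combinations over k Boolean inputs."""
    # Choose the n inputs involved, then one of 2 values for each of them.
    return comb(k, n) * 2 ** n

for k in (4, 10, 50):
    # 2-wise count (polynomial in k) next to the full count (exponential in k).
    print(k, n_wise_count(k, 2), 2 ** k)
```

For n = 2 this agrees with 2*k*(k-1), and for n = k it reduces to the full combination count.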

N-wise coverage: Example

How many test cases are needed for 100% 2-wise coverage of our sales machine example?
  (coke, icy, receipt) covers 3 new 2-wise combinations
  (sprite, icy, no-receipt) covers 3 new …
  (juice, icy, receipt) covers 2 new …
  (water, icy, receipt) covers 2 new …
  (coke, normal, no-receipt) covers 3 new …
  (sprite, normal, receipt) covers 3 new …
  (juice, normal, no-receipt) covers 2 new …
  (water, normal, no-receipt) covers 2 new …

8 test cases cover all 20 2-wise combinations
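The bookkeeping in this example can be replayed mechanically. The following sketch takes the eight test cases in the order listed and counts how many new 2-wise combinations each one contributes; the helper name `pairs_of` is an assumption made for illustration.

```python
from itertools import combinations

# The eight test cases from the example: (beverage, temperature, receipt).
TESTS = [
    ("coke", "icy", "receipt"),
    ("sprite", "icy", "no-receipt"),
    ("juice", "icy", "receipt"),
    ("water", "icy", "receipt"),
    ("coke", "normal", "no-receipt"),
    ("sprite", "normal", "receipt"),
    ("juice", "normal", "no-receipt"),
    ("water", "normal", "no-receipt"),
]

def pairs_of(test):
    """2-wise combinations covered by one test case, tagged by input position."""
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)}

covered = set()
new_counts = []
for test in TESTS:
    new = pairs_of(test) - covered   # combinations not covered by earlier tests
    new_counts.append(len(new))
    covered |= new

print(new_counts)    # [3, 3, 2, 2, 3, 3, 2, 2]
print(len(covered))  # 20
```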

Combination Coverage in Practice

2-wise combination coverage is very widely used
  Also known as pair-wise testing or all-pairs testing

Mostly used in configuration testing
  Example: configuration of gcc
    A lot of variables
    Several options for each variable
  For command line tools: add or remove an option

Input model

What happens if an input has infinitely many possible values?
  Integer
  Float
  Character
  String

Note: all of these are actually finite, but the set of possible values is so large that it is treated as infinite

Idea: map the infinite values to a finite number of value baskets (ranges)

Input model

Input partition
  Partition the possible value set of an input into several value ranges
  Transform numeric variables (integer, float, double, character) into enumerated variables

Example:
  int exam_score => {less than 0}, {0-59}, {60-69}, {70-79}, {80-89}, {90-100}, {greater than 100}
  char c => {a-z}, {A-Z}, {0-9}, {other}
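A partition like the exam score example can be written as a small lookup. This is only a sketch: it assumes the intended ranges are below 0, 0 to 59, 60 to 69, 70 to 79, 80 to 89, 90 to 100, and above 100, and the names `BOUNDS`, `LABELS` and `score_partition` are made up for illustration.

```python
import bisect

# Smallest value of every partition except the first one.
BOUNDS = [0, 60, 70, 80, 90, 101]
LABELS = ["<0", "0-59", "60-69", "70-79", "80-89", "90-100", ">100"]

def score_partition(score: int) -> str:
    """Map an integer exam score to the label of its partition."""
    return LABELS[bisect.bisect_right(BOUNDS, score)]

# One representative value per partition is enough to cover the input model.
representatives = [-5, 30, 65, 75, 85, 95, 150]
print([score_partition(s) for s in representatives])
```

In practice, values at the edges of each range (such as 59 and 60) are also common choices for representatives.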

Input model

Feature extraction
  For string and structure inputs
  Split the possible value set according to a certain feature
  Example: String passwd => {contains space}, {no space}

It is possible to extract multiple features from one input
  Example: String name
    => {capitalized first letter}, {not}
    => {contains space}, {not}
    => {length >10}, {2-10}, {1}, {0}

One test case may cover multiple features
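The three features of the name example can be extracted from a concrete string as follows. This is a minimal sketch; the function name and the label strings are illustrative assumptions.

```python
def name_features(name: str) -> tuple:
    """Return the (capitalization, space, length) features of a name string."""
    # name[:1] is "" for the empty string, and "".isupper() is False.
    capitalized = "capitalized" if name[:1].isupper() else "not capitalized"
    space = "contains space" if " " in name else "no space"
    n = len(name)
    if n == 0:
        length = "0"
    elif n == 1:
        length = "1"
    elif n <= 10:
        length = "2-10"
    else:
        length = ">10"
    return capitalized, space, length

for candidate in ["Ada Lovelace", "bob", "X", ""]:
    print(repr(candidate), name_features(candidate))
```

Each test input yields one value for every feature, which is why a single test case covers several features at once.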

Input model

Feature extraction: structure input
  A word binary tree (the data at every node is a string)
    Depth: integer -> partition {0, 1, 1+}
    Number of leaves: integer -> partition {0, 1, <10, 10+}
    Root: null / not
    A node with only a left child / not
    A node with only a right child / not
    Null data value on any node / not
    Root value: string -> further feature extraction
    Value on the leftmost leaf: string -> further feature extraction
    …

Input model

Infeasible feature combinations?

Example: String name
  => {capitalized first letter}, {not}
  => {contains space}, {not}
  => {length >10}, {2-10}, {1}, {0}

Some combinations cannot be satisfied by any input:
  Length = 0 ^ contains space
  Length = 0 ^ capitalized first letter
  Length = 1 ^ contains space ^ capitalized first letter
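One way to handle infeasible combinations is to encode them as constraints and filter them out before generating tests. The sketch below does this for the name example; the `feasible` predicate is an illustrative assumption that encodes exactly the three constraints listed above.

```python
from itertools import product

CAPITALIZED = [True, False]
HAS_SPACE = [True, False]
LENGTH = [">10", "2-10", "1", "0"]

def feasible(capitalized, has_space, length):
    """Return False for feature combinations that no string can satisfy."""
    if length == "0":
        # An empty string has no first letter and no space.
        return not capitalized and not has_space
    if length == "1":
        # A single character cannot be both a capital letter and a space.
        return not (capitalized and has_space)
    return True

all_combos = list(product(CAPITALIZED, HAS_SPACE, LENGTH))
infeasible = [c for c in all_combos if not feasible(*c)]
print(len(all_combos), len(infeasible))
```

Only the feasible combinations should count toward the coverage denominator, otherwise 100% can never be reached.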

Input combination coverage

Summary: try to cover the combinations of possible values of the inputs

Exponential combinations: N-wise coverage
  2-wise coverage is the most popular (all-pairs testing)

Infinite possible values
  Input partition
  Input feature extraction

Coverage is usually 100% once adopted
  It is easy to achieve, compared with code coverage
  Models are not easy to write

Test coverage

So far: covering inputs and code

The final goal of testing
  Find all bugs in the software

So there should be a bug coverage

The coverage represents the adequacy of a test suite
  50% bug coverage = half done!
  100% bug coverage = done!

But it is impossible

Bugs are unknown
  Otherwise we would not need testing

So we know the number of bugs found, but we do not know what to divide it by

One possible solution: estimation
  1-10 bugs per KLOC
  Depends on the type of software and the stage of development; imprecise
  When you find many bugs, do you conclude that you have found them all, or that the code is really of low quality?

Mutation coverage

How can we know how many bugs there are in the code?

If only we had planted those bugs ourselves!

Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose

Concepts

Mutant
  A software version with planted bugs
  Usually each mutant contains only one planted bug. Why?

Mutant kill
  Given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S kills m
  In other words, a test suite that kills a mutant is able to detect the planted bug that the mutant represents

Illustration

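[Figure: the test cases are executed on the original program and on Mutant 1, Mutant 2, ..., Mutant n; the results are checked by the oracles. Same results: the mutant survived. Different results: the mutant is killed.]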

Concepts

Mutation coverage

$$\text{mutation coverage} = \frac{\#\text{ of killed mutants}}{\#\text{ of generated mutants}}$$

Mutant generation

Traditional mutation operators
  Statement deletion
  Replace a Boolean expression with true/false
  Replace arithmetic operators (+, -, *, /, …)
  Replace comparison relations (>=, ==, <=, !=)
  Replace variables
  …

Mutation Example: Operator

Mutation operator                     In original    In mutant
------------------------------------  -------------  -------------------
Statement deletion                    z=x*y+1;       (statement removed)
Boolean expression to true | false    if (x<y)       if (true)
                                                     if (false)
Replace arithmetic operators          z=x*y+1;       z=x*y-1;
                                                     z=x+y-1;
Replace comparison operators          if (x<y)       if (x<=y)
                                                     if (x==y)
Replace variables                     z=x*y+1;       z=z*y+1;
                                                     z=x*x+1;
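To make the kill criterion and the coverage ratio concrete, here is a small hand-rolled sketch. It is not how tools generate mutants: the mutants are written by hand as Python functions in the spirit of the operators above, and the names `MUTANTS`, `killed` and `mutation_coverage` are assumptions made for illustration.

```python
# Original program under test, modeled on the statement z = x*y + 1.
def original(x, y):
    return x * y + 1

# Hand-written mutants, each with exactly one planted bug.
MUTANTS = {
    "x*y-1": lambda x, y: x * y - 1,   # arithmetic operator replaced
    "x+y+1": lambda x, y: x + y + 1,   # arithmetic operator replaced
    "x*x+1": lambda x, y: x * x + 1,   # variable replaced
}

def killed(mutant, tests):
    """A suite kills a mutant if some test makes its result differ from the original's."""
    return any(original(x, y) != mutant(x, y) for x, y in tests)

def mutation_coverage(mutants, tests):
    """Number of killed mutants divided by number of generated mutants."""
    kills = sum(killed(m, tests) for m in mutants.values())
    return kills / len(mutants)

weak_suite = [(2, 2)]              # 2*2 == 2+2, so two mutants survive
strong_suite = [(2, 2), (1, 3)]    # the extra input separates them

print(mutation_coverage(MUTANTS, weak_suite))
print(mutation_coverage(MUTANTS, strong_suite))
```

The input (2, 2) is a poor test because multiplication and addition agree on it; adding (1, 3) kills the two surviving mutants.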

Mutant testing tools

MILU: http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools

MuJava: http://cs.gmu.edu/~offutt/mujava/

Javalanche: https://github.com/david-schuler/javalanche/

Summary on all coverage measures

Code coverage
  Target: code
  Adequacy: no -> 100% code coverage != no bugs
  Approximation: dataflow, branch, method/statements
  Preparation: none (instrumentation can be done automatically)
  Overhead: low (instrumentation causes some overhead)

Input combination coverage
  Target: inputs
  Adequacy: yes -> 100% input coverage == no bugs
  Approximation: n-wise coverage, input partition, input feature extraction
  Preparation: hard (requires input modelling)
  Overhead: none

Mutation coverage
  Target: bugs
  Adequacy: no -> 100% mutation coverage != no bugs
  Approximation: mutation is already an approximation
  Preparation: none (mutation and execution can be done automatically)
  Overhead: very high (execution on instrumented mutated versions)

Regression Testing

So far
  Unit testing
  System testing
  Test coverage

All of these are about the first round of testing
  Testing is performed from time to time during the software life cycle
  Test cases / oracles can be reused in all rounds

Testing during the evolution phase is regression testing

Regression Testing

When we try to enhance the software
  We may also bring in bugs
  If the software worked yesterday but does not work today, it is called a “regression”

Numbers
  Empirical study on Eclipse (2005)
    11% of commits are bug-inducing
    24% of fixing commits are bug-inducing

Regression Testing

Run old test cases on the new version of the software

It costs a lot if we run the whole suite each time

Try to save time and cost in new rounds of testing
  Test prioritization
  Fake objects

Test prioritization

Rank all the test cases

Run the test cases in ranked order

Stop when resources are used up

How to rank test cases
  To discover bugs sooner
  Or, as an approximation, to achieve higher coverage sooner

APFD: Measurement of Test Prioritization

Average Percentage of Faults Detected (APFD)
  Compares two test case sequences
  A number of faults (bugs) have been detected after each test case

Which of the following two sequences is better? (The number in parentheses is the cumulative number of faults detected so far.)
  S1: t1(2), t2(3), t3(5)
  S2: t2(1), t1(3), t3(5)

APFD is the average of these numbers (normalized by the total number of faults), with 0 for the initial state
  APFD(S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5
  APFD(S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
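The two calculations above follow the same recipe, which can be sketched as a short function. It implements the definition given on this slide (average of the normalized cumulative counts, including the initial 0); the function and parameter names are illustrative.

```python
def apfd(cumulative_faults, total_faults):
    """Average of the normalized cumulative fault counts, with 0 for the initial state."""
    points = [0] + list(cumulative_faults)
    return sum(points) / (total_faults * len(points))

s1 = apfd([2, 3, 5], total_faults=5)   # t1, t2, t3
s2 = apfd([1, 3, 5], total_faults=5)   # t2, t1, t3
print(s1, s2)
```

S1 scores higher because it detects more faults earlier, even though both sequences end with all five faults found.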

APFD: Illustration

APFD can be regarded as the area under the test case vs. fault curve
  Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)
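For the four test cases above, the curve can be built from the set of faults each test detects. The sketch below uses the same averaging definition as the earlier APFD slide and compares the given order with one that runs t4 first; the names `DETECTS` and `apfd_from_faults` are assumptions for illustration.

```python
# Faults detected by each test case, as listed above.
DETECTS = {
    "t1": {"f1", "f2"},
    "t2": {"f3"},
    "t3": {"f3"},
    "t4": {"f1", "f2", "f3", "f4"},
}

def apfd_from_faults(sequence, detects):
    """Average normalized number of distinct faults found after each test, with 0 initially."""
    all_faults = set().union(*detects.values())
    found = set()
    points = [0]
    for test in sequence:
        found |= detects[test]        # a fault found twice is counted once
        points.append(len(found))
    return sum(points) / (len(all_faults) * len(points))

given_order = apfd_from_faults(["t1", "t2", "t3", "t4"], DETECTS)
t4_first = apfd_from_faults(["t4", "t1", "t2", "t3"], DETECTS)
print(given_order, t4_first)
```

Running t4 first raises the curve immediately to its maximum, so the area under it (and the APFD value) is larger.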

Coverage-based test case prioritization

Code coverage based
  Requires code-coverage information recorded in previous testing

Combination coverage based
  Requires an input model

Mutation coverage based
  Requires recorded mutation-killing statistics

Total Strategy

The simplest strategy

Always select the unselected test case that has the best coverage

Example

Consider code coverage on five test cases:
  T1: s1, s3
  T2: s2, s3, s4, s5
  T3: s3, s4, s5
  T4: s6, s7
  T5: s3, s5, s8, s9, s10

Ranking: T5, T2, T3, T1/T4
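The total strategy amounts to sorting the test cases by how many statements each covers. A minimal sketch, with the coverage data from the example and an illustrative function name:

```python
# Statements covered by each test case in the example.
COVERAGE = {
    "T1": {"s1", "s3"},
    "T2": {"s2", "s3", "s4", "s5"},
    "T3": {"s3", "s4", "s5"},
    "T4": {"s6", "s7"},
    "T5": {"s3", "s5", "s8", "s9", "s10"},
}

def total_strategy(coverage):
    """Rank test cases by total number of covered statements, highest first."""
    # sorted() is stable, so tied tests (T1 and T4) keep their original order.
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

print(total_strategy(COVERAGE))  # ['T5', 'T2', 'T3', 'T1', 'T4']
```

Note that T3 is ranked third even though everything it covers is already covered by T2 and T5; the total strategy does not look at overlap.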

Additional Strategy

An adaptation of the total strategy

Instead of always choosing the test case with the highest coverage
  Choose the test case that results in the most extra coverage

Starts from the test case with the highest coverage

Example

Consider code coverage on five test cases:
  T1: s1, s3
  T2: s2, s3, s4, s5
  T3: s3, s4, s5
  T4: s6, s7
  T5: s3, s5, s8, s9, s10

Ranking: T5(5), T2(2, s2, s4) / T4(2, s6, s7), T1(1, s1), T3
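The additional strategy is a greedy loop: at each step, pick the test case that adds the most statements not yet covered. The sketch below reproduces the ranking above; the function name is illustrative and ties are broken by test name, which is an assumption (the example leaves T2/T4 as a tie).

```python
# Statements covered by each test case in the example.
COVERAGE = {
    "T1": {"s1", "s3"},
    "T2": {"s2", "s3", "s4", "s5"},
    "T3": {"s3", "s4", "s5"},
    "T4": {"s6", "s7"},
    "T5": {"s3", "s5", "s8", "s9", "s10"},
}

def additional_strategy(coverage):
    """Greedily rank test cases by the extra coverage each one adds."""
    remaining = dict(coverage)
    covered = set()
    ranking = []
    while remaining:
        # max() returns the first maximal item, so ties go to the earlier name.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        ranking.append((best, len(remaining[best] - covered)))
        covered |= remaining.pop(best)
    return ranking

print(additional_strategy(COVERAGE))
# [('T5', 5), ('T2', 2), ('T4', 2), ('T1', 1), ('T3', 0)]
```

Compared with the total strategy, T3 drops to last place because it contributes no statement that the earlier tests have not already covered.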

Fake Objects

A waste of resources in regression testing
  We change the code a little bit
  But test execution still runs all the unchanged code

Using fake objects
  For all or some of the unchanged modules
    Do not run the modules
    Use the results of previous tests instead

Fake Objects

Example
  Testing an expert system for finance
  It has two components: a UI and an interest calculator (based on the inputs from the UI)
  In the first round of testing, store the results of the interest calculator as a map: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7%
  In regression testing, if the change is made to the UI, you can rerun the software with the data map

Using more fake objects saves more time in regression testing. Should we mock every object?
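The record-then-replay idea in this example can be sketched as follows. Everything here is hypothetical: the class names, the `rate` method, the `ui_report` function and the interest formula are invented for illustration and do not come from the lecture.

```python
class RealInterestCalculator:
    """Stand-in for the expensive component; the formula is made up."""
    def rate(self, a, b):
        return round(0.01 * (a + b), 4)

class RecordingCalculator:
    """First round of testing: delegate to the real component and record results."""
    def __init__(self, real):
        self.real = real
        self.data_map = {}

    def rate(self, a, b):
        result = self.real.rate(a, b)
        self.data_map[(a, b)] = result
        return result

class FakeInterestCalculator:
    """Regression testing: replay recorded results without running the real component."""
    def __init__(self, data_map):
        self.data_map = dict(data_map)

    def rate(self, a, b):
        # Raises KeyError for inputs that were never recorded.
        return self.data_map[(a, b)]

def ui_report(calculator, a, b):
    """The (changed) UI code under test; it only formats the calculator's result."""
    return f"Interest rate: {calculator.rate(a, b):.1%}"

recorder = RecordingCalculator(RealInterestCalculator())
first_round = ui_report(recorder, 2, 3)

fake = FakeInterestCalculator(recorder.data_map)
regression_round = ui_report(fake, 2, 3)
print(first_round == regression_round)
```

The fake only knows the inputs seen in the first round, which is one reason the data map can become large and must be refreshed when the real component changes.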

Pros & Cons

Pros
  Saves time in regression testing

Cons
  Be careful when mocking non-deterministic components
    E.g., a mocked getSystemTime() may conflict with another call
  Recording the data maps takes a lot of time
  The stored data map can be huge
  When the mocked object changes, the data map must be updated

Selection of faking modules

Rules
  Use fake objects for time-consuming modules
    So that you save more time
  The faked module should be stable
    E.g., libraries
  The interface should carry a small data flow
    E.g., numeric inputs and return values

Fake objects

Fake objects are not just useful for regression testing

They are also useful for
  UI components
  Internet components
  Components that affect the real world
    Sending an email
    Transferring money from credit cards

Next class

Debugging
  Test coverage based bug localization
  Delta debugging

Thanks!
