TRANSCRIPT
MTAT.03.159 / Lecture 04 / © Dietmar Pfahl 2013
MTAT.03.159: Software Testing
Lecture 04:
Static Testing (Inspection)
and Defect Estimation
(Textbook Ch. 10 & 12)
Dietmar Pfahl, email: [email protected]
Spring 2013
Lecture Reading
• Chapter 10: Reviews (Lab 4)
– Types of reviews
– Defect estimation (not in textbook)
• Chapter 12: Evaluating Software Quality (no Lab)
– Usage-based testing
– Certification testing (not in textbook)
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Reviews (Ch 10)
Terminology
• Static testing – testing without executing the software
• Review – a meeting to evaluate a software artifact
• Inspection – a formally defined review
• Walkthrough – an author-guided review
Why Review?
• Main objective
– Detect faults
• Other objectives
– Inform
– Educate
– Learn from (others') mistakes – improve!
• (Undetected) faults may negatively affect software quality – during all steps of the development process!
Relative Cost of Faults
[Figure: the relative cost of fixing a fault grows across development phases, up to a factor of ~200 in maintenance. Source: Davis, A.M., "Software Requirements: Analysis and Specification" (1990)]
Reviews complement testing
Walkthrough
• Author guides through the artifact ('static simulation')
• Attendees scrutinize and question
• If defects are detected, it is left to the author to correct them
Walkthrough
• Objective
– Detect faults
– Become familiar with the product
• Roles
– Presenter (author)
– Reviewers (inspectors)
• Elements
– Planned meeting
– Team (2 to 7 people)
– Brainstorm
• Disadvantage
– Finds fewer faults than (formal) inspections
Inspections
• Objective
– Detect faults
– Collect data
– Communicate information
• Roles
– Moderator
– Reviewers (inspectors)
– Presenter
– Author
• Elements
– Formal process
– Planned, structured meeting
– Preparation important
– Team (3 to 6 people)
• Disadvantage
– Short-term cost
Inspection Process
[Figure 10.2: the inspection process]
Defect Causal Analysis (DCA)
[Diagram: defect (fault) detection (review / test) finds and fixes defects in an artifact and records them in a defect database; a sample of defects is extracted for a causal analysis meeting, which proposes actions; an action team meeting prioritizes and implements the actions, which feed back into the organizational processes that define software construction (analyse / design / code / rework).]
Getting the best from reviews
• The author
– "… is in the hot seat"
– How do you react?
• The development team
– Better prepared
– Feedback
– Communication
• The review team
– Critical thinking
– Ability to detect omissions
– Who should participate in the review?
• Cost-effective verification
– Minimising cost of correction
– Is it cost-effective?
Review Metrics
Basic
• Size of review items
• Review time & effort
• Number of defects found
• Number of slipping defects found later
Derived
• Defects found per review time or effort
• Defects found per artifact size
• Size per time or effort
Empirical Results
Source: Runeson, P.; Andersson, C.; Thelin, T.; Andrews, A.; Berling, T.: "What do we know about defect detection methods?", IEEE Software, vol. 23, no. 3, pp. 82-90, May-June 2006
Inspections – Empirical Results
• Requirements defects – reviews are good, since finding defects early is cheaper
• Design defects – inspections are both more efficient and more effective than testing
• Code defects – functional or structural testing is more effective and efficient than inspection
– May be complementary regarding types of faults
• Generally, reported effectiveness is low
– Inspections find 25-50% of an artifact's defects
– Testing finds 30-60% of the defects in the code
Reading Techniques
– Ad hoc
– Checklist-based
– Defect-based
– Scenario-based
– Usage-based
– Perspective-based
Perspective-based Reading
• Scenarios / perspectives: Designer, Tester, User
• Purpose: decrease overlap (redundancy), improve effectiveness
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Capture-Recapture – Defect Estimation
• Situation: Two inspectors are assigned to inspect the same product (Lincoln-Petersen Model)
– d1: defects detected by Inspector 1
– d2: defects detected by Inspector 2
– d12: defects detected by both Inspector 1 and Inspector 2
– Nt: total defects (detected and undetected)
– Nr: remaining defects (undetected)
Nt = (d1 × d2) / d12
Nr = Nt − (d1 + d2 − d12)
Capture-Recapture – Example
• Situation: Two inspectors are assigned to inspect the same product
– d1: 50 defects detected by Inspector 1
– d2: 40 defects detected by Inspector 2
– d12: 20 defects detected by both inspectors
– Nt: total defects (detected and undetected)
– Nr: remaining defects (undetected)
Nt = (50 × 40) / 20 = 100
Nr = 100 − (50 + 40 − 20) = 30
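A small Python sketch of this computation (the function name and structure are illustrative, not from the textbook); it reproduces the example above:

```python
def lincoln_petersen(d1, d2, d12):
    """Lincoln-Petersen estimate from two independent inspections.

    d1, d2 -- defects found by Inspector 1 and Inspector 2
    d12    -- defects found by both inspectors (the overlap)
    """
    if d12 == 0:
        raise ValueError("estimator undefined for d12 = 0")
    n_total = d1 * d2 / d12                  # Nt = (d1 * d2) / d12
    n_remaining = n_total - (d1 + d2 - d12)  # Nr = Nt - (d1 + d2 - d12)
    return n_total, n_remaining

# Example above: 50 and 40 defects found, 20 in common.
print(lincoln_petersen(50, 40, 20))  # -> (100.0, 30.0)
```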
Advanced Capture-Recapture Models
• Four basic models are used for inspections
– They differ in their degrees of freedom
• Prerequisites for all models
– All reviewers work independently of each other
– No faults may be injected or removed during the inspection
Advanced Capture-Recapture Models
Model   Detection probability equal across defects?   ... equal across reviewers?   Estimator
M0      Yes                                            Yes                           Maximum-likelihood
Mt      Yes                                            No                            Maximum-likelihood; Chao's estimator
Mh      No                                             Yes                           Jackknife; Chao's estimator
Mth     No                                             No                            Chao's estimator
Mt Model
Maximum-likelihood:
• Mt = total marked animals (= faults) at the start of the t'th sampling interval
• Ct = total number of individuals sampled during interval t
• Rt = number of recaptures in the sample Ct
• An approximation of the maximum-likelihood estimate of the population size (N) is: N ≈ SUM(Ct × Mt) / SUM(Rt)
First resampling:
• M1 = 50 (first inspector)
• C1 = 40 (second inspector)
• R1 = 20
• N = 40×50 / 20 = 100
Second resampling:
• M2 = 70 (first and second inspector)
• C2 = 40 (third inspector)
• R2 = 30
• N = (40×50 + 40×70) / (20 + 30) = 4800 / 50 = 96
Third resampling:
• M3 = 80 (first three inspectors)
• C3 = 30 (fourth inspector)
• R3 = 30
• N = (2000 + 2800 + 30×80) / (20 + 30 + 30) = 7200 / 80 = 90
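A compact Python sketch of this resampling computation (function and variable names are illustrative); fed with the inspector data above, it returns the running estimates 100, 96, and 90:

```python
def schnabel_estimate(samples):
    """Running approximate ML estimates for the Mt model:
    N ~ SUM(Ct * Mt) / SUM(Rt).

    samples -- list of (Ct, Rt): defects found per inspector and how
    many of them were already known ("recaptures"); the first
    inspector only marks defects, so the first Rt is ignored.
    """
    marked = 0                 # Mt: defects known at start of step t
    num = den = 0
    estimates = []
    for i, (c_t, r_t) in enumerate(samples):
        if i == 0:
            marked = c_t       # first sample: mark only, no estimate
            continue
        num += c_t * marked    # Ct * Mt
        den += r_t             # Rt
        estimates.append(num / den)
        marked += c_t - r_t    # newly found defects become marked
    return estimates

# Inspectors find 50, 40, 40, 30 defects; recaptures: 20, 30, 30.
print(schnabel_estimate([(50, 0), (40, 20), (40, 30), (30, 30)]))
# -> [100.0, 96.0, 90.0]
```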
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
• Software reliability
Software Quality (Chapter 12)
1. Quality relates to the degree to which a system, system component, or process meets specified requirements.
2. Quality relates to the degree to which a system, system component, or process meets customer, or user, needs or expectations.
Quality Attributes – ISO 9126
[Figure: ISO 9126 quality characteristics]
Reliability – Terminology
• Reliability: The probability that a system or a capability of a system functions without failure for a specified time in a specified environment
• Reliability Engineering: The discipline of ensuring that a system will be reliable when operated in a specified manner
• Reliability Engineering Goal: Developing software to reach the market
– within planned development time
– within planned development budget
– with known reliability
Statistical Testing
• NOT the same as ad-hoc testing!
• Sampling of tests (test data) follows a probability distribution
– Uniform (Random): probability of available candidate tests (test data) is equal
– Usage-based (Operational): probability of available candidate test (test data) follows an operational profile (i.e., a specific usage pattern)
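To make the contrast concrete, here is a hedged Python sketch (operation names and profile values are invented for illustration): uniform sampling draws each candidate test with equal probability, while usage-based sampling weights the draw by an operational profile.

```python
import random

candidates = ["enter_card", "verify_pin", "withdraw", "deposit", "query_status"]

# Uniform (random): every candidate test is equally likely.
uniform_pick = random.choice(candidates)

# Usage-based (operational): sampling follows an operational profile,
# i.e. the relative frequency of each operation in real usage.
profile = {"enter_card": 0.35, "verify_pin": 0.35, "withdraw": 0.20,
           "deposit": 0.06, "query_status": 0.04}
usage_pick = random.choices(list(profile), weights=list(profile.values()), k=1)[0]

print(uniform_pick, usage_pick)
```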
Usage-based Testing
Process steps: Usage specification → Test case generation → Test execution → Failure logging → Certification / Reliability estimation
[Diagram annotations: Test Case 1.1.3 Setup, 1.1.4 Call; Failure Report #13: output failure]
Usage Specification Models
[Figures: operational profile; state-transition diagram]
Operational Profile
• Steps to develop an operational profile (Musa 1993)
Definitions:
1. An operational profile is a quantitative characterization of how a software system will be used in its intended environment.
2. An operational profile is a specification of classes of inputs and the probability of their occurrence.
Operational Profile – Customers
• Customer: person, group, or institution that is acquiring the software being developed.
• Customer Group: the set of customers that will be using the software in the same way.
• Customer Profile: the complete set of customer groups and their associated occurrence probabilities.
Operational Profile – Users
• User: an individual, group, or institution that actually uses a given software system.
• User Group: set of users who will engage the system in the same way.
• User Profile: set of user groups and their occurrence probability.
• Note: There might be different user groups for different customer groups.
Operational Profile – System Modes
• System Mode: a set of functions or operations grouped for convenience in order to analyze execution behavior.
• System Mode Profile: set of system modes and their occurrence probability.
• Example 1: administrator mode versus end-user mode
• Example 2: system usage during peak time vs. off-peak time
Operational Profile – Functions
• Function: derived from system requirements, e.g., use cases.
• Functional Profile: set of functions and their occurrence probability.
Operational Profile – Operations
• Operation: operations are more specific than functions; they represent a specific task, with specific input variable values or ranges of values. In general, there may be more operations than functions associated with a system.
• Example: a function to modify a record could evolve into two operations: (i) delete old record, (ii) add new record.
Operational Profile – Example /1
The table shows an example operational profile of an ATM system (occurrences per day).

Operation                    Occurrence Rate   Occurrence Prob.
Enter card                   16600             0.332
Verify PIN                   16600             0.332
Withdraw checking            9950              0.199
Withdraw savings             3300              0.066
Deposit checking             2000              0.040
Deposit savings              1000              0.020
Query status                 332               0.00664
Test terminal                166               0.00332
Input to stolen card list    29                0.00058
Backup files                 1                 0.000023
Total                        50000             1.000000
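The probability column is simply each operation's rate divided by the total; a tiny Python check (operation names copied from the table; the slide's figures are rounded, so results match only approximately):

```python
# Occurrence rates per day, taken from the ATM example above.
rates = {
    "Enter card": 16600, "Verify PIN": 16600, "Withdraw checking": 9950,
    "Withdraw savings": 3300, "Deposit checking": 2000,
    "Deposit savings": 1000, "Query status": 332, "Test terminal": 166,
    "Input to stolen card list": 29, "Backup files": 1,
}
total = sum(rates.values())
probabilities = {op: rate / total for op, rate in rates.items()}
print(round(probabilities["Withdraw checking"], 3))  # -> 0.199
```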
Operational Profile – Example /2
The table shows an example operational profile of a component in a telephone system that is dedicated to forwarding incoming telephone calls to a certain telephone number at a certain point in time [Mus98]. The profile lists operations initiated by telephone subscribers, system administrators, the telephone network (external system), and the system controller (part of the system but external to the component).

Operation initiator    Operation                                      Rate (per h)   Prob.
Subscriber             Phone number entry                             10,000         0.1
System administrator   Add subscriber                                 50             0.0005
System administrator   Delete subscriber                              50             0.0005
Telephone network      Process voice call, no pager, answer           18,000         0.18
Telephone network      Process voice call, no pager, no answer        17,000         0.17
Telephone network      Process voice call, pager, answer              17,000         0.17
Telephone network      Process voice call, pager, answer on page      12,000         0.12
Telephone network      Process voice call, pager, no answer on page   10,000         0.1
Telephone network      Process fax call                               15,000         0.15
System controller      Audit section of phone number database         900            0.009
System controller      Recover from hardware failure                  0.1            0.000001
Operational Profile – Guiding Test Case Allocation
1. Determine the threshold occurrence probability = 0.5 / #test_cases, and assign one test case to each infrequent operation (occurrence probability below the threshold).
2. Identify rarely occurring critical operations and assign 2-4 test cases to each.
3. Assign the remaining test cases to the remaining operations in accordance with their occurrence probabilities.
Allocating Test Cases – Example /1
Total number of test cases: 500
Threshold occurrence probability: 0.5 / 500 = 0.001
1. Suppose that the number of infrequent operations with occurrence probabilities below threshold is 2.
– Assign 1 test case to each infrequent operation.
2. Suppose that we have one critical operation.
– Assign 2 test cases to it.
3. Distribute the remaining 500 - (2+2) = 496 test cases among the rest of the operations based on their occurrence probabilities, as sketched below.
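A sketch of the three-step allocation in Python, assuming the threshold rule stated above; the function and its parameters are illustrative, not from Musa's book. Applied to a profile shaped like this example (500 test cases, two infrequent operations, one rare critical operation), it assigns 1 + 1 + 2 test cases to those and distributes the remaining 496 proportionally.

```python
def allocate_test_cases(profile, critical, n_cases, critical_cases=2):
    """Allocate n_cases test cases over an operational profile.

    profile  -- dict mapping operation -> occurrence probability
    critical -- set of rarely occurring but critical operations
    Step 1: one test case per infrequent operation (p < 0.5 / n_cases).
    Step 2: a small fixed number (2-4) per rare critical operation.
    Step 3: the remainder, proportional to occurrence probability.
    """
    threshold = 0.5 / n_cases
    allocation = {}
    for op, p in profile.items():
        if p < threshold:
            allocation[op] = critical_cases if op in critical else 1
    remaining = n_cases - sum(allocation.values())
    rest = {op: p for op, p in profile.items() if op not in allocation}
    total_p = sum(rest.values())
    for op, p in rest.items():
        allocation[op] = round(remaining * p / total_p)
    return allocation

# Illustrative profile: two infrequent operations (C, D), one rare
# critical operation (E); threshold = 0.5 / 500 = 0.001.
profile = {"A": 0.6, "B": 0.3978, "C": 0.0008, "D": 0.0007, "E": 0.0007}
print(allocate_test_cases(profile, critical={"E"}, n_cases=500))
# -> {'C': 1, 'D': 1, 'E': 2, 'A': 298, 'B': 198}
```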
Allocating Test Cases – Example /2
• Example: Occurrence probabilities for normal operation mode.
[Table from Musa's book, with the infrequent operations below the threshold and the critical operation marked]
Allocating Test Cases – Example /3
[Table from Musa's book: test cases allocated per operation based on occurrence probabilities (~500 in total), with the infrequent operations below the threshold and the critical operation marked]
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Question
• How to decide that a component (entity) has sufficient quality?
– In the following: focus on the quality characteristic 'Reliability'
– Typical application: Components-Off-The-Shelf (COTS) software (3rd-party software)
Reliability Certification Testing Process
5 Steps:
1. Define the reliability objective
2. Define the usage model and usage profile (operational profile)
3. Specify test cases
4. Execute certification test
5. Certify software component
Reliability Objective λ_obj
• Usually, the reliability objective λ_obj is defined as the desired maximal level of failure intensity (λ_F) encountered during operation
– Failure intensity (λ_F) is the inverse of the Mean-Time-Between-Failures (MTBF)
• In the context of certification testing, failure intensity can be measured in terms of the number of failures per test intensity (or test time, or test effort) unit
– Example test intensity units: CPU hour, test person hour, number of test cases, etc.
Reliability Objective λ_obj – Examples
• Typical values of reliability objectives are derived from the estimated impact (damage expressed in terms of $, and in terms of number of deaths) induced by a failure (Musa, 1998).
[Table of typical values not reproduced in the transcript]
Reliability Demo Chart
• Reliability goals are often stated in terms of Failure Intensity Objectives (FIO)
• Usually, failure intensity represents the number of failures observed in a defined time period.
• Using a Reliability Demonstration Chart (Musa 1977) is an efficient way of checking whether the FIO (λ_obj) is met or not.
• It is based on collecting failure data.
– Vertical axis: failure number (n)
– Horizontal axis: expected number of failures, i.e., normalized failure data (T_n) = failure time × λ_obj
[Chart: observed number of failures plotted against expected number of failures, divided into reject, continue, and accept regions]
How to Define Reject, Continue, Accept Regions? /1
• The reject, continue, and accept regions for a defined reliability objective (FIO) are based on sequential sampling theory.
• Procedure:
1. Select the discrimination ratio γ with which the certification test will be performed;
2. Select the supplier (or developer) risk α, i.e. the probability of falsely deciding that the reliability objective is not met when it is;
3. Select the consumer (or customer) risk β, i.e. the probability of falsely deciding that the reliability objective is met when it is not.
How to Define Reject, Continue, Accept Regions? /2
• Boundary between reject and continue regions:
T_n = (A − n·ln γ) / (1 − γ), with A = ln((1 − β) / α)
• Boundary between accept and continue regions:
T_n = (B − n·ln γ) / (1 − γ), with B = ln(β / (1 − α))
(γ is the discrimination ratio, n the failure number, and T_n the normalized failure time)
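Assuming the boundary formulas as reconstructed above, a small Python sketch can compute the regions and classify an observed point (all names are illustrative):

```python
import math

def rdc_regions(alpha, beta, gamma):
    """Build a classifier for points (n, T_n) on a reliability
    demonstration chart: n = failure number, T_n = normalized
    failure time (failure time multiplied by lambda_obj)."""
    a = math.log((1 - beta) / alpha)   # A: reject/continue intercept
    b = math.log(beta / (1 - alpha))   # B: accept/continue intercept (< 0)

    def classify(n, t_n):
        # Equivalent form of (A - n*ln(gamma)) / (1 - gamma).
        reject_at = (n * math.log(gamma) - a) / (gamma - 1)
        accept_at = (n * math.log(gamma) - b) / (gamma - 1)
        if t_n <= reject_at:
            return "reject"
        if t_n >= accept_at:
            return "accept"
        return "continue"

    return classify

# Example 3 settings (later in the lecture): alpha = beta = 0.1,
# gamma = 2; no failures after 3.0 normalized units.
classify = rdc_regions(0.1, 0.1, 2.0)
print(classify(0, 3.0))   # -> accept (boundary at ln 9 ~ 2.20)
```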
Reliability Demo Chart – Effects of α, β, and γ
• When the risk levels (α and β) decrease, …
or
• When the discrimination ratio (γ) decreases, …
• … the system will require more testing before reaching the Accept or Reject regions
– i.e., the Continue region gets wider.
RDC: Example /1
• Consumer risk β = 0.05
• Supplier risk α = 0.05
• Discrimination ratio γ = 2
RDC: Example /2
• Consumer risk β = 0.01
• Supplier risk α = 0.01
• Discrimination ratio γ = 2
RDC: Example /3
• Consumer risk β = 0.001
• Supplier risk α = 0.001
• Discrimination ratio γ = 2
RDC: Example /4
• Consumer risk β = 0.1
• Supplier risk α = 0.1
• Discrimination ratio γ = 1.2
Example 1
λ_obj = 4 failures / million transactions; α = 0.1; β = 0.1; γ = 2

Failure number   Measure (million transactions)   Normalized measure (= expected failure number)
1                0.1875                           0.75
2                0.3125                           1.25
3                1.25                             5
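As a rough check with the boundary formulas from the "Regions /2" slide (assuming the reconstruction given there): A = ln(0.9 / 0.1) ≈ 2.20 and B = −2.20, so at failure number 3 the accept/continue boundary lies at (3·ln 2 + 2.20) / (2 − 1) ≈ 4.28; the observed normalized measure of 5 exceeds this, so the chart indicates the Accept region.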
Example 2
λ_obj = 0.1 failures / CPU hour; α = 0.05; β = 0.05; γ = 2

Failure number   Measure (CPU hours)   Normalized measure (= expected failure number)
1                8                     0.8
2                19                    1.9
3                60                    6
Example 3
We have developed a program for a Web server with a target failure intensity of 1 failure / 1,000,000 transactions. The program runs for 50 hours, handling 10,000 transactions per hour on average, with no failures occurring. How confident are we that the program has met its objective? Can we release the software now?
λ_obj = 1 failure / 10^6 transactions; α = 0.1; β = 0.1; γ = 2
Example 3
λ_obj = 1 failure / 10^6 transactions; α = 0.1; β = 0.1; γ = 2

Failure number   Measure (transactions)   Normalized measure (= expected failure number)
1 ?              500,000                  0.5
1 ?              1,000,000                1
1 ?              3,000,000                3
Recommended
Textbook Exercises
• Chapter 10
– 1, 5, 6, 7, 9, 11
• Chapter 12
– 2, 3, 7
Next Week
• Lecture 5:
– Industry Presentation by Madis Jullinen: "Gaming as a
gateway to better testing."
• Lab 4:
– Document Inspection and Defect Prediction
• In addition:
– Continue working on project
– Read textbook chapters 10 and 12 (available via OIS)