TRANSCRIPT
MTAT.03.159 / Lecture 04 / © Dietmar Pfahl 2013
MTAT.03.159: Software Testing
Lecture 04:
Static Testing (Inspection)
and Defect Estimation
(Textbook Ch. 10 & 12)
Dietmar Pfahl, email: [email protected]
Spring 2013
Lecture Reading
• Chapter 10: Reviews (Lab 4)
– Types of reviews
– Defect estimation (not in textbook)
• Chapter 12: Evaluating Software Quality (no Lab)
– Usage-based testing
– Certification testing (not in textbook)
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Reviews (Ch 10)
Terminology
• Static testing – testing without executing the software
• Review – a meeting to evaluate a software artifact
• Inspection – a formally defined review
• Walkthrough – an author-guided review
Why Review?
• Main objective
– Detect faults
• Other objectives
– Inform
– Educate
– Learn from (others') mistakes – improve!
• (Undetected) faults may negatively affect software quality – during all steps of the development process!
Relative Cost of Faults
[Figure: the relative cost of fixing a fault grows across development phases, up to a factor of ~200 in maintenance. Source: Davis, A.M., "Software Requirements: Analysis and Specification" (1990)]
Reviews complement testing
Walkthrough
• Author guides through the artifact ('static simulation')
• Attendees scrutinize and question
• If defects are detected, it is left to the author to correct them
Walkthrough
• Objective
– Detect faults
– Become familiar with the product
• Roles
– Presenter (author)
– Reviewers (inspectors)
• Elements
– Planned meeting
– Team (2 to 7 people)
– Brainstorm
• Disadvantage
– Finds fewer faults than (formal) inspections
Inspections
• Objective
– Detect faults
– Collect data
– Communicate information
• Roles
– Moderator
– Reviewers (inspectors)
– Presenter
– Author
• Elements
– Formal process
– Planned, structured meeting
– Preparation important
– Team (3 to 6 people)
• Disadvantage
– Short-term cost
Inspection Process
[Figure 10.2: the inspection process]
Defect Causal Analysis (DCA)
[Diagram: defect (fault) detection (review / test) finds and fixes defects in an artifact and records them in a defect database; a sample of defects is extracted for a causal analysis meeting, which proposes actions; an action team meeting prioritizes and implements the actions, which feed back into the organizational processes that define software construction (analyse / design / code / rework).]
Getting the best from reviews
• The author
– "… is in the hot seat"
– How do you react?
• The development team
– Better prepared
– Feedback
– Communication
• The review team
– Critical thinking
– Ability to detect omissions
– Who should participate in the review?
• Cost-effective verification
– Minimising cost of correction
– Is it cost-effective?
Review Metrics
Basic
• Size of review items
• Review time & effort
• Number of defects found
• Number of slipping defects found later
Derived
• Defects found per review time or effort
• Defects found per artifact size
• Size per time or effort
Empirical Results
Source: Runeson, P.; Andersson, C.; Thelin, T.; Andrews, A.; Berling, T.: "What do we know about defect detection methods?", IEEE Software, vol. 23, no. 3, pp. 82-90, May-June 2006
Inspections – Empirical Results
• Requirements defects – reviews are good, since finding defects early is cheaper
• Design defects – inspections are both more efficient and more effective than testing
• Code defects – functional or structural testing is more effective and efficient than inspection
– May be complementary regarding types of faults
• Generally, reported effectiveness is low
– Inspections find 25-50% of an artifact's defects
– Testing finds 30-60% of the defects in the code
Reading Techniques
– Ad hoc
– Checklist-based
– Defect-based
– Scenario-based
– Usage-based
– Perspective-based
Perspective-based Reading
• Scenarios / perspectives: Designer, Tester, User
• Purpose: decrease overlap (redundancy), improve effectiveness
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Capture-Recapture – Defect Estimation
• Situation: Two inspectors are assigned to inspect the same product (Lincoln-Petersen Model)
– d1: defects detected by Inspector 1
– d2: defects detected by Inspector 2
– d12: defects detected by both Inspector 1 and Inspector 2
– Nt: total defects (detected and undetected)
– Nr: remaining defects (undetected)
Nt = (d1 × d2) / d12
Nr = Nt − (d1 + d2 − d12)
Capture-Recapture – Example
• Situation: Two inspectors are assigned to inspect the same product
– d1: 50 defects detected by Inspector 1
– d2: 40 defects detected by Inspector 2
– d12: 20 defects detected by both inspectors
– Nt: total defects (detected and undetected)
– Nr: remaining defects (undetected)
Nt = (50 × 40) / 20 = 100
Nr = 100 − (50 + 40 − 20) = 30
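A small Python sketch of this computation (the function name and structure are illustrative, not from the textbook); it reproduces the example above:

```python
def lincoln_petersen(d1, d2, d12):
    """Lincoln-Petersen estimate from two independent inspections.

    d1, d2 -- defects found by Inspector 1 and Inspector 2
    d12    -- defects found by both inspectors (the overlap)
    """
    if d12 == 0:
        raise ValueError("estimator undefined for d12 = 0")
    n_total = d1 * d2 / d12                  # Nt = (d1 * d2) / d12
    n_remaining = n_total - (d1 + d2 - d12)  # Nr = Nt - (d1 + d2 - d12)
    return n_total, n_remaining

# Example above: 50 and 40 defects found, 20 in common.
print(lincoln_petersen(50, 40, 20))  # -> (100.0, 30.0)
```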
Advanced Capture-Recapture Models
• Four basic models are used for inspections
– They differ in their degrees of freedom
• Prerequisites for all models
– All reviewers work independently of each other
– No faults may be injected or removed during the inspection
Advanced Capture-Recapture Models
Model   Detection probability equal across defects?   ... equal across reviewers?   Estimator
M0      Yes                                            Yes                           Maximum-likelihood
Mt      Yes                                            No                            Maximum-likelihood; Chao's estimator
Mh      No                                             Yes                           Jackknife; Chao's estimator
Mth     No                                             No                            Chao's estimator
Mt Model
Maximum-likelihood:
• Mt = total marked animals (= faults) at the start of the t'th sampling interval
• Ct = total number of individuals sampled during interval t
• Rt = number of recaptures in the sample Ct
• An approximation of the maximum-likelihood estimate of the population size (N) is: N ≈ SUM(Ct × Mt) / SUM(Rt)
First resampling:
• M1 = 50 (first inspector)
• C1 = 40 (second inspector)
• R1 = 20
• N = 40×50 / 20 = 100
Second resampling:
• M2 = 70 (first and second inspector)
• C2 = 40 (third inspector)
• R2 = 30
• N = (40×50 + 40×70) / (20 + 30) = 4800 / 50 = 96
Third resampling:
• M3 = 80 (first three inspectors)
• C3 = 30 (fourth inspector)
• R3 = 30
• N = (2000 + 2800 + 30×80) / (20 + 30 + 30) = 7200 / 80 = 90
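A compact Python sketch of this resampling computation (function and variable names are illustrative); fed with the inspector data above, it returns the running estimates 100, 96, and 90:

```python
def schnabel_estimate(samples):
    """Running approximate ML estimates for the Mt model:
    N ~ SUM(Ct * Mt) / SUM(Rt).

    samples -- list of (Ct, Rt): defects found per inspector and how
    many of them were already known ("recaptures"); the first
    inspector only marks defects, so the first Rt is ignored.
    """
    marked = 0                 # Mt: defects known at start of step t
    num = den = 0
    estimates = []
    for i, (c_t, r_t) in enumerate(samples):
        if i == 0:
            marked = c_t       # first sample: mark only, no estimate
            continue
        num += c_t * marked    # Ct * Mt
        den += r_t             # Rt
        estimates.append(num / den)
        marked += c_t - r_t    # newly found defects become marked
    return estimates

# Inspectors find 50, 40, 40, 30 defects; recaptures: 20, 30, 30.
print(schnabel_estimate([(50, 0), (40, 20), (40, 30), (30, 30)]))
# -> [100.0, 96.0, 90.0]
```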
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
• Software reliability
Software Quality (Chapter 12)
1. Quality relates to the degree to which a system, system component, or process meets specified requirements.
2. Quality relates to the degree to which a system, system component, or process meets customer, or user, needs or expectations.
Quality Attributes – ISO 9126
[Figure: ISO 9126 quality characteristics]
Reliability – Terminology
• Reliability: The probability that a system or a capability of a system functions without failure for a specified time in a specified environment
• Reliability Engineering: The discipline of ensuring that a system will be reliable when operated in a specified manner
• Reliability Engineering Goal: Developing software to reach the market
– within planned development time
– within planned development budget
– with known reliability
Statistical Testing
• NOT the same as ad-hoc testing!
• Sampling of tests (test data) follows a probability distribution
– Uniform (Random): probability of available candidate tests (test data) is equal
– Usage-based (Operational): probability of available candidate test (test data) follows an operational profile (i.e., a specific usage pattern)
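To make the contrast concrete, here is a hedged Python sketch (operation names and profile values are invented for illustration): uniform sampling draws each candidate test with equal probability, while usage-based sampling weights the draw by an operational profile.

```python
import random

candidates = ["enter_card", "verify_pin", "withdraw", "deposit", "query_status"]

# Uniform (random): every candidate test is equally likely.
uniform_pick = random.choice(candidates)

# Usage-based (operational): sampling follows an operational profile,
# i.e. the relative frequency of each operation in real usage.
profile = {"enter_card": 0.35, "verify_pin": 0.35, "withdraw": 0.20,
           "deposit": 0.06, "query_status": 0.04}
usage_pick = random.choices(list(profile), weights=list(profile.values()), k=1)[0]

print(uniform_pick, usage_pick)
```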
Usage-based Testing
Process steps: Usage specification → Test case generation → Test execution → Failure logging → Certification / Reliability estimation
[Diagram annotations: Test Case 1.1.3 Setup, 1.1.4 Call; Failure Report #13: output failure]
Usage Specification Models
[Figures: operational profile; state-transition diagram]
Operational Profile
• Steps to develop an operational profile (Musa 1993)
Definitions:
1. An operational profile is a quantitative characterization of how a software system will be used in its intended environment.
2. An operational profile is a specification of classes of inputs and the probability of their occurrence.
Operational Profile – Customers
• Customer: person, group, or institution that is acquiring the software being developed.
• Customer Group: the set of customers that will be using the software in the same way.
• Customer Profile: the complete set of customer groups and their associated occurrence probabilities.
Operational Profile – Users
• User: an individual, group, or institution that actually uses a given software system.
• User Group: set of users who will engage the system in the same way.
• User Profile: set of user groups and their occurrence probability.
• Note: There might be different user groups for different customer groups.
Operational Profile – System Modes
• System Mode: a set of functions or operations grouped for convenience in order to analyze execution behavior.
• System Mode Profile: set of system modes and their occurrence probability.
• Example 1: administrator mode versus end-user mode
• Example 2: system usage during peak time vs. off-peak time
Operational Profile – Functions
• Function: derived from system requirements, e.g., use cases.
• Functional Profile: set of functions and their occurrence probability.
Operational Profile – Operations
• Operation: operations are more specific than functions; they represent a specific task, with specific input variable values or ranges of values. In general, there may be more operations than functions associated with a system.
• Example: a function to modify a record could evolve into two operations: (i) delete old record, (ii) add new record.
Operational Profile – Example /1
The table shows an example operational profile of an ATM system (occurrences per day).

Operation                    Occurrence Rate   Occurrence Prob.
Enter card                   16600             0.332
Verify PIN                   16600             0.332
Withdraw checking            9950              0.199
Withdraw savings             3300              0.066
Deposit checking             2000              0.040
Deposit savings              1000              0.020
Query status                 332               0.00664
Test terminal                166               0.00332
Input to stolen card list    29                0.00058
Backup files                 1                 0.000023
Total                        50000             1.000000
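The probability column is simply each operation's rate divided by the total; a tiny Python check (operation names copied from the table; the slide's figures are rounded, so results match only approximately):

```python
# Occurrence rates per day, taken from the ATM example above.
rates = {
    "Enter card": 16600, "Verify PIN": 16600, "Withdraw checking": 9950,
    "Withdraw savings": 3300, "Deposit checking": 2000,
    "Deposit savings": 1000, "Query status": 332, "Test terminal": 166,
    "Input to stolen card list": 29, "Backup files": 1,
}
total = sum(rates.values())
probabilities = {op: rate / total for op, rate in rates.items()}
print(round(probabilities["Withdraw checking"], 3))  # -> 0.199
```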
Operational Profile – Example /2
The table shows an example operational profile of a component in a telephone system that is dedicated to forwarding incoming telephone calls to a certain telephone number at a certain point in time [Mus98]. The profile lists operations initiated by telephone subscribers, system administrators, the telephone network (external system), and the system controller (part of the system but external to the component).

Operation initiator    Operation                                      Rate (per h)   Prob.
Subscriber             Phone number entry                             10,000         0.1
System administrator   Add subscriber                                 50             0.0005
System administrator   Delete subscriber                              50             0.0005
Telephone network      Process voice call, no pager, answer           18,000         0.18
Telephone network      Process voice call, no pager, no answer        17,000         0.17
Telephone network      Process voice call, pager, answer              17,000         0.17
Telephone network      Process voice call, pager, answer on page      12,000         0.12
Telephone network      Process voice call, pager, no answer on page   10,000         0.1
Telephone network      Process fax call                               15,000         0.15
System controller      Audit section of phone number database         900            0.009
System controller      Recover from hardware failure                  0.1            0.000001
Operational Profile – Guiding Test Case Allocation
1. Determine the threshold occurrence probability = 0.5 / #test_cases, and assign one test case to each infrequent operation (occurrence probability below the threshold).
2. Identify rarely occurring critical operations and assign 2-4 test cases to each.
3. Assign the remaining test cases to the remaining operations in accordance with their occurrence probabilities.
Allocating Test Cases – Example /1
Total number of test cases: 500
Threshold occurrence probability: 0.5 / 500 = 0.001
1. Suppose that the number of infrequent operations with occurrence probabilities below threshold is 2.
– Assign 1 test case to each infrequent operation.
2. Suppose that we have one critical operation.
– Assign 2 test cases to it.
3. Distribute the remaining 500 - (2+2) = 496 test cases among the rest of the operations based on their occurrence probabilities, as sketched below.
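A sketch of the three-step allocation in Python, assuming the threshold rule stated above; the function and its parameters are illustrative, not from Musa's book. Applied to a profile shaped like this example (500 test cases, two infrequent operations, one rare critical operation), it assigns 1 + 1 + 2 test cases to those and distributes the remaining 496 proportionally.

```python
def allocate_test_cases(profile, critical, n_cases, critical_cases=2):
    """Allocate n_cases test cases over an operational profile.

    profile  -- dict mapping operation -> occurrence probability
    critical -- set of rarely occurring but critical operations
    Step 1: one test case per infrequent operation (p < 0.5 / n_cases).
    Step 2: a small fixed number (2-4) per rare critical operation.
    Step 3: the remainder, proportional to occurrence probability.
    """
    threshold = 0.5 / n_cases
    allocation = {}
    for op, p in profile.items():
        if p < threshold:
            allocation[op] = critical_cases if op in critical else 1
    remaining = n_cases - sum(allocation.values())
    rest = {op: p for op, p in profile.items() if op not in allocation}
    total_p = sum(rest.values())
    for op, p in rest.items():
        allocation[op] = round(remaining * p / total_p)
    return allocation

# Illustrative profile: two infrequent operations (C, D), one rare
# critical operation (E); threshold = 0.5 / 500 = 0.001.
profile = {"A": 0.6, "B": 0.3978, "C": 0.0008, "D": 0.0007, "E": 0.0007}
print(allocate_test_cases(profile, critical={"E"}, n_cases=500))
# -> {'C': 1, 'D': 1, 'E': 2, 'A': 298, 'B': 198}
```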
Allocating Test Cases – Example /2
• Example: Occurrence probabilities for normal operation mode.
[Table from Musa's book, with the infrequent operations below the threshold and the critical operation marked]
Allocating Test Cases – Example /3
[Table from Musa's book: test cases allocated per operation based on occurrence probabilities (~500 in total), with the infrequent operations below the threshold and the critical operation marked]
Structure of Lecture 4
• Types of reviews
• Defect estimation
• Usage-based testing
• Certification testing
Question
• How to decide that a component (entity) has sufficient quality?
– In the following: focus on the quality characteristic 'Reliability'
– Typical application: Components-Off-The-Shelf (COTS) software (3rd-party software)
Reliability Certification Testing Process
5 Steps:
1. Define the reliability objective
2. Define the usage model and usage profile (operational profile)
3. Specify test cases
4. Execute certification test
5. Certify software component
Reliability Objective λ_obj
• Usually, the reliability objective λ_obj is defined as the desired maximal level of failure intensity (λ_F) encountered during operation
– Failure intensity (λ_F) is the inverse of the Mean-Time-Between-Failures (MTBF)
• In the context of certification testing, failure intensity can be measured in terms of the number of failures per test intensity (or test time, or test effort) unit
– Example test intensity units: CPU hour, test person hour, number of test cases, etc.
Reliability Objective λ_obj – Examples
• Typical values of reliability objectives are derived from the estimated impact (damage expressed in terms of $, and in terms of number of deaths) induced by a failure (Musa, 1998).
[Table of typical values not reproduced in the transcript]
Reliability Demo Chart
• Reliability goals are often stated in terms of Failure Intensity Objectives (FIO)
• Usually, failure intensity represents the number of failures observed in a defined time period.
• Using a Reliability Demonstration Chart (Musa 1977) is an efficient way of checking whether the FIO (λ_obj) is met or not.
• It is based on collecting failure data.
– Vertical axis: failure number (n)
– Horizontal axis: expected number of failures, i.e., normalized failure data (T_n) = failure time × λ_obj
[Chart: observed number of failures plotted against expected number of failures, divided into reject, continue, and accept regions]
How to Define Reject, Continue, Accept Regions? /1
• The reject, continue, and accept regions for a defined reliability objective (FIO) are based on sequential sampling theory.
• Procedure:
1. Select the discrimination ratio γ with which the certification test will be performed;
2. Select the supplier (or developer) risk α, i.e. the probability of falsely deciding that the reliability objective is not met when it is;
3. Select the consumer (or customer) risk β, i.e. the probability of falsely deciding that the reliability objective is met when it is not.
How to Define Reject, Continue, Accept Regions? /2
• Boundary between reject and continue regions:
T_n = (A − n·ln γ) / (1 − γ), with A = ln((1 − β) / α)
• Boundary between accept and continue regions:
T_n = (B − n·ln γ) / (1 − γ), with B = ln(β / (1 − α))
(γ is the discrimination ratio, n the failure number, and T_n the normalized failure time)
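Assuming the boundary formulas as reconstructed above, a small Python sketch can compute the regions and classify an observed point (all names are illustrative):

```python
import math

def rdc_regions(alpha, beta, gamma):
    """Build a classifier for points (n, T_n) on a reliability
    demonstration chart: n = failure number, T_n = normalized
    failure time (failure time multiplied by lambda_obj)."""
    a = math.log((1 - beta) / alpha)   # A: reject/continue intercept
    b = math.log(beta / (1 - alpha))   # B: accept/continue intercept (< 0)

    def classify(n, t_n):
        # Equivalent form of (A - n*ln(gamma)) / (1 - gamma).
        reject_at = (n * math.log(gamma) - a) / (gamma - 1)
        accept_at = (n * math.log(gamma) - b) / (gamma - 1)
        if t_n <= reject_at:
            return "reject"
        if t_n >= accept_at:
            return "accept"
        return "continue"

    return classify

# Example 3 settings (later in the lecture): alpha = beta = 0.1,
# gamma = 2; no failures after 3.0 normalized units.
classify = rdc_regions(0.1, 0.1, 2.0)
print(classify(0, 3.0))   # -> accept (boundary at ln 9 ~ 2.20)
```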
Reliability Demo Chart – Effects of α, β, and γ
• When the risk levels (α and β) decrease, …
or
• When the discrimination ratio (γ) decreases, …
• … the system will require more testing before reaching the Accept or Reject regions
– i.e., the Continue region gets wider.
RDC: Example /1
• Consumer risk β = 0.05
• Supplier risk α = 0.05
• Discrimination ratio γ = 2
RDC: Example /2
• Consumer risk β = 0.01
• Supplier risk α = 0.01
• Discrimination ratio γ = 2
RDC: Example /3
• Consumer risk β = 0.001
• Supplier risk α = 0.001
• Discrimination ratio γ = 2
RDC: Example /4
• Consumer risk β = 0.1
• Supplier risk α = 0.1
• Discrimination ratio γ = 1.2
Example 1
λ_obj = 4 failures / million transactions; α = 0.1; β = 0.1; γ = 2

Failure number   Measure (million transactions)   Normalized measure (= expected failure number)
1                0.1875                           0.75
2                0.3125                           1.25
3                1.25                             5
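As a rough check with the boundary formulas from the "Regions /2" slide (assuming the reconstruction given there): A = ln(0.9 / 0.1) ≈ 2.20 and B = −2.20, so at failure number 3 the accept/continue boundary lies at (3·ln 2 + 2.20) / (2 − 1) ≈ 4.28; the observed normalized measure of 5 exceeds this, so the chart indicates the Accept region.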
Example 2
λ_obj = 0.1 failures / CPU hour; α = 0.05; β = 0.05; γ = 2

Failure number   Measure (CPU hours)   Normalized measure (= expected failure number)
1                8                     0.8
2                19                    1.9
3                60                    6
Example 3
We have developed a program for a Web server with a target failure intensity of 1 failure / 1,000,000 transactions. The program runs for 50 hours, handling 10,000 transactions per hour on average, with no failures occurring. How confident are we that the program has met its objective? Can we release the software now?
λ_obj = 1 failure / 10^6 transactions; α = 0.1; β = 0.1; γ = 2
Example 3
λ_obj = 1 failure / 10^6 transactions; α = 0.1; β = 0.1; γ = 2

Failure number   Measure (transactions)   Normalized measure (= expected failure number)
1 ?              500,000                  0.5
1 ?              1,000,000                1
1 ?              3,000,000                3
Recommended
Textbook Exercises
• Chapter 10
– 1, 5, 6, 7, 9, 11
• Chapter 12
– 2, 3, 7
Next Week
• Lecture 5:
– Industry Presentation by Madis Jullinen: "Gaming as a
gateway to better testing."
• Lab 4:
– Document Inspection and Defect Prediction
• In addition:
– Continue working on project
– Read textbook chapters 10 and 12 (available via OIS)