SENG 521SENG 521Software Reliability & Software Reliability & Software Reliability & Software Reliability & Software QualitySoftware Qualityyy
Chapter Chapter 11: : OverviewOverview
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])http://www enel ucalgary ca/People/far/Lectures/SENG521/
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/
ContentsContentsSh t iSh t iShorter version:Shorter version: How to avoid these?How to avoid these?
ContentsContentsL iL iLonger version:Longer version: What is this course about?What is this course about? What factors affect software What factors affect software
quality?quality?Wh tWh t i ft li bilit ?i ft li bilit ? What What is software reliability?is software reliability?
What What is software reliability is software reliability engineeringengineering??engineeringengineering??
What is software What is software reliability reliability engineeringengineering process?process?engineering engineering process?process?
What Affects Quality?What Affects Quality?
SENG521 [email protected] 4
What Affects What Affects Software Quality?Software Quality?Time:Time: Time:Time: Meeting the project
deadline. Quality Reaching the market at
the right time.C tC t
Q y
Cost:Cost: Meeting the anticipated
project costs.
Cost Time
p j Quality (reliability):Quality (reliability):
Working fine for the People Technologydesignated period on
the designated system.People Technology
Terminology & ScopeTerminology & ScopeFailures
TreatsFailuresFaultsErrors
The ability of a system to deliver service that can
The ability of a system to deliver service that can
Attributes
AvailabilityReliabilitySafetyC fid ti lit
service that can justifiably be trusted.service that can justifiably be trusted.
Dependability Attributes ConfidentialityIntegrityMaintainability
Dependability
MeansFault preventionFault toleranceFault removal
Models
Fault removalFault forecasting
Reliability Block DiagramFault Tree model
Fault Tree modelReliability Graph
Software ReliabilitySoftware Reliability
Software ReliabilitySoftware Reliability
ModelModel ProcessProcessConceptConcept
Single Single FailureFailureM d lM d l
Multiple Multiple FailureFailureM d lM d l
SRE SRE ProcessProcess
ReliabilityReliabilityAvailabilityAvailabilityFailure rateFailure rate ModelModel ModelModelFailure rateFailure rateMTTFMTTFFailure densityFailure density
ReliabilityReliabilityGrowthGrowthModelModel
Etc.Etc.
Software Software QualityQuality
Software Software QualityQuality
StandardsStandards& Models& Models
AssessmentAssessmenttechniquestechniques
Size & EffortSize & Effort
ISO 9126ISO 9126 PrePre--releasereleasePredict sizePredict sizePredict effortPredict effortPredict timePredict time PostPost--releasereleasePredict timePredict time
At The End …At The End … What is software quality? What affects software quality? What is software quality? What affects software quality? What is software reliability engineering (SRE)? Why SRE is important? How does it affect software quality?
Wh t th i f t th t ff t th li bilit f ft ? What are the main factors that affect the reliability of software? How can one determine what the size of the software will be? How can one determine how much it will cost to develop the software? How can one determine how often will the software fail? How can one determine the current quality of the software under
development? How can one determine whether the software is reliable enough to be
released? Can SRE methodology be applied to the current ways of software
d l t lik bj t i t d t b d d ildevelopment: like object-oriented, component-based and agile development?
What are challenges and difficulties of applying SRE?
Question to AskQuestion to AskD I ll d t t k thi ? Do I really need to take this course?
Answer depend on you! Take this course if you want to avoid these in your
career as a software designer, tester and quality t llcontroller:
Bug Fix
MoralMoralI d t i l i t d Industrial oriented course Novelty of problem; multiplicity of solution
Attend lectures: there is more to lectures than to the notesthan to the notes
Sit close-up: some of the material is hard to see from the back
Feel free to ask questions.
Section Section 11
From SoftwareFrom Software Quality toQuality toSoftware Reliability Software Reliability EngineeringEngineering
What is Quality?What is Quality?Q i iQ i i Quality popular view:Quality popular view:
– Something “good” but not ifi blquantifiable
– Something luxury and classy
Quality professional view:Quality professional view:– Conformance to requirement
(Crosby, 1979)– Fitness for use (Juran, 1970)
SENG521 [email protected] 13
Quality: Various ViewsQuality: Various Views
Aesthetic Aesthetic ViewView
Developer Developer ViewViewViewView ViewView
Customer Customer ViewView
What is Software Quality?What is Software Quality?C f t i tC f t i t Conformance to requirementConformance to requirement The requirements are clearly stated and the
product must conform to itWhat isWhat isWhat isWhat is Any deviation from the requirements is
regarded as a defect A good quality product contains fewer defectsWhatWhat
What isWhat isSpecified?Specified?
What isWhat isSpecified?Specified?
A good quality product contains fewer defects Fitness for useFitness for use
Fit to user expectations: meet user’s needs
SW SW Does?Does? WhatWhat
useruserN d ?N d ?
WhatWhatuseruser
N d ?N d ? A good quality product provides better user
satisfactionNeeds?Needs?Needs?Needs?
BothBoth Dependable computing systemDependable computing systemBothBoth Dependable computing systemDependable computing system
SENG521 [email protected] 15
Both Both Dependable computing systemDependable computing systemBoth Both Dependable computing systemDependable computing system
Definition: Software QualityDefinition: Software QualityISO 8402 d fi itiISO 8402 definition
of QUALITY:The totality ofThe totality of features and characteristics of acharacteristics of a product or a service that bear on its ability to satisfy stated or implied needsneeds
ReliabilityReliability and MaintainabilityMaintainability are two major components of Quality
SENG521 [email protected] 16
Quality Model: ISO 9126Quality Model: ISO 9126
Characteristics Characteristics AttributesAttributes
1. Functionality Suitability Interoperability Accuracy
Compliance Security
2. Reliability Maturity Recoverability Fault tolerance
C h fCrash frequency
3. Usability Understandability Learnability Operability
4. Efficiency Time behaviour Resource behavioury
5. Maintainability Analyzability Stability Changeability
Testability
6. Portability Adaptability Installability Conformance
Replacability
Quality Model Quality Model –– StructureStructure
SW QualitySW QualityUser oriented
Quality Factor 1Quality Factor 1
Quality Factor 2Quality Factor 2
Quality Factor nQuality Factor n......
Quality Quality Quality Quality Quality Quality Quality Quality yCriterion
1
yCriterion
1
yCriterion
m
yCriterion
m......
yCriterion
2
yCriterion
2
yCriterion
3
yCriterion
3Software oriented
MeasuresMeasures
Example: Attribute ExpansionExample: Attribute ExpansionD i b blD i b bl Quality objective Design by measurable Design by measurable objectives:objectives:Incremental design is
Quality objective
Incremental design is evaluated to check whether the goal for each increment
Availability User friendliness
the goal for each increment was achieved.
% of planned Days on job to% of planned System uptime
Days on job to learn task suppliedBy new system
Worst: 95%Best: 99%
Worst: 7 daysBest: 1 day
Quality vs. Project CostsQuality vs. Project CostsC t di t ib ti f t i l ft j t
IntegrationProductDesign
Cost distribution for a typical software project
and test
Programming
Release
Programming
Design Programming Testing
What is wrong with this picture?
SENG521 [email protected] 20
What is wrong with this picture?
Total Cost DistributionTotal Cost DistributionMaintenance is responsible for more that 60% of total cost
Product Design
Maintenance is responsible for more that 60% of total cost for a typical software project
Questions:Questions:g
Programming
1) How 1) How to to build qualitybuild qualityProgramming build quality build quality into a system?into a system?
Integrationand test
Maintenance 2) How 2) How to to assess quality assess quality of a system?of a system?and test
Developing better quality system will contribute to lowering maintenance costs
of a system?of a system?
SENG521 [email protected] 21
contribute to lowering maintenance costs
1) How to Build Quality into a 1) How to Build Quality into a System?System?D l i b tt lit t i Developing better quality systems requires:
Establishing Quality Assurance (QA) Quality Assurance (QA) programs
Establishing Reliability Engineering (SRE)Reliability Engineering (SRE)process
SENG521 [email protected] 22
2) How to Assess Quality of a 2) How to Assess Quality of a System?System?
l b h Relevant to both pre-release and post-
lQuality
release Pre-release: SRE,
Q yAssessment
certification, standards ISO9001
Post-release: evaluation, validation, , ,RAM
SENG521 [email protected] 23
How Do We Assess Quality?How Do We Assess Quality?AA ( i( i AdAd--hoc (trial hoc (trial and error) and error)
h!h!approach!approach!
Systematic Systematic approachapproachpppp
SENG521 [email protected] 24
PrePre--release Qualityrelease QualityS ft Software inspection and
Facts:• About 20% of the software
projects are canceled. (missedtesting
Methods:
projects are canceled. (missed schedules, etc.)
• About 84% of software projects Methods:
SREare incomplete when released (need patch, etc).
• Almost all of the software projects Certification Standards
ost a o t e so twa e p ojectscosts exceed initial estimations. (cost overrun)
ISO9001, 9126, 25000
SENG521 [email protected] 25
Fatal Software ExamplesFatal Software Examples
Fatal software related incidents [Gage & McCormick 2004]Date Casualties Detail
2003 3 S ft f il t ib t f t N th t2003 3 Software failure contributes of power outage across North-eastern U.S. and Canada.
2001 5 Panamanian cancer patients die following overdoses of radiation, determined by the use of faulty softwaredetermined by the use of faulty software.
2000 4 Crash of marine corps osprey tilt-rotor aircraft, partially blamed on software anomaly.
d h ld h d j h h bbl d b1997 225 Radar that could have prevented Korean jet crash hobbled by software problem.
1995 159 American airlines jet, descending into Cali, Columbia crashes into t i A th t th ft t d i ffi i ta mountain. A cause was that the software presented insufficient
and conflicting information to the pilots, who got lost.
1991 28 Software problem prevents Patriot missile battery from picking up SCUD missile which hits US Army barracks in Saudi Arabia
SCUD missile, which hits US Army barracks in Saudi Arabia.
Cost of a Defect …Cost of a Defect …Require-ments
FieldUseDesign Functional
TestSystem
TestCoding
50 % Fault
Fault D t ti
40 %10 %
50 %
50 %
Origin
Detection10 %
25 %50 %
3 % 5 % 7 %
20 KDM
Cost per Fault
6 KDM
12 KDM
1 KDM 1 KDM 1 KDM
Fault
1 KDM 1 000 D t h M k
1 KDM = 1,000 Deutsch MarksCMU. Software Engineering Institute
A Central QuestionA Central Question
In spite of having many development methodologies, central questions are:g q
1 C ll b b f l ?1. Can we remove all bugs before release?2. How often will the software fail?f f f
Two ExtremesTwo ExtremesC ft SE f t h b Craftsman SE: fast, cheap, buggy
Cleanroom SE: slow, expensive, zero defect Is there a middle solution?
CraftsmanCraftsmanSoftware
CleanroomCleanroomSoftwareIs there
YES!
U i S ftDevelop-ment
Develop-ment
Is there a middle solution?
Using Software Reliability
EngineeringEngineering (SRE) Process
Can We Remove All Bugs?Can We Remove All Bugs?
Size [function points]
Failure potential [development]
Failure removal rate Failure Density [at release]
1 1.85 95% 0.09 10 2.45 92% 0.20
100 3 68 90% 0 37100 3.68 90% 0.37 1000 5.00 85% 0.75
10000 7.60 78% 1.67 100000 9.55 75% 2.39 Average 5.02 86% 0.91
Defect potential and density are expressed in terms of defects per function point
The answer is usually NO!The answer is usually NO!
The answer is usually NO!The answer is usually NO!
What Can We Learn from Failures?What Can We Learn from Failures?Time Between Failure vs. ith Failure
900
1000
Does this plot make
700
800
Does this plot make any sense to you?
400
500
600
Hou
rs
200
300
0
100
1 11 21 31 41 51 61 71 81 91
ith F il
ith Failure Failure Time
How to Handle Defects?How to Handle Defects?T bl b l i th ti b t f il Table below gives the time between failures for a software system:
Failure no. 1 2 3 4 5 6 7 8 9 10Time since last failure (hours) 6 4 8 5 6 9 11 14 16 19
What can we learn from this data?S t li bilit ? System reliability?
Approximate number of bugs in the system? Approximate time to remove remaining bugs?
What to Learn from Data?What to Learn from Data?Th i f th i t f il ti th The inverses of the inter-failure times are the failure intensity (= failure per unit of time) d idata points
Error no. 1 2 3 4 5 6 7 8 9 10
Time since last failure (hours)
6 4 8 5 6 9 11 14 16 19
Failure intensity 0.166 0.25 0.125 0.20 0.166 0.111 0.09 0.071 0.062 0.053
What to Learn from Data?What to Learn from Data?M ti t f il MTTF ( f il t ) Mean-time-to-failures MTTF (or average failure rate)MTTF = (6+4+8+5+6+9+11+14+16+19)/10 = 9.8 hour
System reliability for 1 hour of operation System reliability for 1 hour of operation1
9.8 0.90299tt MTTFR e e e
Fitting a straight line to the graph in (a) would show an x-intercept of about 15. Using this as an estimate of the total number of original failures, we estimate that there are still five bugs in the software.Fitting a straight line to the graph in (b) would give an x Fitting a straight line to the graph in (b) would give an x-intercept near 160. This would give an additional testing time of 62 units to remove all bugs, approximately.
A Typical Problem: QuestionA Typical Problem: QuestionF il i t it (f il t ) f t i ll Failure intensity (failure rate) of a system is usually expressed using FIT (Failure-In-Time) unit which is 1 failure per 10**9 device hours1 failure per 10**9 device hours.
Failure intensity of an electric pump system used for pumping crude oil in Northern Alberta’s oil fieldfor pumping crude oil in Northern Alberta s oil field is constant and is 10,000 FITs and 100 such pumps are operational.are operational.
If for continuous operation all failed units are to be replaced immediately, what shall be the minimumreplaced immediately, what shall be the minimum inventory size of pumps for one year of operation?
A Typical Problem: AnswerA Typical Problem: AnswerP ’ M Ti T F il (MTTF)Pump’s Mean-Time-To-Failure (MTTF) λ = 10,000 FITs = 10,000 / 10**9 hour = 1×10**-5 hour
= 1 failure per 100 000 hours= 1 failure per 100,000 hours
The 12-month reliability is: (1 year = 8 760 hours)The 12 month reliability is: (1 year 8,760 hours) R(8,760 hours) = exp{-8,760/100,000} = 0.916 and “unreliability” is, F(8,760) = 1 - 0.916 = 0.084
Therefore, inventory size is 8.4% or minimum 9 pumps should be at stock in the first year.
Another Typical ProblemAnother Typical Problem
Unit manufacturing cost of a software product is 50$. The company decides to offer p p yone year free update to its customers. Suppose that failure intensity of the productSuppose that failure intensity of the product at the release time is = 0.01 failures/month.What should be the unit cost of the productWhat should be the unit cost of the product including warranty services?
TerminologyTerminologyFailures
TreatsFailuresFaultsErrors
The ability of a system to avoid failures that are more frequent or more severe, and outage
The ability of a system to avoid failures that are more frequent or more severe, and outage
Attributes
AvailabilityReliabilitySafetyC fid ti lit
o o e se e e, a d outagedurations that are longer, than is acceptable to the users.
o o e se e e, a d outagedurations that are longer, than is acceptable to the users.
Dependability Attributes ConfidentialityIntegrityMaintainability
The ability of aThe ability of a
Dependability
MeansFault preventionFault toleranceFault removal
The ability of a system to deliver service that can justifiably be
The ability of a system to deliver service that can justifiably be
Models
Fault removalFault forecasting
justifiably be trusted.justifiably be trusted.
Reliability Block DiagramFault Tree model
Fault Tree modelReliability Graph
Dependability: TreatsDependability: Treats
Error cause Fault cause Failure
An error is a human action that results in software containing a fault.
A fault (bug) is a cause for either a failure of the ( g)program or an internal error (e.g., an incorrect state, incorrect timing). It must be detected and removed.
Among the 3 factors only failure is observable.
Definition: FailureDefinition: FailureF ilF il Failure: Failure: A system failure is an event that occurs when the delivered service
deviates from correct service. A failure is thus a transition from correct service to incorrect service i e to not implementing thecorrect service to incorrect service, i.e., to not implementing the system function.
Any departure of system behavior in execution from user needs. A failure is caused by a fault and the cause of a fault is usually a humanfailure is caused by a fault and the cause of a fault is usually a human error.
Failure Mode: Failure Mode: The manner in which a fault occurs, i.e., the way in which the element
faults. Failure Effect: Failure Effect:
The consequence(s) of a failure mode on an operation, function, status of a system/process/activity/environment. The undesirable outcome of a fault of a system element in a particular mode associated Risk
a fault of a system element in a particular mode. associated Risk
Failure Intensity & DensityFailure Intensity & Density
Failure Intensity (failure rate):Failure Intensity (failure rate): the rate failures are happening, i.e., number of failures per natural or time unit. Failure intensity is way of expressing system reliability, e.g., 5 failures per hour; 2 failures per 1000 transactions. For system
end users
Failure Density:Failure Density: failure per KLOC (or per FP) of developed code, e.g., 1 failure per KLOC, 0.2 failure
FP tper FP, etc.For system developers
Example: Failure DensityExample: Failure Densityf In a software system,
measuring number of f il l dfailures lead to identification of 5
d lmodules. However, measuring
failures per KLOC (Failure Density) leads to identification of only one module.
Example from Fenton’s Book
Failure Density Failure Density vs. Inspection vs. Inspection Effort Effort
f il d i d i f f Is failure density a good metrics for software quality?
The more bugs found and fixed doesn’t necessarily imply better software quality because the fault injection rate and the effort to fix them may be different.
Inspection EffortInspection EffortInspection EffortInspection EffortHigher Lower
Higher Good/ Not Worstyy Higher Good/ Not bad
Worst Case
Lower Best Case UnsureFailu
reFa
ilure
Den
sity
Den
sity
Dependability: Attributes /1Dependability: Attributes /1A il bilit di f t i Availability: readiness for correct service
Reliability: continuity of correct service Safety: absence of catastrophic consequences on
the users and the environment Confidentiality: absence of unauthorized
disclosure of information Integrity: absence of improper system state
alterations Maintainability: ability to undergo repairs and
modifications
Dependability: Attributes /2Dependability: Attributes /2Dependability attributes may be emphasized to a Dependability attributes may be emphasized to a greater or lesser extent depending on the application: availability is always required, whereas pp y y q ,reliability, confidentiality, safety may or may not be required. O h d d bili ib b d fi d Other dependability attributes can be defined as combinations or specializations of the six basic attributesattributes.
Example: Security is the concurrent existence of Availability for authorized users only;Availability for authorized users only; Confidentiality; and Integrity with improper taken as meaning unauthorized.
Definition: AvailabilityDefinition: Availability
Availability:Availability: a measure of the delivery of correct service with respect to the alternation pof correct and incorrect service
DowntineUptimeUptimetyAvailabili
p
MTTFMTTFtyAvailabili MTBFMTTRMTTF
y
Definition: Reliability /1Definition: Reliability /1R li bilit i f th ti d li f t Reliability is a measure of the continuous delivery of correct service
Reliability is the probability that a system or a capability of a Reliability is the probability that a system or a capability of a system functions without failure for a “specified time” or “number of natural units” in a specified environment. (Musa, t l ) Gi th t th t f ti i l t thet al.) Given that the system was functioning properly at the
beginning of the time period Probability of failure-free operation for a specified time in a Probability of failure free operation for a specified time in a
specified environment for a given purpose (Sommerville) A recent survey of software consumers revealed that
reliability was the most important quality attribute of the application software
Definition: Reliability /2Definition: Reliability /2h k iThree key points:
Reliability depends on how the software is used Therefore a model of usage is required
Reliability can be improved over time if certain Reliability can be improved over time if certain bugs are fixed (reliability growth) Therefore a trend model (aggregation or regression)Therefore a trend model (aggregation or regression) is neededF il h d i Failures may happen at random timeTherefore a probabilistic model of failure is needed
Dependability: MeansDependability: MeansF lt ti h t t th Fault prevention: how to prevent the occurrence or introduction of faults
Fault tolerance: how to deliver correct service in the presence of faultsp
Fault removal: how to reduce the number or severity of faultsseverity of faults
Fault forecasting: how to estimate the present number the future incidence and thepresent number, the future incidence, and the likely consequences of faults
Fault PreventionFault Prevention
Activities: Requirement reviewq Design review
Clear code Clear code Establishing standards (ISO 9000-3, etc.) Using CASE tools with built-in check mechanisms
All these activities are included in ISO 9000All these activities are included in ISO 9000--3: 3: Guidelines for application of ISO 9001 to the development, supply and maintenance of
ft (1991)
software (1991)
Definition: Definition: Fault ToleranceFault ToleranceA f lt t l t ti t i bl f A fault-tolerant computing system is capable of providing specified services in the presence of a bounded number of failuresbounded number of failures
Use of techniques to enable continued delivery of service during system operationservice during system operation
It is generally implemented by error detection and subsequent system recoverysubsequent system recovery
Based on the principle of:Act during operation while Act during operation while
Defined during specification and design
SRE: Process /1SRE: Process /1Th 5 t i There are 5 steps in SRE process (for each system toeach system to test): Define necessary Define necessary
reliability Develop
operational profiles Prepare for test
E Execute test Apply failure data
to guide decisions
to guide decisions
SRE: Process /2SRE: Process /2
Modified version of the SRE Process
Ref: Musa’s book 2nd Ed
ConclusionsConclusions
Software Reliability Engineering (SRE) can offer metrics and measures to help elevate a psoftware development organization to the upper levels of software developmentupper levels of software development maturity.H i i ff i However, in practice effective implementation of SRE is a non-trivial task!
SENG521 [email protected] 67