
Page 1: Revised IEEE 1633 Recommended Practices for Software Reliability

Advantages of IEEE 1633 Recommended Practices for Software Reliability

Chair: Ann Marie Neufelder, SoftRel, LLC
Vice Chair: Martha Wetherholt, NASA
Secretary: Debra Haehn, Philips
IEEE Standards Association Chair: Louis Gullo, Raytheon Missile Systems Division

Page 2: Revised IEEE 1633 Recommended Practices for Software Reliability

Software reliability timeline

• 1962 – First recorded system failure due to software.
• 1968 – The term “software reliability” is invented.
• 1970s onward – Many software reliability growth estimation models are developed. Limitation: they can’t be used until late in testing.
• Martin Trachtenberg notices the “bell curve” of defect discovery; Larry Putnam/QSM quantifies the bell curve, used for both scheduling and staffing.
• First predictive model developed by the USAF Rome Air Development Center with SAIC and Research Triangle Park. Limitations: the model is only useful for aircraft and was never updated after 1992.
• 2000s – SoftRel, LLC models based on the RL model. Can be used on any system. Updated every 4 years.

Page 3: Revised IEEE 1633 Recommended Practices for Software Reliability

Introduction and motivation

• Software reliability engineering
  • Has existed for over 50 years.
  • Is a fundamental prerequisite for virtually all modern systems.
• A rich body of software reliability research has been generated over the last several decades, but…
• Practical guidance on how to apply these models has lagged significantly.
• A diverse set of stakeholders requires pragmatic guidance and tools to apply software reliability models to assess real software or firmware projects during each stage of the software development lifecycle:
  • Reliability engineers may lack software development experience.
  • Software engineers may be unfamiliar with methods to predict software reliability.
  • Both may have challenges acquiring the data needed for the analyses.

Page 4: Revised IEEE 1633 Recommended Practices for Software Reliability

Abstract

• The newly revised IEEE 1633 Recommended Practice for Software Reliability provides actionable, step-by-step procedures for employing software reliability models and analyses
  • During any phase of software or firmware development
  • With any software lifecycle model, for any industry or application type.
• Includes:
  • Easy-to-use models for predicting software reliability early in development and during test and operation.
  • Methods to analyze software failure modes and include software in a system fault tree analysis.
  • The ability to assess the reliability of COTS, FOSS, and contractor- or subcontractor-delivered software.
• This presentation will cover the key features of the IEEE 1633 Recommended Practices for Software Reliability.
• Current status of this document: approved by IEEE Standards Association ballot of May 24, 2016.

Page 5: Revised IEEE 1633 Recommended Practices for Software Reliability

Acknowledgement of IEEE 1633 Working Group members

Lance Fiondella, Peter Lakey, Robert Binder, Michael Siok, Ming Li, Ying Shi, Nematollah Bidokhti, Thierry Wandji, Michael Grottke, Andy Long, George Stark, Allen Nikora, Bakul Banerjee, Debra Greenhalgh Lubas, Mark Sims, Rajesh Murthy, Willie Fitzpatrick, Mark Ofori-kyei, Sonya Davis, Burdette Joyner, Marty Shooman, Andrew Mack, Loren Garroway, Kevin Mattos, Kevin Frye, Claire Jones, Robert Raygan, Mary Ann DeCicco, Shane Smith, Franklin Marotta, David Bernreuther, Martin Wayne, Nathan Herbert, Richard E Gibbs III, Harry White, Jacob Axman, Ahlia T. Kitwana, Yuan Wei, Darwin Heiser, Brian McQuillan, Kishor Trivedi

Chair: Ann Marie Neufelder, SoftRel, LLC
Vice Chair: Martha Wetherholt, NASA
Secretary: Debra Haehn, Philips
IEEE Standards Association Chair: Louis Gullo, Raytheon Missile Systems Division

Page 6: Revised IEEE 1633 Recommended Practices for Software Reliability

IEEE 1633 Working Group

• Defense/aerospace contractors – 11 members
• Commercial engineering – 9 members
• US Army – 6 members
• US Navy – 5 members
• Academia – 4 members
• DoD – 3 members
• NASA – 3 members
• Medical equipment – 2 members
• Software Engineering Institute – 1 member
• Nuclear Regulatory Commission – 1 member

Page 7: Revised IEEE 1633 Recommended Practices for Software Reliability

Table of contents

Section    Contents
1, 2, 3    Overview, definitions and acronyms
4          Tailoring guidance
5          “Actionable” procedures with checklists and examples
5.1        Planning for software reliability
5.2        Develop a failure modes analysis
5.3        Apply SRE during development
5.4        Apply SRE during testing
5.5        Support the release decision
5.6        Apply SRE in operation
Annex A    Supporting information on the software FMEA
Annex B    Detailed procedures on predicting size, and supporting information for the predictive models
Annex C    Supporting information for the software reliability growth models
Annex D    Estimated cost of SRE
Annex E    SRE tools
Annex F    Examples

Page 8: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 4 SRE Tailoring

• The document is geared towards 4 different roles, any industry and any type of software.
• Hence, Section 4 provides guidance for tailoring the document:
  • By role – recommended sections if you are a reliability engineer, software QA, a software manager or in acquisitions.
  • By life cycle – how to apply the document if you have an incremental or agile life cycle model.
  • By criticality – some SRE tasks are essential, some are typical and some are project specific.

Page 9: Revised IEEE 1633 Recommended Practices for Software Reliability

Typical tasks by role


Page 10: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.1 Planning for software reliability


Page 11: Revised IEEE 1633 Recommended Practices for Software Reliability

Planning – an often overlooked but essential step in SRE

• Characterize the software system – What are the Line Replaceable Units (applications, executables, DLLs, COTS, FOSS, firmware, glueware)? Which are applicable for SRE? What is the operational profile?
• Define failures and criticality – There is no one definition that fits all. Failures need to be defined relative to the system under development.
• Perform a reliability risk assessment – Determine a simple Red/Yellow/Green SRE risk. Use that to determine the degree of SRE.
• Assess the data collection system and review the available SRE tools – The available data and SRE tools will determine which tasks are feasible.
• Finalize the SRE plan – The Software Reliability Program Plan can be part of the Software Development Plan or the Reliability Plan, or a standalone document.

Page 12: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.2 Develop Failure Modes Analysis


Page 13: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.2 Develop Failure Modes Analysis

• This section focuses on the 3 analyses that identify potential failure modes.
• Understanding the failure modes is essential for development, testing, and decision making. Real examples are included in the document.
• Perform Defect Root Cause Analysis (RCA).
• Perform Software Failure Modes Effects Analysis (SFMEA):
  • Prepare the SFMEA
  • Analyze failure modes and root causes
  • Identify consequences
  • Mitigate
  • Generate a Critical Items List (CIL)
  • Understand the differences between a hardware FMEA and a software FMEA
• Include software in the System Fault Tree Analysis.

Page 14: Revised IEEE 1633 Recommended Practices for Software Reliability

SFMEA and SFTA Viewpoints

These are complementary methods.

Page 15: Revised IEEE 1633 Recommended Practices for Software Reliability

Software defect root cause analysis

• The RCA ensures that any SRE improvement efforts address the right types of defects.
  • For example, if most of the defects are introduced in the design phase, you don’t want to put all of your SRE effort into improving coding practices.
• A software reliability assessment identifies certain gaps and strengths which can lead to a certain “volume” of defects.
• But a root cause analysis can confirm the “types” of defects:
  • Faulty requirements?
  • Faulty design?
  • Faulty implementation?
  • Faulty interfaces?
  • Faulty changes or corrective actions?
  • Faulty source and version control?
• These can and will be unique for each configuration item, even if they have the same development processes.

Page 16: Revised IEEE 1633 Recommended Practices for Software Reliability

Example of a root cause analysis

Defects are introduced because of either bad requirements, bad design, bad coding practices or bad change control.

• Requirements defect – The “whats” are incorrect, ambiguous or incomplete.
• Design defect – The “whats” are correct but the “hows” are not. Logic, state, timing and exception handling are all design related.
• Coding defect – The “whats” and “hows” are correct but the software engineer did not implement one or more lines of code properly.

Page 17: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.3 Apply SRE during development


Page 18: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.3 Apply SRE during development

• Task 1: Determine/obtain system reliability objectives in terms of reliability, availability, MTBF – Today’s systems are software intensive. This makes it difficult to establish a reasonable system objective. This document provides 3 approaches for this.
• Task 2: Perform software reliability assessment and prediction – See upcoming slides.
• Task 3: Sanity check the early prediction – One reason why SRE prediction models haven’t been used is that reliability engineers are unsure of the results. The document has typical reliability values based on the size of the software.
• Task 4: Merge the predictions into the overall system prediction – Once the predictions are done, the reliability engineer will want to integrate them into the overall system RBD or fault tree. The document has several methods for doing so (see the sketch after this list).
• Task 5: Determine the total software reliability needed to reach the objective – Since software engineering is often managed centrally, the software manager will want to know what the software components as an aggregate need to achieve.
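To make Task 4 concrete, below is a minimal sketch of one common way to fold a software failure-rate prediction into a series system model. The constant-failure-rate (exponential) assumption and all numbers are illustrative, not values from the document.

```python
# Minimal sketch (illustrative numbers): merging a software failure-rate
# prediction into a series reliability block diagram (RBD), assuming
# constant failure rates.
import math

lambda_hw = 25e-6   # hardware failure rate, failures/hour (hypothetical)
lambda_sw = 40e-6   # predicted software failure rate, failures/hour (hypothetical)

# In a series RBD, failure rates add.
lambda_sys = lambda_hw + lambda_sw
mtbf_sys = 1.0 / lambda_sys

mission_hours = 1000.0
reliability = math.exp(-lambda_sys * mission_hours)  # R(t) = e^(-lambda * t)

print(f"System MTBF: {mtbf_sys:.0f} h, R({mission_hours:.0f} h) = {reliability:.4f}")
```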

Page 19: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.3 Apply SRE during development (continued)

• Task 6: Plan the reliability growth needed to reach the system objective – Once the software objective is established, plans can and should be made to ensure that there is sufficient reliability growth in the schedule. Reliability growth can only happen if the software is operated in a real environment with no new feature drops.
• Task 7: Perform a sensitivity analysis – Quite often there isn’t sufficient schedule for extended reliability growth, so a sensitivity analysis is needed to determine how to cut the defects to reach the objective.
• Task 8: Allocate the required objective to each software LRU – If the software components are managed by different organizations or vendors, the software-level objective will need to be further allocated (a short allocation sketch follows this list).
• Task 9: Employ software reliability metrics – There are other metrics that support decision making, testing and delivery, and that also promote more reliable software.
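For Task 8, a simple proportional allocation is one possibility. The sketch below, with hypothetical LRU names and sizes, allocates the aggregate failure-rate objective in proportion to predicted size; the document describes allocation methods in more detail.

```python
# Minimal sketch (hypothetical LRUs and numbers): allocating an aggregate
# software failure-rate objective across LRUs in proportion to their
# predicted size in KSLOC.
objective = 50e-6  # required aggregate software failure rate, failures/hour

lru_ksloc = {"flight_app": 120.0, "comms": 45.0, "gui": 80.0}  # hypothetical
total_ksloc = sum(lru_ksloc.values())

for name, ksloc in lru_ksloc.items():
    alloc = objective * ksloc / total_ksloc
    print(f"{name}: allocated failure rate {alloc:.2e} failures/hour")
```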

Page 20: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.3.2 Perform software reliability assessment and prediction

• Since the 1970s, most software reliability models have been usable only during test or operation, when it’s too late to do planning, tradeoffs or improvements.
• The models presented in this section can be used before the code is even written (a small illustration follows). The predictions are then merged with the hardware reliability predictions and compared to the system reliability objective.
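As an illustration of what an early (pre-code) prediction looks like, the sketch below uses a bare size-times-defect-density model. The density figure is a placeholder, not a calibrated factor from the document's annexes.

```python
# Minimal sketch of an early prediction, assuming a simple
# size-times-defect-density model. Both values are illustrative
# placeholders, not calibrated factors from IEEE 1633.
predicted_ksloc = 200.0    # estimated size of new/modified code, KSLOC
defects_per_ksloc = 1.5    # assumed delivered defect density

inherent_defects = predicted_ksloc * defects_per_ksloc
print(f"Predicted delivered defects: {inherent_defects:.0f}")
```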

Page 21: Revised IEEE 1633 Recommended Practices for Software Reliability

If you can predict this fault profile you can predict all of the other reliability figures of merit

The predictive models predict the fault profile first; failure rate, MTBF, reliability and availability are then predicted from that, as sketched below.
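A minimal sketch of that derivation, assuming a hypothetical fault profile, usage rate and MTTR (all illustrative):

```python
# Minimal sketch (illustrative assumptions throughout): deriving failure
# rate, MTBF and availability from a predicted monthly fault profile.
usage_hours_per_month = 720.0   # assumed fleet usage per month
mttr_hours = 2.0                # assumed mean time to restore

# Hypothetical predicted fault profile: faults surfacing in each month
fault_profile = [30, 24, 18, 12, 8, 5]

for month, faults in enumerate(fault_profile, start=1):
    rate = faults / usage_hours_per_month    # failures/hour in that month
    mtbf = 1.0 / rate
    availability = mtbf / (mtbf + mttr_hours)
    print(f"Month {month}: lambda={rate:.4f}/h, MTBF={mtbf:.1f} h, A={availability:.4f}")
```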

Page 22: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.4 Apply SRE during testing


Page 23: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.4 Apply SRE during testing

• Task 1: Develop a reliability test suite – Software reliability growth models are useless unless the software is being exercised. The first step is to make sure that it is.
• Task 2: Measure test coverage – The models can’t measure what they don’t know. The higher the test coverage, the higher the confidence in the models.
• Task 3: Increase test effectiveness via fault insertion – Many software reliability issues are due to the software performing an unexpected function as opposed to failing to perform a required function. Fault insertion increases the confidence in the reliability.
• Task 4: Collect failure and defect data – All of the models require the testing/operational hours and either the time of each failure observation or the total number of failures in a day.
• Task 5: Select and use reliability growth models – Before you use any model, you need to plot the failure data and see which models are applicable. The document provides complete guidance on how to do this (a fitting sketch follows this list).
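As one concrete example of Task 5, the sketch below fits the classic Goel-Okumoto NHPP model, m(t) = a(1 − e^(−bt)), to hypothetical failure times by maximum likelihood. It is just one of the many models the document covers, shown only for illustration; the failure times are made up.

```python
# Minimal sketch: fitting the Goel-Okumoto NHPP growth model
# m(t) = a * (1 - exp(-b t)) to failure times by maximum likelihood.
import numpy as np
from scipy.optimize import brentq

# Hypothetical cumulative failure times (test hours) and total observed hours
failure_times = np.array([12, 30, 55, 70, 100, 140, 190, 250, 330, 420], dtype=float)
T = 500.0
n = len(failure_times)

def dlogL_db(b):
    """Derivative of the profile log-likelihood in b, with a = n / (1 - exp(-bT))."""
    return n / b - failure_times.sum() - n * T * np.exp(-b * T) / (1.0 - np.exp(-b * T))

b_hat = brentq(dlogL_db, 1e-6, 1.0)        # solve for the shape parameter
a_hat = n / (1.0 - np.exp(-b_hat * T))     # expected total (inherent) defects

remaining = a_hat - n                              # predicted defects still latent
intensity = a_hat * b_hat * np.exp(-b_hat * T)     # failure intensity at time T
print(f"a={a_hat:.1f}, b={b_hat:.5f}, remaining={remaining:.1f}, lambda(T)={intensity:.5f}/h")
```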

Page 24: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.4 Apply SRE during testing (continued)

• Task 6: Apply SRE metrics – Certain metrics provide information about the maturity of the software which is essential for decision making and planning of resources.
• Task 7: Determine the accuracy of the models – The failure trend can change at any time during testing; hence, the best model can change with it. The best way to measure accuracy is to compare the estimations to the next time to failure (a small example follows this list).
• Task 8: Support the release decision – The release decision should not be made solely on the basis of the SRG models. The decision is based on the test coverage and approach, the degree of fault insertion, and other SRE metrics which can indicate troubled releases, as well as the SRG models.
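Task 7's accuracy check can be as simple as scoring each candidate model against the next observed time to failure. A tiny sketch with hypothetical numbers:

```python
# Minimal sketch (hypothetical numbers): scoring a growth model by how well
# it anticipates the *next* time to failure.
predicted_next_ttf = 95.0   # hours, from the fitted model (hypothetical)
actual_next_ttf = 110.0     # hours, observed in test (hypothetical)

relative_error = (predicted_next_ttf - actual_next_ttf) / actual_next_ttf
print(f"Relative error: {relative_error:+.1%}")  # negative = model pessimistic
```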

Page 25: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.4 Apply SRE during testing

• Software reliability growth models have existed since the 1970s.
  • Many of them provide nearly identical results.
  • SWRG models have been difficult to implement and understand due to poor guidance from the academic community.
  • Several models assume data which is not feasible to collect on non-academic large software systems.
• This document provides:
  • Models that are feasible for real software systems.
  • Consolidation of models that provide similar results.
  • Step-by-step instructions for how to select the best model(s) based on:
    • The observed defect discovery trend (see next slide)
    • Inherent defect content
    • Effort required to use the model(s)
    • Availability of data required for the model(s)
  • How to apply them when you have an incremental life cycle.
  • Test coverage methods which affect the accuracy of all SWRG models.

Page 26: Revised IEEE 1633 Recommended Practices for Software Reliability

Selecting the best SWRG model

• The most important criterion is the current defect discovery trend.
• A few models can be used when the discovery rate is increasing or peaking. Most can be used when it is decreasing or stabilizing.
• If the defect discovery is plotted first, the user will know which models can be used (a trend-labeling sketch follows the figure).

[Figure: non-cumulative defects discovered versus normalized usage period, showing the four trend phases – increasing, peaking, decreasing, stabilizing.]
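A rough heuristic for labeling the current trend from per-period defect counts might look like the following; the thresholds and data are illustrative, and the document's own selection guidance should govern.

```python
# Minimal heuristic sketch (not from the standard): classifying the current
# defect discovery trend from non-cumulative defect counts per period.
import numpy as np

defects_per_period = np.array([2, 4, 7, 10, 11, 9, 6, 4, 3, 3])  # hypothetical

recent = defects_per_period[-4:]                        # last few periods
slope = np.polyfit(np.arange(len(recent)), recent, 1)[0]
peak_seen = defects_per_period.argmax() < len(defects_per_period) - 2

if slope > 0.5:
    trend = "increasing"
elif slope < -0.5:
    trend = "decreasing"
elif peak_seen:
    trend = "stabilizing"
else:
    trend = "peaking"
print(f"Recent slope {slope:+.2f} -> trend: {trend}")
```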

Page 27: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.5 Support Release Decision


Page 28: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.5 Support Release Decision

Once development and testing are complete, the SRE analyses, models and metrics can be used to determine whether the software should be accepted for release.

• The decision is based on:
  • Requirements and operational profile coverage
  • Stress test coverage
  • Code coverage
  • Adequate defect removal
  • Confidence in the reliability estimates
• SRE tasks performed prior to acceptance:
  • Determine release stability – do the reliability estimates meet the objective?
  • Forecast additional test duration – if the objective hasn’t been met, how many more test hours are required?
  • Forecast remaining defects and the effort required to correct them – will the forecasted defects pile up? Impact the next release?
  • Perform a Reliability Demonstration Test – determine statistically whether the software meets the objective (a worked example follows).
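For the Reliability Demonstration Test, a standard chi-square calculation (assuming exponentially distributed failure times, which the slide does not mandate) gives the required test hours. A short sketch with an illustrative MTBF objective:

```python
# Minimal sketch of a reliability demonstration test calculation, assuming
# exponential failure times. Required test time to demonstrate an MTBF
# objective theta at confidence C, allowing r failures:
#   T = theta * chi2.ppf(C, 2r + 2) / 2
from scipy.stats import chi2

theta = 500.0       # MTBF objective, hours (illustrative)
confidence = 0.90
for r in (0, 1, 2):  # number of failures allowed during the demonstration
    T = theta * chi2.ppf(confidence, 2 * r + 2) / 2.0
    print(f"Allow {r} failure(s): test for {T:.0f} hours")
```

For the zero-failure case this reduces to T = theta × ln(1/(1 − C)), about 2.3 MTBFs of failure-free testing at 90% confidence.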

Page 29: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.6 Apply SRE in Operations

Once the software is deployed, its reliability should be monitored to assess any changes needed to previous analyses, predictions and estimations.

Page 30: Revised IEEE 1633 Recommended Practices for Software Reliability

Section 5.6 Apply SRE in Operations

• Task 1: Employ SRE metrics to monitor software reliability – The best way to improve the accuracy of the predictions and SWRG models is to measure the actual software reliability once in operation.
• Task 2: Compare operational and predicted reliability (a small comparison example follows this list).
• Task 3: Assess changes to previous characterizations or analyses – The operational failure modes may be different from what’s visible in testing. If so, the software failure modes analyses will need to focus on the operational failure modes to improve the reliability of the next release.
• Task 4: Archive operational data – Operational data is valuable for future predictions, sanity checking, etc.
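A minimal sketch of the Task 1–2 comparison, with hypothetical field data:

```python
# Minimal sketch (hypothetical numbers): comparing predicted and observed
# reliability once the software is in operation.
field_hours = 12000.0    # cumulative operational hours across the fleet
field_failures = 18      # software failures observed in the field

observed_mtbf = field_hours / field_failures
predicted_mtbf = 800.0   # from the development-phase prediction (hypothetical)

print(f"Observed MTBF {observed_mtbf:.0f} h vs predicted {predicted_mtbf:.0f} h "
      f"({observed_mtbf / predicted_mtbf:.0%} of prediction)")
```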

Page 31: Revised IEEE 1633 Recommended Practices for Software Reliability

Summary

• IEEE P1633 2016 puts forth recommended practices to apply qualitative software failure modes analyses and quantitative models
  • Improve the product and ensure software or firmware is delivered with the required reliability.
• IEEE P1633 2016 includes improved guidance
  • Offers increased value and is more accessible to a broader audience:
    • Reliability engineers
    • Software quality engineers
    • Software managers
    • Acquisitions