quality & reliability in software engineering

Engineering Quality & Reliability

SivaramaSundar.D, 29th Nov 2012

Expectations

• How can quality be achieved? (done)
• How do we maintain quality? (done)
• Quality measurements (overview provided)
• How can reliability be achieved? (done)

Software Quality

In simple terms, “Quality software is reasonably bug-free, delivered on time and within budget, meets requirements and/or expectations, and is maintainable.”

More formally, software quality measures how well software is designed (quality of design) and how well the software conforms to that design (i.e., implementation: quality of conformance / quality assurance).

So, we have quality issues:

1. if the customer’s expectations are not met by the end result
2. if there is a lack of conformance to requirements
3. if the specified development standards are not met
4. if implicit requirements are not captured and addressed

(e.g.:
- a change to the physician name in the configuration should be reflected in the case selection & acquisition screen
- a remote service connection has to be secure
- a long text message or caption should be truncated with “…” and show a tooltip on mouse hover
- pressing the Escape key or the X button should close a pop-up and cancel the changes made)

Hence, both quality of design and quality assurance need to be ensured throughout the SDLC, across all phases.

The V-Model product development process (and agile) ensures better quality assurance by preparing for testing early in each stage of the SDLC.

The V-Model involves testers early in the project lifecycle, thus providing opportunities to correct course before critical decisions are made.

** The earlier a fault is found, the cheaper it is to fix…

Phase / Measures

Requirements
• SSRS and URS cover quality attributes & implicit requirements (performance, security, safety, regulatory)
• Test strategy & plan; adoption of specific test design techniques based on a risk matrix for each unit
• Critical to Quality (CTQ) use cases
• Requirement workshops
• Reviews or walkthroughs, checklists
• Traceability matrix for requirement conformance
• Prototypes

Design
• Design guidelines, standards
• Design workshops
• DAR
• Reviews & walkthroughs, checklists

Implementation
• Unit testing, mocks & stubs
• Continuous integration
• Reviews & walkthroughs, checklists

Testing
• Functional testing
• Integration testing
• Smoke testing
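A traceability matrix for requirement conformance can be kept as plain data and checked automatically. The sketch below is illustrative only: the requirement IDs and test-case names are hypothetical, and it simply flags requirements that no test case traces back to.

```python
from typing import Dict, List, Set

# Hypothetical requirement IDs and test-case names, for illustration only.
REQUIREMENTS: List[str] = ["REQ-001", "REQ-002", "REQ-003"]

TRACEABILITY: Dict[str, Set[str]] = {
    "REQ-001": {"TC-010", "TC-011"},
    "REQ-002": {"TC-020"},
    # REQ-003 has no verifying test case yet
}


def uncovered_requirements(requirements: List[str],
                           matrix: Dict[str, Set[str]]) -> List[str]:
    """Return the requirements that no test case traces back to."""
    return [req for req in requirements if not matrix.get(req)]


def coverage_percent(requirements: List[str],
                     matrix: Dict[str, Set[str]]) -> float:
    """Percentage of requirements with at least one verifying test case."""
    if not requirements:
        return 100.0
    uncovered = len(uncovered_requirements(requirements, matrix))
    return 100.0 * (len(requirements) - uncovered) / len(requirements)


if __name__ == "__main__":
    print(uncovered_requirements(REQUIREMENTS, TRACEABILITY))      # ['REQ-003']
    print(round(coverage_percent(REQUIREMENTS, TRACEABILITY), 1))  # 66.7
```

Running such a check in continuous integration makes a gap in requirement conformance visible as soon as it appears, rather than at the final test.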

Cost of Non-Quality!

Reliability

Software reliability is defined as “the probability of failure-free software operation for a specified period of time in a specified environment”.
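To make the definition concrete, a common simplifying assumption (not stated in the definition itself) is a constant failure rate λ, giving R(t) = e^(−λt) with MTBF = 1/λ. A minimal sketch under that exponential-model assumption, with a hypothetical MTBF of 200 days:

```python
import math


def reliability(t: float, mtbf: float) -> float:
    """Probability of failure-free operation for a duration t.

    Assumes a constant failure rate lambda = 1 / mtbf (exponential model).
    t and mtbf must be expressed in the same time unit.
    """
    if mtbf <= 0:
        raise ValueError("mtbf must be positive")
    failure_rate = 1.0 / mtbf
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    # Hypothetical example: MTBF of 200 days, operating period of 30 days.
    print(round(reliability(30, 200), 3))  # 0.861
```

Note that even with an MTBF of 200 days, the chance of running a full 200 days without a failure is only about 37% under this model.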

Figure: From mistakes to field calls
A person (developer) makes zero to many mistakes; a mistake leads to zero or many faults; a fault leads to zero or many errors; an error leads to zero or many failures; a failure leads to zero or many field calls (customer complaints). Conversely, each field call, failure, error, or fault can be attributed to one or many items of the preceding kind.


Software reliability is based on three primary concepts: fault, error, and failure. (A bug in a program is a fault; possible incorrect values caused by this bug are an error; a possible crash of the operating system is a failure.)

A fault is the result of a mistake made in the development of the system. Faults are dormant, but they can become active through some revealing mechanism. (e.g., a missing check of the free-disk-space threshold before acquisition starts; other examples: null references, uninitialized variables leading to errors)

An error is the manifestation of what is wrong in the running system. Errors often lead to new errors (propagation), which may eventually lead to system failure. (e.g., the fault lets acquisition fill the disk without any warning or validation)

An error becomes a failure when it is not corrected or masked, i.e., when the error becomes observable by the system’s user (a failure is observable by the end user; an error is not). (e.g., a full disk leads to acquisition failure, data loss, or a system crash)
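The disk-space example can be sketched in code. The routine `start_acquisition` and the 2 GiB threshold below are hypothetical; the point is that the validation guard is what keeps the dormant fault (no check) from becoming an error (disk full) and then a user-visible failure (data loss or crash).

```python
import shutil

# Hypothetical threshold; a real product would derive it from the expected
# size of one acquisition plus a safety margin.
MIN_FREE_BYTES = 2 * 1024 ** 3  # 2 GiB


class InsufficientDiskSpace(Exception):
    """Raised before acquisition starts, so the user gets a warning
    instead of a mid-acquisition failure."""


def check_free_space(path: str, required: int = MIN_FREE_BYTES) -> int:
    """Return the free bytes at `path`, or raise if below `required`."""
    free = shutil.disk_usage(path).free
    if free < required:
        raise InsufficientDiskSpace(
            f"only {free} bytes free at {path}, need {required}")
    return free


def start_acquisition(path: str, required: int = MIN_FREE_BYTES) -> str:
    # Validate first: without this check the fault stays dormant until the
    # disk actually fills up during acquisition.
    check_free_space(path, required)
    return "acquisition started"
```

Raising a dedicated exception lets the UI layer turn the condition into a clear warning, which is exactly the "warning or validation" whose absence the example describes.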

• Specification mistakes – incorrect algorithms, incorrectly specified requirements (timing, power, environmental)

• Implementation mistakes – poor design, software coding mistakes

• Component defects – manufacturing imperfections, random device defects, component wear-out

• External factors – radiation, lightning, operator mistakes

Reliability … explained

Reliability – Key Areas

Area | Description | Applicable Phases
Fault Prevention | focuses on avoiding faults in SW products | Requirements, Design
Fault Detection | focuses on revealing reliability problems | Requirements, Design, Implementation, Testing
Fault Tolerance | ensures that the system keeps working properly in the presence of faults | Design, Implementation
Fault Forecasting | focuses on predicting future system reliability | Deployment, Support & Service

Reliability in practice…

Phase / Measures

Requirements
• Reliability requirements, safety & risk management requirements
• Critical to Quality (CTQ) use cases
• Requirement workshops
• Reviews, walkthroughs, checklists

Design
• Design guidelines, standards
• Emphasis on threading and execution architecture
• Whiteboard designs
• Design workshops
• FMEA
• Graceful degradation
• DAR
• Reviews, walkthroughs

Implementation
• Follow best practices & coding standards
• Error handling
• TICS
• Static analysis, code coverage, memory profiling
• Reviews, reports attached to CQ activity
• Improved logging
• POST, BIST

Testing
• Performance testing
• Smoke testing based on CTQs & operational profiles
• Regression testing
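Graceful degradation and error handling can be illustrated with a small sketch. Everything here is hypothetical (`WorklistProvider`, the injected `fetch` callable): when the remote source fails with an expected, recoverable fault, the provider logs it and serves the last known good data instead of letting the fault propagate into a crash.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("reliability-demo")


class WorklistProvider:
    """Hypothetical provider that degrades gracefully: it serves the last
    successfully fetched worklist when the remote source is unavailable."""

    def __init__(self, fetch: Callable[[], List[str]]):
        self._fetch = fetch
        self._last_good: Optional[List[str]] = None
        self.degraded = False

    def get(self) -> List[str]:
        try:
            items = self._fetch()
        except (ConnectionError, TimeoutError) as exc:
            # Expected, recoverable fault: log it and degrade instead of crashing.
            logger.warning("worklist fetch failed: %s", exc)
            self.degraded = True
            if self._last_good is not None:
                return list(self._last_good)
            return []  # reduced functionality, but the application keeps running
        self._last_good = list(items)
        self.degraded = False
        return items
```

Only expected fault types are caught; anything else still propagates, so genuinely unknown errors are not silently masked. The `degraded` flag gives the UI a way to tell the user that the data may be stale.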

Figure: Reliability engineering flow
Requirements → Operational profile (which functionality is critical) → Reliability requirements (MTBF, MTBC, MTBE, etc.) → Architecture/design for reliability (principles, practices, and patterns) → Measuring and testing for reliability (measuring: MTBF, MTBC, ..., tools; testing: load/stress/capacity testing, reliability growth testing, tools) → Is the reliability requirement fulfilled? YES: release product. NO: iterate before release.

Reliability in practice… Reliability Parameters: Targets & Measurement Criteria

• Call Rate: Target < 1.5 calls per system per year. Actions: implement I/O enhancements. Recommendation: start a study to analyze how to decrease the call rate to < 1.0.
• Failure Rate (indicates the number of non-recoverable failures in the field): Target: fewer than 10 reported failures per site; have explicit robustness designed into the product. Actions: execute FMEAs.
• MTBF: mean time between failures of 200 days or 1000 studies
• MTTR: mean time to repair should not exceed 2 days
• Usage of PII (private interfaces): the number of private interfaces used should ideally be 0, with no increase in the use of new private interfaces
• TICS: Target 0 violations for levels 1..6, no increase in level 7..10 violations. Actions: monitor and act.
• Code coverage (method, statement): > 80% statement coverage & 100% method coverage
• Code reviews: 100%
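Targets like these can be evaluated automatically as a release gate, mirroring the "Is the reliability requirement fulfilled?" decision. A minimal sketch, assuming the field and build metrics are already collected; the `FieldData` fields are hypothetical, MTBF is estimated as total operating time divided by the number of failures, and the "1000 studies" form of the MTBF target is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldData:
    # Hypothetical inputs; units are noted per field.
    operating_days: float     # total operating time across systems (days)
    failures: int             # non-recoverable failures in that time
    repair_days: List[float]  # time to repair each failure (days)
    calls: int                # field calls
    system_years: float       # installed systems x years in the field


def mtbf_days(d: FieldData) -> float:
    return d.operating_days / d.failures if d.failures else float("inf")


def mttr_days(d: FieldData) -> float:
    return sum(d.repair_days) / len(d.repair_days) if d.repair_days else 0.0


def call_rate(d: FieldData) -> float:
    return d.calls / d.system_years


def check_targets(d: FieldData, stmt_cov: float, method_cov: float,
                  tics_l1_6: int, private_ifaces: int) -> Dict[str, bool]:
    """True for each target from the table that is met."""
    return {
        "call rate < 1.5/system/year": call_rate(d) < 1.5,
        "MTBF >= 200 days": mtbf_days(d) >= 200,
        "MTTR <= 2 days": mttr_days(d) <= 2,
        "statement coverage > 80%": stmt_cov > 80,
        "method coverage == 100%": method_cov == 100,
        "TICS level 1..6 violations == 0": tics_l1_6 == 0,
        "private interfaces == 0": private_ifaces == 0,
    }


def release_ok(results: Dict[str, bool]) -> bool:
    return all(results.values())
```

For example, 1000 operating days with 4 failures gives an MTBF of 250 days, which meets the 200-day target; the per-target dictionary shows which criterion blocks a release when `release_ok` returns False.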

Reliability – Design FMEA
Identify critical functionality & classify it for effective design towards handling faults

Figure: Fault classification for design
Inputs: system specification; functionality classified according to importance/criticality; high-level design. Identify faults (e.g., FMEA), then classify faults (severity, frequency):
• Class I: faults to be prevented by the system architecture/design → fault-prevention design → the system/service does not experience class I faults
• Class II: faults to be handled by the system → fault-tolerant design → the system/service keeps operating when it encounters a class II fault
• Class III: faults not to be handled by the system → system controlled-fail design → the system/service crashes when it encounters a class III fault
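FMEA worksheets commonly rank failure modes by a Risk Priority Number, RPN = severity × occurrence × detection, each rated on a 1 to 10 scale. The sketch below computes and sorts RPNs to help decide where design effort goes first; the failure modes and ratings are hypothetical, and it does not encode any rule for assigning Class I/II/III, which remains a design judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (hazardous)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (certain to be detected) .. 10 (undetectable)

    def __post_init__(self) -> None:
        for attr in ("severity", "occurrence", "detection"):
            value = getattr(self, attr)
            if not 1 <= value <= 10:
                raise ValueError(f"{attr} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def rank(modes: List[FailureMode]) -> List[FailureMode]:
    """Highest RPN first; ties broken by severity."""
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)


if __name__ == "__main__":
    modes = [
        FailureMode("disk full during acquisition", 8, 4, 6),
        FailureMode("remote service connection drops", 5, 6, 3),
        FailureMode("tooltip missing on long caption", 2, 5, 2),
    ]
    for m in rank(modes):
        print(f"{m.rpn:4d}  {m.name}")
```

Breaking ties by severity keeps high-impact failure modes ahead of frequent but harmless ones with the same RPN.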

Reliability testing

1. Derive test cases from operational profiles
   Possible inputs/tools: application specialist, logging from production, use cases
   Hints: Operational profiles may differ per deployment. Test cases derived from operational profiles differ from stress testing.
2. Run tests
   Possible inputs/tools: manual testing on the system; mocks, stubs, drivers; simulators; QTP; test automation framework
   Hints: Test cases should be as repeatable as possible and executed under the same conditions.
3. Gather data
   Possible inputs/tools: system logging, test logging
   Hints: In case of automation, keep the time compression factor in mind. The failure definition should be explicit for the tested system.
4. Plot data and extract failure intensity and failure rate
5. Predict reliability at the end of the current project phase
   Hints: Be aware that the predicted reliability will still be very vulnerable to variance.
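Steps 3 to 5 can be sketched as follows, assuming failure times are logged in test hours. The time compression factor (field hours represented by one hour of automated testing) is a hypothetical parameter here. Failure intensity per interval is failures divided by equivalent field hours; the prediction simply assumes the latest intensity stays constant, a deliberate simplification of proper reliability growth models.

```python
import math
from typing import List


def failure_intensity(failure_times_h: List[float], total_test_h: float,
                      interval_h: float, compression: float = 1.0) -> List[float]:
    """Failures per equivalent field hour for consecutive test intervals.

    compression: hypothetical time compression factor, i.e. field hours
    represented by one hour of (automated) test execution.
    """
    if total_test_h <= 0 or interval_h <= 0 or compression <= 0:
        raise ValueError("durations and compression must be positive")
    n_intervals = math.ceil(total_test_h / interval_h)
    counts = [0] * n_intervals
    for t in failure_times_h:
        if 0 <= t < total_test_h:
            counts[min(int(t // interval_h), n_intervals - 1)] += 1
    intensities = []
    for i, count in enumerate(counts):
        # The last interval may be shorter than interval_h.
        length_h = min(interval_h, total_test_h - i * interval_h)
        intensities.append(count / (length_h * compression))
    return intensities


def predicted_reliability(latest_intensity: float, mission_h: float) -> float:
    """Probability of no failure over mission_h field hours, assuming the
    latest observed intensity stays constant (exponential model)."""
    return math.exp(-latest_intensity * mission_h)


if __name__ == "__main__":
    # Hypothetical test run: 100 test hours, 25-hour intervals, factor 4.
    print(failure_intensity([5, 12, 18, 33, 71], 100, 25, compression=4))
```

A decreasing sequence of intensities indicates reliability growth. Note that an interval with zero failures would predict perfect reliability on its own; this is the variance the hint above warns about, so several intervals or a fitted growth model give a more trustworthy projection.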

Quality vs. Reliability

Quality is a snapshot at the start of life (time zero): all requirements are implemented as per the design, and all user expectations are met. Time-zero defects are mistakes that escaped the final test. “Quality is everything until put into operation (0 hours)”

Reliability is a motion picture of the day-by-day operation.

The additional defects that appear over time are "reliability defects", or reliability fallout. “Reliability is everything happening after 0 hours”
