
SIM5102 Software Evaluation

Data Collection

What is good data?

• Correctness - data collected according to the exact rules of the metric's definition

• Accuracy - the difference between the collected data and the actual value

• Precision - the number of decimal places needed to express the data

• Consistency - little difference in value when measured with different devices, by different people, or repeatedly

What is good data?

• Time-stamped - associated with a particular activity or time period, so we know when the data were collected.

• Replicated - can be replicated under different circumstances.

How to define the data?

• Metric terminology must be clear and detailed, understood by all involved.

• Two kinds of data:
  • Raw data
  • Refined data

• Direct and indirect measurement (a worked illustration follows)
• Most organizations are interested in software quality, cost and schedule.
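As a worked illustration (not from the slides; numbers invented), lines of code and fault counts are measured directly, while a quality indicator such as defect density is derived from them indirectly:

```python
# Hypothetical illustration of direct vs. indirect measurement.
faults_found = 42   # direct measurement: counted from fault reports
size_kloc = 12.5    # direct measurement: thousands of lines of code

# Defect density is an indirect (derived) measure of quality.
defect_density = faults_found / size_kloc
print(f"Defect density: {defect_density:.2f} faults/KLOC")  # 3.36 faults/KLOC
```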

The problem with problems

• Fig 5.2 (Software Quality Terminology)
• Fault - occurs when a human error results in a mistake in some software product
  • E.g.: a developer misunderstands the user-interface requirement and creates a design based on that wrong understanding; the design fault then results in incorrect code.

The problem with problems

• Failure
  • Departure of a system from its required behavior
  • Can be discovered during both testing and operation

• Can think of fault and failure as the inside and outside views of the system.

• Faults - problems that developers see
• Failures - problems that users see
• Not every fault corresponds to a failure.

The problem with problems

• The reliability of software is defined in terms of failures observed during operation, not faults.

• Terminology is very important here. Refer to page 157 (errors, anomalies, defects, bugs, crashes).

• If investigation of a failure reveals the fault, a change is made to the product to remove it.

The problem with problems

• Key elements of a problem:
  • Location - where did it occur?
  • Timing - when did it occur?
  • Symptom - what was observed?
  • End result - which consequences resulted?
  • Mechanism - how did it occur?
  • Cause - why did it occur?
  • Severity - how much was the user affected?
  • Cost - how much did it cost?
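As a rough sketch of how these eight attributes could be captured as a single record, assuming Python and hypothetical field names (not from the text):

```python
from dataclasses import dataclass

@dataclass
class ProblemReport:
    """One record per problem, carrying the eight attributes."""
    location: str    # where did it occur
    timing: str      # when did it occur
    symptom: str     # what was observed
    end_result: str  # which consequences resulted
    mechanism: str   # how did it occur
    cause: str       # why did it occur
    severity: str    # how much was the user affected
    cost: float      # how much did it cost
```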

The problem with problems

• These eight attributes are mutually independent (this applies only to the initial measurement); this property is known as orthogonality.
• Orthogonality also refers to a classification scheme within a particular category.
• Non-orthogonality can lead to data loss or corruption (Eg 5.1).

The problem with problems

• The eight attributes should suffice for all types of problems, but they are answered differently depending on whether the problem is a fault, a failure or a change.

Failures

• Focus on the external problems of the system: installation, chain of events, effect on the user or other systems, cost.

• Fig 5.3 (failure report), page 160
• Location - a code that uniquely identifies the installation and platform on which the failure was observed
• Timing - real time of occurrence, and execution time up to the occurrence of the failure

Failures

• Cause
  • Type of trigger, type of source
  • Often cross-referenced to fault and change reports

• Severity
  • How serious the failure's end result is
  • Classification for safety-critical systems: catastrophic, critical, significant, minor
  • Can also be measured in terms of cost

Failures

• Cost - how much effort and other resources were needed to diagnose and respond to the failure

• A ninth category, Count - the number of failures in a stated time interval.

• Mechanism, cause and cost can only be completed after diagnosis.

• A data collection form for failures should include at least five categories.

Failure Report

• Location: such as the installation where the failure was observed
• Timing: CPU time, clock time or some other temporal measure
• Symptom: type of error message or indication of failure
• End result: description of the failure, such as “operating system crash”, “services degraded”, “loss of data”, “wrong output”, “no output”
• Mechanism: chain of events, including keyboard commands and state data, leading to the failure
• Cause: reference to possible fault(s) leading to the failure
• Severity: reference to a well-defined scale, such as “critical”, “major”, “minor”
• Cost: cost to fix plus cost of lost potential business
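A hypothetical filled-in failure report along these lines (all values invented for illustration):

```python
# Hypothetical failure report; keys follow the eight categories above.
failure_report = {
    "location":   "installation-042/linux-x86_64",
    "timing":     "2015-12-21 09:14:03 (clock), 312 s CPU",
    "symptom":    "error message: 'segmentation fault'",
    "end_result": "operating system crash",
    "mechanism":  "print command issued while report window open",
    "cause":      "cross-reference: fault report F-1017",
    "severity":   "critical",     # well-defined scale: critical/major/minor
    "cost":       "4 person-hours to diagnose and restore service",
}
```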

Faults

• Focus on the internals of the system
• Only the developer can see it
• Location
  • Which product (identifier and version) or part of the product contains the fault
  • Spec, code, database, manuals, plans and procedures, reports, standards/policies
  • Requirements, functional, preliminary design, detailed design, product design, interface, database, implementation

Faults

• Timing - when the fault was created, detected, and corrected.

• Symptom - what is observed during diagnosis (Table 5.2, page 166)

• End result - the actual failure caused by the fault; should be cross-referenced to the failure report.

• Cause
  • The human error that led to the fault
  • Communication, conceptual, or clerical

Faults

• Severity - impact on the user
• Cost - total cost to the system provider
• Count - the number of faults found in a product or subsystem, or during a given period of operation

Fault Report

• Location: module or document name
• Timing: phases of development during which the fault was created, detected, and corrected
• Symptom: type of error message reported, or the activity which revealed the fault
• End result: the failure caused by the fault
• Mechanism: how the fault was created, detected, and corrected
• Cause: type of human error that led to the fault
• Severity: refers to the severity of the resulting or potential failure
• Cost: time or effort to locate and correct
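One way to keep the cross-reference between a fault and the failure it caused explicit is a link field; a sketch with invented names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaultReport:
    """Sketch of a fault record: the developer-side view of a problem."""
    location: str       # module or document name
    created_in: str     # phase in which the fault was created
    detected_in: str    # phase in which it was detected
    symptom: str        # error message or revealing activity
    cause: str          # communication, conceptual or clerical
    severity: str       # severity of the resulting or potential failure
    cost_hours: float   # time to locate and correct
    failure_report_id: Optional[str] = None  # end result: link to failure report

fault = FaultReport(
    location="billing/invoice.py",
    created_in="detailed design", detected_in="system test",
    symptom="wrong output", cause="conceptual",
    severity="major", cost_hours=6.0,
    failure_report_id="F-1017",  # cross-referenced failure report
)
```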

Changes

• Once a failure is experienced and its cause determined, the problem is fixed through one or more changes to any or all of the development products.

• Change report (Fig 5.5) - reports the changes and tracks the most affected products

Changes

• The cause of a change may be:
  • Corrective - correcting a fault
  • Adaptive - the system changes in some way, so the product has to be upgraded
  • Preventive - finding faults before they become failures
  • Perfective - redoing something to clarify the system structure

• Count - the number of changes made in a given time or to a given system component

Change Report

• Location: identifier of the document or module changed
• Timing: when the change was made
• Symptom: type of change
• End result: success of the change, as evidenced by regression or other testing
• Mechanism: how and by whom the change was performed
• Cause: corrective, adaptive, preventive or perfective
• Severity: impact on the rest of the system
• Cost: time and effort for change implementation and test
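Since the four causes form a small closed set, a sketch (hypothetical names, not from the text) could enforce them with an enumeration:

```python
from dataclasses import dataclass
from enum import Enum

class ChangeCause(Enum):
    CORRECTIVE = "corrective"   # correcting a fault
    ADAPTIVE = "adaptive"       # system changed, product upgraded to match
    PREVENTIVE = "preventive"   # finding faults before they become failures
    PERFECTIVE = "perfective"   # clarifying the system structure

@dataclass
class ChangeReport:
    location: str        # identifier of document or module changed
    timing: str          # when the change was made
    cause: ChangeCause   # one of the four causes above
    end_result: str      # success, as evidenced by regression testing
    severity: str        # impact on the rest of the system
    cost_hours: float    # effort to implement and test the change
```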

How to collect data

• Requires human observation and reporting

• Manual data collection is prone to bias, error, omission and delay; hence the need for a uniform data collection form.

• Automatic data capture is desirable, and essential for data such as execution time (see the timing sketch below).

• To ensure data are accurate and complete, planning is essential.
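Recording execution time is the kind of data best captured automatically; a minimal sketch, assuming Python and an invented log format, wraps the monitored activity in a timing decorator:

```python
import functools
import time

def record_execution_time(log):
    """Decorator that automatically appends (function, seconds) to a log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                log.append((func.__name__, time.perf_counter() - start))
        return wrapper
    return decorator

timings = []

@record_execution_time(timings)
def run_test_suite():
    time.sleep(0.1)  # stand-in for the real activity being measured

run_test_suite()
print(timings)  # e.g. [('run_test_suite', 0.100...)]
```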

How to collect data

• Planning involves:
  • Deciding what to measure, based on GQM analysis
  • Determining the level of granularity (individual modules, subsystems, functions, etc.)
  • Ensuring that the product is under configuration control (which version)
  • Form design

How to collect data

• Planning involves (continued):
  • Establishing procedures for handling the forms, analyzing the data and reporting the results, and setting up a central collection point

How to collect data

• Keep procedures simple

• Avoid unnecessary recording

• Train staff in recording data and in the procedures

• Provide results to original providers promptly

• Validate all data collected at a central collection point
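Validation at the central collection point can be partly mechanical; a sketch with invented field rules that rejects obviously bad failure reports before they enter the database:

```python
import re

ALLOWED_SEVERITIES = {"catastrophic", "critical", "significant", "minor"}

def validate_failure_report(report: dict) -> list[str]:
    """Return a list of validation problems; empty means the report passes."""
    problems = []
    for field in ("location", "timing", "symptom", "severity"):
        if not report.get(field):
            problems.append(f"missing field: {field}")
    if report.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity not on the agreed scale")
    # Timestamps must follow the agreed fixed format, e.g. YYYY-MM-DD.
    if not re.match(r"\d{4}-\d{2}-\d{2}", report.get("timing", "")):
        problems.append("timing is not a YYYY-MM-DD date")
    return problems

print(validate_failure_report(
    {"location": "site-7", "timing": "2015-12-21",
     "symptom": "no output", "severity": "minor"}))  # -> []
```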

Data collection forms

• Encourage the collection of good, useful data
• Self-explanatory
• Should allow fixed-format data and free-format comments
• Table 5.3 (data collection forms for software reliability evaluation)

When to collect data

• Data collection planning should begin when project planning begins.

• Actual data collection takes place during many phases of development.

• Data can be collected at the beginning of the project to establish initial values, and collected again later to reflect the activities and resources being studied.

When to collect data

• Data-collection activities should become part of the regular development process.

• Data collection should be mapped to the process model (e.g. Fig 5.9).

When to collect data

• E.g.:
  • Data relating to project personnel (qualifications or experience) can be collected at the start of the project.
  • Other data collection, such as effort, begins at project start and continues through operation and maintenance.
  • A count of the number of specification and design faults can be collected as inspections are performed.
  • Data about changes made to enhance the product can be collected as enhancements are performed.

How to store and extract data

• Raw data should be stored in a database set up using a DBMS.

• The DBMS can define the data structure and support inserting, modifying, deleting and extracting refined data.

• Formats, ranges, valid values, etc. can be checked automatically as data are input (see the sketch below).
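A sketch of such automatic checking using SQLite from Python (table and column names invented); the CHECK constraints enforce valid values and ranges as data are inserted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE failure (
        id        INTEGER PRIMARY KEY,
        location  TEXT NOT NULL,
        occurred  TEXT NOT NULL,          -- fixed-format timestamp
        severity  TEXT NOT NULL
                  CHECK (severity IN ('catastrophic', 'critical',
                                      'significant', 'minor')),
        cost      REAL CHECK (cost >= 0)  -- valid range enforced on input
    )
""")
conn.execute(
    "INSERT INTO failure (location, occurred, severity, cost) VALUES (?,?,?,?)",
    ("site-7", "2015-12-21 09:14", "minor", 2.5),
)
# Extract refined data: failure counts per severity.
for row in conn.execute(
        "SELECT severity, COUNT(*) FROM failure GROUP BY severity"):
    print(row)  # ('minor', 1)
```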

How to store and extract data

• Raw database structure
  • E.g.: Figure 5.10 (a data structure that supports reliability measurement)
  • A box denotes a table; an arrow denotes a many-to-one mapping; a double arrow ...