lecture 11 test and fault tolerance ingo sander [email protected]

52
Lecture 11 Test and Fault Tolerance Ingo Sander [email protected]

Upload: philomena-lambert

Post on 01-Jan-2016

226 views

Category:

Documents


0 download

TRANSCRIPT

Lecture 11Test and Fault Tolerance

Ingo Sander

[email protected]

Test of Embedded Systems

IL2206 Embedded Systems 3

Program design and analysis

Verification costs are a significant part of the overall design costs

For large design the share of the verification costs can be up to 50% of the total design costs

Simulation and Test are the predominating verification method in industry

… but there is a large interest from industry to incorporate formal methods into the verification flow

April 20, 2023

IL2206 Embedded Systems 4

Goals

Make sure software works as intended. We will concentrate on functional testing---

performance testing is harder. What tests are required to adequately test the

program? What is “adequate”?

It is almost never practically possible to test the full software, since a program is so complex © 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 5

Test Environment

Provide the program with inputs Execute the program Compare the outputs to expected results

Test Environment

System under Test

Inpu

ts

Out

puts

April 20, 2023

IL2206 Embedded Systems 6

Types of software testing

Black-box: tests are generated without knowledge of program internals.

Clear-box (white-box): tests are generated from the program structure.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 7

Clear-box testing

Generate tests based on the structure of the program. Is a given block of code executed when we think

it should be executed? Does a variable receive the value we think it

should get?

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 8

Controllability and observability

Controllability: must be able to cause a particular internal condition to occur.

Observability: must be able to see the effects of a state from the outside.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 9

Example: FIR filter

Code:for (firout = 0.0, j =0; j < N; j++)

firout += buff[j] * c[j];

if (firout > 100.0) firout = 100.0;

if (firout < -100.0) firout = -100.0;

Controllability: to test range checks for firout, must first load circular buffer.

Observability: how do we observe values of buff, firout? © 2000 Wolf (Morgan Kaufman)

April 20, 2023

Example: FIR-Filter

How do we observe correct operation?1. Set the system into a

defined state Input k-1 0’s Input 1

2. Observe output Expected Output: ck

3. Input k-1 0’s Expected Outputs: ck-1,

ck-2, …, c1

IL2206 Embedded Systems 10

D

*

+

D

*

D

+

*

D

+

*

”Tap”

xk xk-1 x2 x1

ck ck-1 c2 c1

yk

yk= ckxk + ck-1xk-1 + ... + c1x1

April 20, 2023

IL2206 Embedded Systems 11

Path-based testing

Clear-box testing generally tests selected program paths: control program to exercise a path; observe program to determine if path was

properly executed. May look at whether location on path was

reached (control), whether variable on path was set (data).

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 12

Example: choosing paths

Two possible criteria for selecting a set of paths: Execute every statement at least once. Execute every direction of a branch at least once.

Equivalent for structured programs, but not for programs with gotos.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 13

Path example

Covers allstatements

+/+ Covers allbranches

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 14

Branch testing strategy

Exercise the elements of a conditional, not just one true and one false case.

Devise a test for every simple condition in a Boolean expression.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 15

Example: branch testing

Meant to write:if (a || (b >= c)) { printf(“OK\n”); }

Actually wrote:if (a && (b >= c)) { printf(“OK\n”); }

Branch testing strategy: One test is a=F, (b >= c) = T: a=0, b=3, c=2. Produces different answers.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 16

Another branch testing example

Meant to write:if ((x == good_pointer) && (x->field1 == 3))...

Actually wrote:if ((x = good_pointer) && (x->field1 == 3))...

Branch testing strategy: If we use only field1 value to exercise branch, we

may miss pointer problem.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

Tools for Code Coverage

Tools exist to analyze to what extent the code is executed

‘gcov’, which is part of ‘gcc’ is a tool to measure code coverage

Tool coverage tools can significantly improve the tests of Embedded Software, since it becomes obvious, which parts of the code are never executed during a test!

IL2206 Embedded Systems 17April 20, 2023

Code Coverage Tool gcov

Example Code

int main (void) {

int i;

for (i = 1; i < 10; i++)

{

if (i % 3 == 0)

printf ("%d can be divided by 3\n", i);

if (i % 11 == 0)

printf ("%d can be divided by 11\n", i);

}

return 0;

}

IL2206 Embedded Systems 18April 20, 2023

Running gcov

> gcc -fprofile-arcs -ftest-coverage gcov.c

> a.out

3 can be divided by 3

6 can be divided by 3

9 can be divided by 3

> gcov gcov.c

File 'gcov.c'

Lines executed:85.71% of 7

gcov.c:creating 'gcov.c.gcov'

IL2206 Embedded Systems 19April 20, 2023

gcov Output

-: 1:#include <stdio.h>

-: 2:

1: 3:int main (void) {

-: 4: int i;

10: 5: for (i = 1; i < 10; i++)

-: 6: {

9: 7: if (i % 3 == 0)

3: 8: printf ("%d can be divided by 3\n", i);

9: 9: if (i % 11 == 0)

#####: 10: printf ("%d can be divided by 11\n", i);

-: 11: }

1: 12: return 0;

-: 13:}

IL2206 Embedded Systems 20April 20, 2023

IL2206 Embedded Systems 21

Data flow testing

Def-use analysis: match variable definitions (assignments) and uses.

Example:

x = 5;

if (x > 0) ... Does assignment get to the use?

def

use

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 22

Black-box testing

Black-box tests are made from the specifications, not the code.

Black-box testing complements clear-box. May test unusual cases better.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 23

Types of black-box tests

Specified inputs/outputs: select inputs from spec, determine required

outputs. Random:

generate random tests, determine appropriate output.

Regression: tests used in previous versions of system.

© 2000 Wolf (Morgan Kaufman)

April 20, 2023

IL2206 Embedded Systems 24

Evaluating tests

It is very important to evaluate your tests Keep track of bugs found Introduce a new test procedure for every found

bug Error injection: add bugs to copy of code, run

tests on modified code. Error injection can be used to measure fault

coverage

April 20, 2023

IL2206 Embedded Systems 25

Formal Verification

An alternative to test is formal verification Example Model Checking

A formal model of a system is created State machine

Properties that systems shall specify are specified in a formal way It should never happen that both traffic lights signal “Green”

Tool checks that all properties are fulfilled for all input and state combinations, otherwise counter example is generated

Only small systems can be verified (state explosion problem)

April 20, 2023

IL2206 Embedded Systems 26

Summary

Test and verification are very important for in embedded system design

Good tests have to be planned Difficult to cover all test cases Tests should be evaluated in order to allow

possible improvements Formal verification is a very promising

alternative for critical parts!

April 20, 2023

Fault Tolerance

Ref: E. McCluskey and S. Mitra, “Fault Tolerance”

IL2206 Embedded Systems 28

Fault Tolerance

Fault Tolerance is the ability of a system to continue correct operation after the occurrence of hardware or software failures or operator errors

Fault tolerance includes detection of system malfunction identification of faulty units recovery of system from failure

April 20, 2023

IL2206 Embedded Systems 29

Reliability Requirements

Reliability requirements vary for different kinds of embedded systems low-cost systems shall operate for a reasonable

time and may then fail (calculator, cell phone) repair is often uneconomical

safety-critical systems must have a very high reliability (nuclear power plants, automotive control) probability of error in aircraft computer system is less

than 10-9 per hour

April 20, 2023

IL2206 Embedded Systems 30

Failures

Any deviation from expected behaviour is a failure

Failures that cause system to stop or crash are much easier to detect than failures that degrade system performance occasionally

April 20, 2023

IL2206 Embedded Systems 31

Failures

A permanent failure is a failure that is always present incorrect hardware or software functions

A temporary failure is a failure that is not always present during operation transient failures (externally induced signal

perturbation, power-supply disturbances) intermittent failure (weak system component

produces incorrect outputs under certain operating conditions)

April 20, 2023

IL2206 Embedded Systems 32

Source for Failures

Incorrect or incomplete specification interfaces not clearly defined

Incorrect design (bugs) memory allocation management of data structures communication between processes

Non-careful verification process not all possible scenarios are tested or verified

April 20, 2023

IL2206 Embedded Systems 33

Error

An error is the occurrence when incorrect data or control signals are produced

If a failure occurs in a system it may cause an error not cause an error, if the failure does not affect

system operation

April 20, 2023

IL2206 Embedded Systems 34

Fault Model

A fault model represents the effect of a failure by means of the change produced in the system signals

The usefulness of a fault model can be judged by Effectiveness in failure detection Accuracy of the representation of effects of

failures Tractability of design tools that use fault model

April 20, 2023

IL2206 Embedded Systems 35

ExampleSingle Stuck-at Fault Model

Single Stuck-at fault model is used to test hardware circuits

Very efficient in detection of defect chips Used to determine a minimal set of test

vectors Properties

Assumes single fault One signal in the system is stuck at value 0 or 1

Failure is observed at output

April 20, 2023

IL2206 Embedded Systems 36

ExampleSingle Stuck-at Fault Model

Which test vectors are needed to test an AND-gate according to the single stuck-at-model?

AND

A

B

Y

April 20, 2023

IL2206 Embedded Systems 37

ExampleSingle Stuck-at Fault Model

Six faults are possible s-a-0(A), s-a-1(A), s-a-0(B), s-a-1(B), s-a-0(Y), s-a-1(Y)

AND

A

B

Y

1

s-a-1(A): stuck-at-1 fault in A

April 20, 2023

IL2206 Embedded Systems 38

Three test vectors (ABY = {010, 100, 111}) needed.

Reduction with 25%!

ExampleSingle Stuck-at Fault Model

Faults can only be observed at output!

AND

A

B

Y

A A B B Y Y

A B Y s-a-0 s-a-1 s-a-0 s-a-1 s-a-0 s-a-1

0 0 0 x

0 1 0 x x

1 0 0 x x

1 1 1 x x x

x = Fault can be observed at output!

April 20, 2023

IL2206 Embedded Systems 39

Fault Models

Single stuck-at-model has been very successful in hardware design

More complicated fault models exist Difficult to develop fault models for software

no consensus about the effectiveness of software fault models

April 20, 2023

IL2206 Embedded Systems 40

Reliability Metrics

There are many different metrics for reliability depending on the character of the system Reliability of a system at time t is the probability

that system will produce correct output up to time t

Availability of a system at time t is the probability that the system is operational at time t

April 20, 2023

IL2206 Embedded Systems 41

Reliability Metrics

Safety of a system at time t is the probability that the system either will be operating correctly or will fail in a “safe” manner

Performability of a system at time t is the probability that the system is operating correctly or at a reduced throughput greater or equal a given value

Maintainability M(t) is the probability that it takes t units of time to restore a failed system to normal operation

April 20, 2023

IL2206 Embedded Systems 42

Metrics for Testability

There exist even measures for testability, which is the ease with which the system can be tested difficult to quantify important factors

test pattern generation cost test application cost observability of state information controllability – production of an internal signal

April 20, 2023

IL2206 Embedded Systems 43

Measurement of reliability

Test of a large number of components N At time t

G(t) is number of correctly operating components F(t) is number of components that have failed

Reliability R(t) = G(t)/N

April 20, 2023

IL2206 Embedded Systems 44

Measurement of reliability

There are important other metrics that are related to the presented reliability metrics Mean Time To Failure (MTTF) Mean Time To Repair (MTTR)

Together these metrics can be used to calculate other metrics Average Availability: MTTF / (MTTF + MTTR) Mean Time Between Failures (MTBF)

MTBF = MTTF + MTTR

April 20, 2023

IL2206 Embedded Systems 45

Bathtub Curve

For hardware systems the bathtub curve illustrates the reliability of typical systems

April 20, 2023

IL2206 Embedded Systems 46

Fault Avoidance

Reliability can be improved during the design process robust design techniques design validation techniques reliability verification techniques thorough production techniques

Fault avoidance techniques are very costly

April 20, 2023

IL2206 Embedded Systems 47

System Failure Response

System can respond in different ways to a failure Error on output – Acceptable in non-critical applications

digital watch, games Errors masked – Outputs correct even when fault occurs

flight control Fault secure – Output correct or error indication if output

incorrect banking, telephony, networking

Fail safe – Output correct or at “safe value” “red” light for traffic control

April 20, 2023

IL2206 Embedded Systems 48

Error Masking

Triple modular redundancy Critical component is tripled Additional majority voting logic

R

R

R

2/3

April 20, 2023

IL2206 Embedded Systems 49

Error Masking

Triple modular redundancy Critical component is tripled Additional majority voting logic

R

R

R

2/3

April 20, 2023

IL2206 Embedded Systems 50

Error MaskingSoftware Techniques

N-Version Programming Several versions of program are written

independently Voting is used

Recovery Blocks Several versions of program are written

independently Only one program is run and monitored If error is detected an alternate program is run

April 20, 2023

IL2206 Embedded Systems 51

Repair Techniques

When there is a failure in a system the failure must be detected and isolated

Built-In Self-Test: Additional functionality tests if system operates correctly and identifies faulty parts

system must respond to the error system must be repaired

self-repair techniques (space missions) exact diagnosis for fault and report to maintenance

personal

April 20, 2023

IL2206 Embedded Systems 52

Summary

Reliability comes with a cost Redundancy is required (extra components

that are not needed for the pure functionality) There is a trade-off between reliability and

design costs Often a very low probability for a fault-free

system is good enough!

April 20, 2023