Institut für Computertechnik ICT Institute of Computer Technology Safety Critical Computer Systems - Open Questions and Approaches Andreas Gerstinger Institute for Computer Technology February 16, 2007


Institut für Computertechnik (ICT)
Institute of Computer Technology

Safety Critical Computer Systems - Open Questions and Approaches

Andreas Gerstinger
Institute for Computer Technology

February 16, 2007

Agenda

Safety-Critical Systems
Project Partners
Three research topics
  Safety Engineering
  Diversity
  Software Metrics
Conclusion and Outlook

Safety-Critical Systems

Safety Critical Systems

A safety-critical computer system is a computer system whose failure may cause injury or death to human beings or damage to the environment

Examples:
  Aircraft control system (fly-by-wire,...)
  Nuclear power station control system
  Control systems in cars (anti-lock brakes,...)
  Health systems (heart pacemakers,...)
  Railway control systems
  Communication systems
  Wireless Sensor Networks Applications?

SYSARI Project

SYSARI = SYstem SAfety Research in Industry

Goal of the project: to conduct and promote research in system safety engineering and safety-critical system design and development

Close cooperation between ICT and industry
  One "shared" employee (me)
  Students conducting practical Diploma Theses
  PhD Theses

What is Safety?

"The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment"

Safety is NOT an absolute quantity!

Safety is also defined as "freedom from unacceptable risk of harm"

A basic concept in System Safety Engineering is the avoidance of "hazards"

Safety vs. Security

These two concepts are often mixed up
In German, there is just one term for both ("Sicherheit")!

System
  Safety = doesn't cause harm
  Security = protection against attacks

SILs and Dangerous Failure Probability

Safety Integrity Level   High demand mode of operation
                         (probability of dangerous failure per hour)
SIL 4                    10^-9 ≤ P < 10^-8
SIL 3                    10^-8 ≤ P < 10^-7
SIL 2                    10^-7 ≤ P < 10^-6
SIL 1                    10^-6 ≤ P < 10^-5
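The SIL bands for high demand mode listed above can be written as a simple lookup. The following is only a minimal sketch: the function name `sil_for_high_demand` is made up for illustration, and it assumes that each band includes its lower bound and excludes its upper bound.

```python
# Bands for high demand mode: (SIL, lower bound, upper bound) of the
# probability of dangerous failure per hour.
# Assumption: lower bound inclusive, upper bound exclusive.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_high_demand(failures_per_hour):
    """Return the SIL (1 to 4) whose band contains the given probability.

    Returns None if the value lies outside all four bands.
    """
    for sil, lower, upper in SIL_BANDS:
        if lower <= failures_per_hour < upper:
            return sil
    return None


if __name__ == "__main__":
    for p in (5e-9, 5e-8, 5e-7, 5e-6, 1e-4):
        print(p, sil_for_high_demand(p))
```

A value worse than 10^-5 per hour falls outside the table, so the sketch returns None rather than guessing a level.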

Project Partners

Project Partner: Frequentis

Austrian high-tech company
World leader in air traffic control communication systems
700 employees, company based in Vienna, customers all over the world

http://www.frequentis.com

Frequentis Voice Communication System

Enables communication between aircraft and controller
Communication link must never fail!
Requirements:
  Safety
  High Availability and Reliability
  Fault Tolerance
Other domains:
  railway
  ambulance, police, fire brigade,...
  maritime

Safety Integrity Level 2

Project Partner: Thales

French company
68000 employees worldwide
Mission critical information systems
25000 researchers
Nobel Prize in Physics 2007 awarded to Albert Fert, scientific director of Thales research lab

http://www.thalesgroup.com

Railway Signalling Systems

Signalling and Switching
Axle Counters
Applications for ETCS

An incorrect output may lead to an incorrect signal causing a major accident!

Safety Integrity Level 4 (highest)

(Old) Interlocking Systems

Mechanical / Electromechanical Systems

Signal Box / Interlocking Tower

Electric system with some electronics

Modern Signal Box / Interlocking Tower

Lots of electronics and computer systems

Safety Engineering

What is a Hazard?

Hazard
  physical condition of platform that threatens the safety of personnel or the platform, i.e. can lead to an accident
  a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions
  "an accident waiting to happen"

Examples
  oil spilled on staircase
  failed train detection system at an automatic railway level crossing
  loss of thrust control on a jet engine
  loss of communication
  distorted communication
  undetectably incorrect output

Hazard Severity Level (Example)

Category       Id.   Definition
CATASTROPHIC   I     General: A hazard which may cause death, system loss, or severe property or environmental damage.
CRITICAL       II    General: A hazard which may cause severe injury, major system, property or environmental damage.
MARGINAL       III   General: A hazard which may cause marginal injury, marginal system, property or environmental damage.
NEGLIGIBLE     IV    General: A hazard which does not cause injury, system, property or environmental damage.

Hazard Probability Level (Example)

Level        Probability [h^-1]    Definition                                                 Occurrences per year
Frequent     P ≥ 10^-3             may occur several times a month                            More than 10
Probable     10^-3 > P ≥ 10^-4     likely to occur once a year                                1 to 10
Occasional   10^-4 > P ≥ 10^-5     likely to occur in the life of the system                  10^-1 to 1
Remote       10^-5 > P ≥ 10^-6     unlikely but possible to occur in the life of the system   10^-2 to 10^-1
Improbable   10^-6 > P ≥ 10^-7     very unlikely to occur                                     10^-3 to 10^-2
Incredible   P < 10^-7             extremely unlikely, if not inconceivable to occur          Less than 10^-3

Risk Classification Scheme (Example)

                     Hazard Severity
Hazard Probability   CATASTROPHIC   CRITICAL   MARGINAL   NEGLIGIBLE
Frequent             A              A          A          B
Probable             A              A          B          C
Occasional           A              B          C          C
Remote               B              C          C          D
Improbable           C              C          D          D
Incredible           C              D          D          D

Risk Class Definition (Example)

Risk Class   Interpretation
A            Intolerable
B            Undesirable and shall only be accepted when risk reduction is impracticable.
C            Tolerable with the endorsement of the authority.
D            Tolerable with the endorsement of the normal project reviews.
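The example tables for hazard probability, hazard severity and risk class can be combined into one lookup. The sketch below copies the values from those example tables; the function names `probability_level` and `risk_class` are made up for illustration and are not part of any standard.

```python
# Column order of the example risk classification scheme.
SEVERITIES = ("CATASTROPHIC", "CRITICAL", "MARGINAL", "NEGLIGIBLE")

# Lower bounds (per hour) of the example probability levels, highest first.
PROBABILITY_LEVELS = [
    ("Frequent", 1e-3),
    ("Probable", 1e-4),
    ("Occasional", 1e-5),
    ("Remote", 1e-6),
    ("Improbable", 1e-7),
]

# One string per probability level, one letter per severity column.
RISK_MATRIX = {
    "Frequent": "AAAB",
    "Probable": "AABC",
    "Occasional": "ABCC",
    "Remote": "BCCD",
    "Improbable": "CCDD",
    "Incredible": "CDDD",
}

INTERPRETATION = {
    "A": "Intolerable",
    "B": "Undesirable and shall only be accepted when risk reduction is impracticable.",
    "C": "Tolerable with the endorsement of the authority.",
    "D": "Tolerable with the endorsement of the normal project reviews.",
}


def probability_level(p_per_hour):
    """Map a hazard probability per hour to the example level name."""
    for name, lower_bound in PROBABILITY_LEVELS:
        if p_per_hour >= lower_bound:
            return name
    return "Incredible"


def risk_class(level, severity):
    """Look up the risk class (A to D) for a probability level and severity."""
    return RISK_MATRIX[level][SEVERITIES.index(severity)]


if __name__ == "__main__":
    level = probability_level(5e-6)
    cls = risk_class(level, "CATASTROPHIC")
    print(level, cls, INTERPRETATION[cls])
```

Keeping the matrix as data rather than as nested conditions makes it easy to review against the table it was copied from.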

Risk Acceptability

Having identified the level of risk for the product, we must determine how acceptable and tolerable that risk is
  Regulator / Customer
  Society
  Operators

Decision criteria for risk acceptance / rejection
  Absolute vs. relative risk (compare with previous, background)
  Risk-cost trade-offs
  Risk-benefit of technological options

Risk Tolerability

[Flow diagram: a Hazard has a Severity and a Probability, which together give the Risk. The Risk is checked against the Risk Criteria (Tolerable?). No: Risk Reduction Measures. Yes: no further measures.]

Diversity

Diversity

Goal: Fault Tolerance/Detection
Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner."
Can tolerate/detect a wide range of faults

"The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."
Dionysius Lardner, 1834

Layers of Diversity

Layer (ordered by abstraction)               Diversity example
Concept of Operation (e.g. specifications)   e.g. two different paradigms, such as rule based and functional
Design (e.g. design descriptions)            e.g. n version design
Implementation (e.g. source code)            e.g. n version coding
Realisation (e.g. object code)               e.g. diverse compilers
HW (CPU, memory,...)                         e.g. diverse CPU

Examples for Diversity

Specification Diversity
Design Diversity
Data Diversity
Time Diversity
Hardware Diversity
Compiler Diversity
Automated Systematic Diversity
Testing Diversity
Diverse Safety Arguments
…

Some faults to be targeted:
  programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks,...

Compiler Diversity

Use of two diverse compilers to compile one common source code

Common Source Code:
    ...
    Module A {
        int i;
        int end;
        get(end);
        for i = 1 to end
            result = func(i, result);
            POS[i] = result;
        next
    }
    ...

Diverse Compiler (Compiler A, Compiler B)
  different manufacturer
  different version
  different compiler options

Diverse Object Code (?)
  Compiler A:
    ...
    move $4, A
    jmp $54256
    add ($5436), B
    ...
  Compiler B:
    ...
    add ($66533), A
    ret
    move $4, C
    ...

Compiler Diversity: Issues

Targeted Faults:
  Systematic compiler faults
  Some Heisenbugs
  Some systematic and permanent hardware faults (if executed on one board)
Issues:
  To some degree possible with one compiler and different compile options (optimization on/off,…)
  If compilers from different manufacturers are taken, independence must be ensured

Systematic Automatic Diversity

Artificial introduction of diversity to tolerate HW faults

(Automatic) transformation of program P to a semantically equivalent program P' which uses the HW differently
  e.g. different memory areas, different registers, different comparisons,...

Original        Transformed
if A=B then     if A-B = 0 then
A or B          not (not A and not B)
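The two example transformations can be checked for semantic equivalence by brute force over a small input range. This is only an illustrative sketch: all function names are made up, and running the variants in Python says nothing about whether they exercise the hardware differently, which is the actual purpose of the technique and depends on the compiler and target CPU.

```python
from itertools import product


def equal_original(a, b):
    # Original comparison: if A = B
    return a == b


def equal_diverse(a, b):
    # Transformed comparison: if A - B = 0
    return a - b == 0


def or_original(a, b):
    # Original logic: A or B
    return a or b


def or_diverse(a, b):
    # Transformed logic (De Morgan): not (not A and not B)
    return not (not a and not b)


def variants_agree():
    """Exhaustively compare original and transformed forms on small inputs."""
    integers = range(-8, 9)
    for a, b in product(integers, repeat=2):
        if equal_original(a, b) != equal_diverse(a, b):
            return False
    for a, b in product((False, True), repeat=2):
        if or_original(a, b) != or_diverse(a, b):
            return False
    return True


if __name__ == "__main__":
    print("variants agree:", variants_agree())
```

Note that for fixed-width machine integers the subtraction can overflow, so an automatic transformation tool would have to take the target's arithmetic into account; Python integers do not overflow.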

Systematic Automatic Diversity

What can be "diversified":
  memory usage
  execution sequence
  statement structures
  array references
  data coding
  register usage
  addressing modes
  pointers
  mathematical and logic rules

Systematic Automatic Diversity: Issues

Targeted Faults:
  Systematic hardware faults
  Permanent random hardware faults
Issues:
  Can be performed on source code or assembler level
  If performed on source code level, it must be ensured that compiler does not "cancel out" diversity
  (Software) Fault injection experiments showed an improvement of a factor ~100 regarding HW faults

Example: Diverse Calculation of Position

Position P can be calculated based on speedometer and accelerometer readings
Voter can also be implemented diversely
PositionA and PositionB could be transmitted in different formats

[Diagram: the Speedometer feeds "Determine Position from Speedometer", which outputs PA; the Accelerometer feeds "Determine Position from Accelerometer", which outputs PB. PA and PB both go to each voter; Voter A outputs PositionA and Voter B outputs PositionB.]

Voter A:
    if PA=PB then send PA
    else RaiseException

Voter B:
    if PA-PB=0 then send PB
    else RaiseException
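The diverse position calculation with two diverse voters can be sketched as follows. Several things here are assumptions for illustration only: the simple rectangle-rule integration, the sampling interval `dt`, the exception name `VoterMismatch`, and the tolerance `tol`, which replaces the exact equality of the voters because floating-point results of two different computations rarely match bit for bit.

```python
class VoterMismatch(Exception):
    """Raised when the two diversely computed positions disagree."""


def position_from_speed(speeds, dt, p0=0.0):
    # Integrate speedometer samples once to obtain the position.
    position = p0
    for v in speeds:
        position += v * dt
    return position


def position_from_acceleration(accelerations, dt, p0=0.0, v0=0.0):
    # Integrate accelerometer samples twice: first to speed, then to position.
    position, speed = p0, v0
    for a in accelerations:
        speed += a * dt
        position += speed * dt
    return position


def voter_a(pa, pb, tol=1e-6):
    # Voter A: if PA = PB then send PA, else raise an exception.
    if abs(pa - pb) <= tol:
        return pa
    raise VoterMismatch(f"PA={pa} and PB={pb} differ")


def voter_b(pa, pb, tol=1e-6):
    # Voter B: if PA - PB = 0 then send PB, else raise an exception.
    difference = pa - pb
    if -tol <= difference <= tol:
        return pb
    raise VoterMismatch(f"PA - PB = {difference}")


if __name__ == "__main__":
    dt = 0.5
    # Constant acceleration of 2 units; speed samples taken after each step.
    pa = position_from_speed([1.0, 2.0, 3.0, 4.0], dt)
    pb = position_from_acceleration([2.0, 2.0, 2.0, 2.0], dt)
    print(voter_a(pa, pb), voter_b(pa, pb))
```

Each voter forwards a different input when the values agree, so a fault that corrupts one voter or one transmitted value still shows up as a disagreement between PositionA and PositionB downstream.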

Open Issues

How can diversity be used most efficiently?
Can diversity be introduced automatically?
Which faults are detected/tolerated to which extent?
How can the quality of the diversity be measured?
Can diversity also be used to detect security intrusions?

Software Metrics

Software Metrics for Safety-Critical Systems

Problems
  Which metrics should safety-critical software fulfill?
  Which coding rules are good and useful?
  What are the desired ranges for metrics?
  Which metrics influence maintainability?

if P then
    if Q then S1 else S2
    if R then S3 else S4
else S5

Sx: (block) statements
P, Q, R: (boolean) predicates
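For a structure like the one above, cyclomatic complexity (CC) can be approximated as the number of decision points plus one, which gives 4 for the three predicates P, Q and R. The sketch below counts decision points in Python source with the standard `ast` module. It is a simplified approximation rather than the tool used for the measurements in this talk, and the function name `cyclomatic_complexity` and the Python rendering of the example are assumptions for illustration.

```python
import ast

# Node types counted as one decision point each (simplified rule set).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity as decision points + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            complexity += len(node.values) - 1
    return complexity


# Python rendering of the nested if structure with predicates P, Q, R.
EXAMPLE = """
def f(P, Q, R):
    if P:
        if Q:
            s = "S1"
        else:
            s = "S2"
        if R:
            s += "S3"
        else:
            s += "S4"
    else:
        s = "S5"
    return s
"""

if __name__ == "__main__":
    print(cyclomatic_complexity(EXAMPLE))
```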

Some RAW Metrics...

Metric               P1       P2       P3       P4       P5       P6       Firefox
(Main) Language      C#       C#       Java     Java     Java     C++      C/C++
Functions            1321     11383    1344     2997     1383     3863     102630
Classes              101      2170     119      413      225      455      8979
LOCs                 34731    287279   21098    48650    23567    95289    2640688
eLOCs                25077    204737   16775    40182    19624    74774    2187030

LOC/Function         26.29    25.24    15.70    16.23    17.04    24.67    25.73
LOC/Class            343.87   132.39   177.29   117.80   104.74   209.43   294.10
eLOC/Function        18.98    17.99    12.48    13.41    14.19    19.36    21.31
eLOC/Class           248.29   94.35    140.97   97.29    87.22    164.34   243.57

Max CC               135      213      58       281      43       222      751
Avg CC               3.36     2.62     2.83     3.23     2.67     2.87     4.28
CC >10               51       323      60       162      51       154      8802
CC >50               4        13       2        4        0        9        478
CC >10 [%]           3.86     2.84     4.46     5.41     3.69     3.99     8.58
CC >50 [%]           0.30     0.11     0.15     0.13     0.00     0.23     0.47

Notices/KLOC         50.24    57.33    118.12   143.70   100.90   68.10    112.84
SevereNotices/KLOC   4.26     6.02     18.06    22.32    14.68    15.26    34.48
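The derived rows of the table follow from the raw counts. The sketch below recomputes them for project P1; the function name `derived_metrics` is made up, and the reading that the CC percentages are relative to the number of functions is an assumption, although it reproduces the P1 values in the table.

```python
def derived_metrics(functions, classes, loc, eloc, cc_over_10, cc_over_50):
    """Recompute the derived table rows from the raw counts of one project."""
    return {
        "LOC/Function": round(loc / functions, 2),
        "LOC/Class": round(loc / classes, 2),
        "eLOC/Function": round(eloc / functions, 2),
        "eLOC/Class": round(eloc / classes, 2),
        # Assumption: percentages are taken over the number of functions.
        "CC >10 [%]": round(100 * cc_over_10 / functions, 2),
        "CC >50 [%]": round(100 * cc_over_50 / functions, 2),
    }


if __name__ == "__main__":
    # Raw counts of project P1 as listed in the table.
    p1 = derived_metrics(
        functions=1321, classes=101, loc=34731, eloc=25077,
        cc_over_10=51, cc_over_50=4,
    )
    for name, value in p1.items():
        print(f"{name}: {value:.2f}")
```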


Outline of Method

1. Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with

2. Automatically measure potentially interesting metrics of the software packages

3. Correlate questionnaire responses with the measured metrics to find out which metric correlates with which property
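Step 3 can be illustrated with a Pearson correlation between questionnaire ratings and one measured metric. This is a sketch under stated assumptions: the talk does not say which correlation measure was used, and the numbers below are invented purely to show the mechanics; they are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(var_x * var_y)


# Invented illustration data, one entry per software package:
# expert rating of internal quality (1 = best, 5 = worst) and
# measured comment density (comment lines per code line).
ratings = [1.5, 2.0, 3.0, 3.5, 4.5]
comment_density = [0.35, 0.30, 0.22, 0.20, 0.10]

if __name__ == "__main__":
    print(f"r = {pearson(ratings, comment_density):.3f}")
```

With ratings where 1 is best, a strongly negative coefficient would mean that packages with a higher comment density are perceived as better.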

Graph 3: Code Clarity vs. Return Points

[Plot: x-axis "Code Clarity (1-best, 5-worst)", 1 to 5; y-axis "Average number of return points", 1 to 1.8]

Graph 4: Internal Quality vs. CC

[Plot: x-axis "General Internal Quality (1-best, 5-worst)", 1 to 5; y-axis "Average Cyclomatic Complexity", 0 to 6]

Summary of Results

Strongest correlation with perceived internal quality:
  Comment density
  Control Flow Anomalies

No correlation with perceived internal quality:
  Cyclomatic Complexity
  Average Method Size
  Average File Size
  ...

Conclusion and Outlook

Further Related Topics

Agile Methods in Safety Critical Development
Hazard Analysis Methods
Safety Standards
Safety of Operating Systems
COTS Components for Safety-Critical Systems
Safety Aspects of Modern Programming Languages (Java, C#.NET)
Fault Detection, Correction and Tolerance
Safety and Security Harmonisation
Linux in Safety-Critical Environments
Online Tests to detect hardware faults

Conclusion

Many open issues in this field...
All research activities in SYSARI project practically motivated
Number of safety-critical systems increases
International Standards play a vital role (e.g. IEC 61508)

Contact: Andreas Gerstinger: [email protected]