Institut für Computertechnik
ICT Institute of Computer Technology
Safety Critical Computer Systems - Open Questions and
Approaches
Andreas Gerstinger, Institute of Computer Technology
February 16, 2007
Agenda
- Safety-Critical Systems
- Project Partners
- Three research topics: Safety Engineering, Diversity, Software Metrics
- Conclusion and Outlook
Safety Critical Systems
A safety-critical computer system is a computer system whose failure may cause injury or death to human beings, or damage to the environment.

Examples:
- Aircraft control systems (fly-by-wire, ...)
- Nuclear power station control systems
- Control systems in cars (anti-lock brakes, ...)
- Health systems (heart pacemakers, ...)
- Railway control systems
- Communication systems
- Wireless Sensor Network applications?
SYSARI Project
SYSARI = SYstem SAfety Research in Industry

Goal of the project: to conduct and promote research in system safety engineering and in the design and development of safety-critical systems.

Close cooperation between ICT and industry:
- one "shared" employee (me)
- students conducting practical diploma theses
- PhD theses
What is Safety?
"The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also the avoidance of damage to property and the environment."

Safety is NOT an absolute quantity!

Safety is also defined as "freedom from unacceptable risk of harm".

A basic concept in System Safety Engineering is the avoidance of "hazards".
Safety vs. Security
These two concepts are often mixed up; in German, there is just one term ("Sicherheit") for both!

- Safety = the system does not cause harm
- Security = the system is protected against attacks
SILs and Dangerous Failure Probability

Safety Integrity Level | High demand mode of operation (probability of dangerous failure per hour)
SIL 4 | 10^-9 <= P < 10^-8
SIL 3 | 10^-8 <= P < 10^-7
SIL 2 | 10^-7 <= P < 10^-6
SIL 1 | 10^-6 <= P < 10^-5
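The bands above are simple threshold checks; a minimal sketch (the function name is illustrative, not taken from any standard or library):

```python
# Sketch: classify a dangerous-failure rate (per hour, high demand mode)
# against the SIL bands in the table above.

def sil_for_failure_rate(p_per_hour):
    """Return the SIL met by a dangerous-failure rate, or None if too high."""
    if p_per_hour < 1e-8:
        return 4   # 1e-9 <= P < 1e-8 (rates below 1e-9 also satisfy SIL 4)
    if p_per_hour < 1e-7:
        return 3
    if p_per_hour < 1e-6:
        return 2
    if p_per_hour < 1e-5:
        return 1
    return None    # failure rate too high for any SIL

print(sil_for_failure_rate(5e-9))  # -> 4
print(sil_for_failure_rate(2e-7))  # -> 2
```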
Project Partner: Frequentis

- Austrian high-tech company
- World leader in air traffic control communication systems
- 700 employees, based in Vienna, customers all over the world
- http://www.frequentis.com
Frequentis Voice Communication System

- Enables communication between aircraft and controller
- The communication link must never fail!
- Requirements: safety, high availability and reliability, fault tolerance
- Other domains: railway; ambulance, police, fire brigade, ...; maritime
- Safety Integrity Level 2
Project Partner: Thales

- French company, 68,000 employees worldwide
- Mission-critical information systems
- 25,000 researchers
- Nobel Prize in Physics 2007 awarded to Albert Fert, scientific director of a Thales research lab
- http://www.thalesgroup.com
Railway Signalling Systems

- Signalling and switching
- Axle counters
- Applications for ETCS

An incorrect output may lead to an incorrect signal, causing a major accident!

Safety Integrity Level 4 (highest)
Signal Box / Interlocking Tower
Electric system with some electronics
Modern Signal Box / Interlocking Tower
Lots of electronics and computer systems
What is a Hazard?

A hazard is:
- a physical condition of a platform that threatens the safety of personnel or of the platform, i.e. that can lead to an accident
- a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions
- "an accident waiting to happen"

Examples:
- oil spilled on a staircase
- failed train detection system at an automatic railway level crossing
- loss of thrust control on a jet engine
- loss of communication
- distorted communication
- undetectably incorrect output
Hazard Severity Level (Example)

Category     | Id. | Definition
CATASTROPHIC | I   | A hazard that may cause death, system loss, or severe property or environmental damage.
CRITICAL     | II  | A hazard that may cause severe injury, or major system, property or environmental damage.
MARGINAL     | III | A hazard that may cause marginal injury, or marginal system, property or environmental damage.
NEGLIGIBLE   | IV  | A hazard that does not cause injury, or system, property or environmental damage.
Hazard Probability Level (Example)

Level      | Probability [h^-1]  | Definition                                               | Occurrences per year
Frequent   | P >= 10^-3          | may occur several times a month                          | more than 10
Probable   | 10^-3 > P >= 10^-4  | likely to occur once a year                              | 1 to 10
Occasional | 10^-4 > P >= 10^-5  | likely to occur in the life of the system                | 10^-1 to 1
Remote     | 10^-5 > P >= 10^-6  | unlikely but possible to occur in the life of the system | 10^-2 to 10^-1
Improbable | 10^-6 > P >= 10^-7  | very unlikely to occur                                   | 10^-3 to 10^-2
Incredible | P < 10^-7           | extremely unlikely, if not inconceivable, to occur       | less than 10^-3
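The "occurrences per year" column follows from the per-hour rate assuming continuous operation (8760 hours per year), rounded to powers of ten in the table; a quick check:

```python
# Sketch: convert a per-hour hazard rate to occurrences per year,
# assuming continuous operation (24 h x 365 d = 8760 h per year).

HOURS_PER_YEAR = 24 * 365  # 8760

def occurrences_per_year(rate_per_hour):
    return rate_per_hour * HOURS_PER_YEAR

# A rate of 1e-4 per hour (boundary between Probable and Occasional)
# gives roughly one occurrence per year, matching the table's boundary.
print(occurrences_per_year(1e-4))  # -> 0.876
```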
Risk Classification Scheme (Example)

Hazard Probability | CATASTROPHIC | CRITICAL | MARGINAL | NEGLIGIBLE   (columns: Hazard Severity)
Frequent           | A            | A        | A        | B
Probable           | A            | A        | B        | C
Occasional         | A            | B        | C        | C
Remote             | B            | C        | C        | D
Improbable         | C            | C        | D        | D
Incredible         | C            | D        | D        | D
Risk Class Definition (Example)

Risk Class | Interpretation
A | Intolerable.
B | Undesirable; shall only be accepted when risk reduction is impracticable.
C | Tolerable with the endorsement of the authority.
D | Tolerable with the endorsement of the normal project reviews.
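The two example tables combine into a simple lookup; a minimal sketch (the data structures and names are illustrative, and a real project would take the matrices from its applicable safety standard or safety plan):

```python
# Sketch: the example risk classification scheme as a lookup table.

SEVERITIES = ["CATASTROPHIC", "CRITICAL", "MARGINAL", "NEGLIGIBLE"]

# One row per probability level; one character per severity column.
RISK_MATRIX = {
    "Frequent":   "AAAB",
    "Probable":   "AABC",
    "Occasional": "ABCC",
    "Remote":     "BCCD",
    "Improbable": "CCDD",
    "Incredible": "CDDD",
}

INTERPRETATION = {
    "A": "Intolerable",
    "B": "Undesirable; accepted only when risk reduction is impracticable",
    "C": "Tolerable with the endorsement of the authority",
    "D": "Tolerable with the endorsement of normal project reviews",
}

def risk_class(probability, severity):
    return RISK_MATRIX[probability][SEVERITIES.index(severity)]

print(risk_class("Remote", "CATASTROPHIC"))  # -> B
print(INTERPRETATION[risk_class("Remote", "CATASTROPHIC")])
```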
Risk Acceptability

Having identified the level of risk for the product, we must determine how acceptable and tolerable that risk is, for:
- the regulator / customer
- society
- operators

Decision criteria for risk acceptance / rejection:
- absolute vs. relative risk (compare with previous systems, with background risk)
- risk-cost trade-offs
- risk-benefit of technological options
Risk Tolerability

(Flow diagram: a hazard's severity and probability determine its risk; the risk is compared against the risk criteria; if it is not tolerable, risk reduction measures are applied and the assessment is repeated; if it is tolerable, the process ends.)
Diversity

Goal: fault tolerance / fault detection.

Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner." It can tolerate/detect a wide range of faults.

"The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."
(Dionysius Lardner, 1834)
Layers of Diversity

From highest to lowest abstraction, with diversity examples:
- Concept of Operation (e.g. specifications): e.g. two different paradigms, such as rule-based and functional
- Design (e.g. design descriptions): e.g. N-version design
- Implementation (e.g. source code): e.g. N-version coding
- Realisation (e.g. object code): e.g. diverse compilers
- HW (CPU, memory, ...): e.g. diverse CPUs
Examples for Diversity

- Specification Diversity
- Design Diversity
- Data Diversity
- Time Diversity
- Hardware Diversity
- Compiler Diversity
- Automated Systematic Diversity
- Testing Diversity
- Diverse Safety Arguments
- ...

Some faults to be targeted: programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks, ...
Compiler Diversity

Use of two diverse compilers to compile one common source code.

Common source code (example):
    Module A {
        int i;
        int end;
        get(end);
        for i = 1 to end
            result = func(i, result);
            POS[i] = result;
        next
    }

Compiler A and Compiler B each translate this source, yielding diverse object code (?), e.g.:
    Compiler A: ... move $4, A; jmp $54256; add ($5436), B; ...
    Compiler B: ... add ($66533), A; ret; move $4, C; ...

Diverse compilers may come from different manufacturers, be different versions, or use different compiler options.
Compiler Diversity: Issues

Targeted faults:
- systematic compiler faults
- some Heisenbugs
- some systematic and permanent hardware faults (if executed on one board)

Issues:
- To some degree possible with one compiler and different compile options (optimization on/off, ...)
- If compilers from different manufacturers are used, their independence must be ensured
Systematic Automatic Diversity

Artificial introduction of diversity to tolerate HW faults: (automatic) transformation of a program P into a semantically equivalent program P' which uses the HW differently, e.g. different memory areas, different registers, different comparisons, ...

Example transformations:
    if A = B then   ->   if A - B = 0 then
    A or B          ->   not (not A and not B)
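These transformations can be written out directly; a minimal sketch (function names are illustrative) showing that each diverse formulation agrees with its direct counterpart on all inputs:

```python
# Sketch: semantically equivalent formulations that exercise the
# hardware differently, as in the transformations above.

def equal_direct(a, b):
    return a == b

def equal_diverse(a, b):
    return (a - b) == 0           # "if A - B = 0" variant

def or_direct(a, b):
    return a or b

def or_diverse(a, b):
    return not (not a and not b)  # De Morgan variant of "A or B"

# The pairs agree on all inputs; a fault that corrupts one computation
# path is unlikely to corrupt the other in exactly the same way.
assert all(equal_direct(a, b) == equal_diverse(a, b)
           for a in range(-3, 4) for b in range(-3, 4))
assert all(or_direct(a, b) == or_diverse(a, b)
           for a in (False, True) for b in (False, True))
print("diverse formulations agree")
```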
Systematic Automatic Diversity

What can be "diversified":
- memory usage
- execution sequence
- statement structures
- array references
- data coding
- register usage
- addressing modes
- pointers
- mathematical and logic rules
Systematic Automatic Diversity: Issues

Targeted faults:
- systematic hardware faults
- permanent random hardware faults

Issues:
- Can be performed at source code or assembler level
- If performed at source code level, it must be ensured that the compiler does not "cancel out" the diversity

(Software) fault injection experiments showed an improvement of a factor of ~100 regarding HW faults.
Example: Diverse Calculation of Position

Position P can be calculated from speedometer readings (PositionA) and from accelerometer readings (PositionB). The voter can also be implemented diversely, and PositionA and PositionB could be transmitted in different formats.

    Speedometer   -> Determine position from speedometer   -> PA
    Accelerometer -> Determine position from accelerometer -> PB

    Voter A: if PA = PB then send PA, else RaiseException
    Voter B: if PA - PB = 0 then send PB, else RaiseException
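A minimal executable sketch of this scheme; the position formulas and the tolerance are illustrative placeholders, and only the two diversely formulated voters mirror the slide:

```python
# Sketch: two diversely implemented voters over two position channels.
# The dead-reckoning formulas below are placeholders for the real
# speedometer- and accelerometer-based position calculations.

def position_from_speed(v, t):
    return v * t                       # channel A: speedometer-based

def position_from_accel(a, t):
    return 0.5 * a * t * t             # channel B: accelerometer-based

def voter_a(pa, pb, tol=1e-6):
    if abs(pa - pb) <= tol:            # "if PA = PB then send PA"
        return pa
    raise ValueError("channel disagreement")

def voter_b(pa, pb, tol=1e-6):
    if not (abs(pa - pb) > tol):       # diverse formulation of the same check
        return pb
    raise ValueError("channel disagreement")

pa = position_from_speed(10.0, 2.0)    # 20.0
pb = position_from_accel(10.0, 2.0)    # 20.0 (a = 10 chosen so channels agree)
print(voter_a(pa, pb), voter_b(pa, pb))  # both vote 20.0
```

Floating-point channels compare within a tolerance rather than with exact equality; with truly diverse data formats, the voters would first convert both channels to a common representation.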
Open Issues

- How can diversity be used most efficiently?
- Can diversity be introduced automatically?
- Which faults are detected/tolerated, and to which extent?
- How can the quality of the diversity be measured?
- Can diversity also be used to detect security intrusions?
Software Metrics for Safety-Critical Systems

Problems:
- Which metrics should safety-critical software fulfil?
- Which coding rules are good and useful?
- What are the desired ranges for metrics?
- Which metrics influence maintainability?

Example fragment:
    if P then
        if Q then S1 else S2
        if R then S3 else S4
    else S5

    Sx: (block) statements; P, Q, R: (boolean) predicates
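For the fragment above, McCabe's cyclomatic complexity is the number of binary decision predicates plus one. A toy sketch (counting only `if`, which suffices here; a full implementation would also count loops and boolean operators):

```python
import re

def cyclomatic_complexity(source):
    # One plus the number of binary decision points.
    return 1 + len(re.findall(r"\bif\b", source))

fragment = """
if P then
    if Q then S1 else S2
    if R then S3 else S4
else S5
"""
# Three predicates (P, Q, R) give a cyclomatic complexity of 4.
print(cyclomatic_complexity(fragment))  # -> 4
```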
Some RAW Metrics...

Metric             | P1     | P2     | P3     | P4     | P5     | P6     | Firefox
(Main) Language    | C#     | C#     | Java   | Java   | Java   | C++    | C/C++
Functions          | 1321   | 11383  | 1344   | 2997   | 1383   | 3863   | 102630
Classes            | 101    | 2170   | 119    | 413    | 225    | 455    | 8979
LOCs               | 34731  | 287279 | 21098  | 48650  | 23567  | 95289  | 2640688
eLOCs              | 25077  | 204737 | 16775  | 40182  | 19624  | 74774  | 2187030
LOC/Function       | 26.29  | 25.24  | 15.70  | 16.23  | 17.04  | 24.67  | 25.73
LOC/Class          | 343.87 | 132.39 | 177.29 | 117.80 | 104.74 | 209.43 | 294.10
eLOC/Function      | 18.98  | 17.99  | 12.48  | 13.41  | 14.19  | 19.36  | 21.31
eLOC/Class         | 248.29 | 94.35  | 140.97 | 97.29  | 87.22  | 164.34 | 243.57
Max CC             | 135    | 213    | 58     | 281    | 43     | 222    | 751
Avg CC             | 3.36   | 2.62   | 2.83   | 3.23   | 2.67   | 2.87   | 4.28
CC > 10            | 51     | 323    | 60     | 162    | 51     | 154    | 8802
CC > 50            | 4      | 13     | 2      | 4      | 0      | 9      | 478
CC > 10 [%]        | 3.86   | 2.84   | 4.46   | 5.41   | 3.69   | 3.99   | 8.58
CC > 50 [%]        | 0.30   | 0.11   | 0.15   | 0.13   | 0.00   | 0.23   | 0.47
Notices/KLOC       | 50.24  | 57.33  | 118.12 | 143.70 | 100.90 | 68.10  | 112.84
SevereNotices/KLOC | 4.26   | 6.02   | 18.06  | 22.32  | 14.68  | 15.26  | 34.48
Outline of Method
1. Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with
2. Automatically measure potentially interesting metrics of the software packages
3. Correlate questionnaire responses with the measured metrics to find out which metric correlates with which property
Graph 3: Code Clarity vs. Return Points

(Scatter plot; x-axis: code clarity rating, 1 = best to 5 = worst; y-axis: average number of return points, roughly 1 to 1.8.)
Graph 4: Internal Quality vs. CC

(Scatter plot; x-axis: general internal quality rating, 1 = best to 5 = worst; y-axis: average cyclomatic complexity, roughly 0 to 6.)
Summary of Results

Strongest correlation with perceived internal quality:
- comment density
- control flow anomalies

No correlation with perceived internal quality:
- cyclomatic complexity
- average method size
- average file size
- ...
Further Related Topics

- Agile methods in safety-critical development
- Hazard analysis methods
- Safety standards
- Safety of operating systems
- COTS components for safety-critical systems
- Safety aspects of modern programming languages (Java, C#/.NET)
- Fault detection, correction and tolerance
- Safety and security harmonisation
- Linux in safety-critical environments
- Online tests to detect hardware faults
Conclusion

- Many open issues in this field...
- All research activities in the SYSARI project are practically motivated
- The number of safety-critical systems increases
- International standards play a vital role (e.g. IEC 61508)

Contact: Andreas Gerstinger: [email protected]