Software for Calculation of Complex Safety Parameters for Systems in Safety-Critical Applications

DANIEL TÖPEL, SARA HOSSEINI DINANI, LARISSA GAUS & JOSEF BÖRCSÖK

Department of Computer Architecture and System Programming University of Kassel

Wilhelmshöher Allee 71 GERMANY

[email protected], [email protected]

Abstract: - Due to the continuous development of technical systems, the complexity of software and hardware solutions increases steadily. As a consequence, the calculation of safety and reliability parameters becomes ever more advanced and sophisticated. So far, there is no comprehensive software that covers all or most of the established calculation standards. This paper presents the latest development of the SILCaS software, which enables the user to quickly calculate and document safety and reliability parameters in a convenient graphical form.

Key-Words: - Safety Parameter calculation, IEC 61508, Reliability, Safety, Availability

1 Introduction
In recent years, the international market has reported a very significant growth in demand for automation systems. Apart from causing financial damage, failures in such systems may also result in serious injuries or even endanger human life. According to [1], every project developing such a system is characterized by three principles:
• safety,
• functionality and
• profitability.
Fig.1 shows the interaction between these three basic principles.

[Figure: Development between the competing goals Safety, Function and Economy]
Fig.1: Competing development goals [1]

The development process of a safety-related system must balance all three principles. Nevertheless, the highest priority should always be given to safety. The safety of a system is therefore an important characteristic, which must be analysed and enhanced during design. Safety-related planning helps to prove the safety level of a system or to develop systems of a particular safety level.

The requirements for all stages of a system’s life cycle, as well as for its structure and functionality, are defined in numerous standards. Among other conditions, the safety parameters must be verified by mathematical methods. The problem is that the mathematical complexity increases in proportion to the system’s complexity. There are different approaches to modelling safety-critical systems. This variety of methods keeps producing new calculation tools; however, software covering all of the most important standards does not exist so far. The latest development in this field is SILCaS. This tool enables the user to calculate safety functions according to different standards. The requirements for SILCaS are:

• functional requirements:
  o presentation and calculation of safety functions
  o composing a system according to predefined rules
• non-functional requirements:
  o web-based application
  o user management

The tool also gives access to a range of already calculated data for a wide range of products. This data will be maintained by a neutral body instead of the manufacturer. The user will be able to quickly combine the predefined products into a safety function, which could help to shorten certification processes. Additionally, the user may design, and experiment with, custom products and data.

Recent Advances in Circuits, Systems and Automatic Control

ISBN: 978-960-474-349-0 122

One advantage of the tool is that it avoids the human calculation errors that would otherwise occur due to the complexity of the formulas demanded by the standards. The tool also tries to limit the user’s possibilities to manipulate the data or the calculation, and it is suitable for users who are not familiar with safety-related theory. If an internet connection is available, the user may access the database and use certified components. Otherwise, the user is restricted to locally stored or locally created data. The following figure shows the program structure. The database contains all certified components. Clients can connect via the internet and launch their GUI.

[Figure: Web-Server + Database, connected via http to Client 1, Client 2, ..., Client n]
Fig.2: Architecture of SILCaS

This paper presents the basic functionalities of SILCaS and includes examples which show the process of creating a safety function. It does not compare the tool to other software.

2 V-Model
Based on the software life cycle, safety-related requirements have been included in the generally known V-Model, which can now be applied to software development according to IEC 61508-3. The V-Model is shown in Fig.3.

Fig.3: Diagram of a system development

In this graphical illustration, the left (falling) branch represents the specification stage, which includes the concept and construction processes of a product. The right (rising) branch represents the realisation stage, when testing and integration of the product take place.

3 Safety parameters
3.1 Reliability
According to [2], reliability is the ability of a component or a system to work correctly for a given period of time and under certain circumstances. This definition is not yet quantitative; it only includes a dependency on time. A more mathematical approach can be found in [3], where reliability is defined as the probability that a product works without failure during a given period of time, as long as the previously defined environmental and functional conditions are met.

3.2 Risk and Safety
Any technical system can be seen as a potential source of damage. The definition of danger in [4] covers physical injuries or damage to health as well as damage to goods or the environment. Risk is defined as the product of probability and danger. During the design stage, appropriate concepts are applied to reduce the risk caused by the system. This is necessary to provide the required level of safety, which is defined by [4] as the absence of unreasonable risks. Nevertheless, it is impossible to achieve 100% safety. There will always be a minimal risk left. According to [5], the basic difference between safety and reliability lies in the definition of failures. Whereas safety only takes into account those failures which may lead to damage, reliability broadens this view to all types of failures.

3.3 The failure rate
The term “rate” should be understood as the average frequency per time unit. The failure rate is the ratio of the number of failures in the time interval [t, t + ∆t] to the number of components remaining intact at this time:

$$h(t) = \frac{\text{Number of faulty components in the time interval } \Delta t}{\text{Number of intact components in the interval} \cdot \Delta t}$$
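As a minimal illustration of this ratio, the following Python sketch estimates h(t) per intact component and per hour from test counts. The function name and the sample numbers are hypothetical and chosen only for this illustration; they are not taken from SILCaS.

```python
def empirical_failure_rate(failed_in_interval: int,
                           intact_at_start: int,
                           delta_t_hours: float) -> float:
    """Estimate h(t) as failures per intact component and per hour."""
    if intact_at_start <= 0 or delta_t_hours <= 0:
        raise ValueError("need a positive population and interval length")
    return failed_in_interval / (intact_at_start * delta_t_hours)


# Hypothetical test: 1000 intact components observed for 1000 h, 3 failures.
rate = empirical_failure_rate(3, 1000, 1000.0)
print(f"h(t) is approximately {rate:.1e} per hour")
```

Such an estimate is only meaningful together with the operating conditions under which the counts were collected.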

Because the failure rate is an empirical parameter, its value can only be specified together with the assumed functional and operational conditions, and it can change depending on these conditions. In electronic and electrical engineering, the failure rate function usually follows the typical course of a Weibull distribution [6]. It consists of three characteristic phases:
1. Phase of early failures (also called “starting phase”): h(t) decreases. Malfunctions in this phase can be attributed to material weaknesses, quality fluctuations in production or application errors, i.e. to "teething problems".
2. Service Life Phase: h(t) is approximately constant. In this phase, the failures are of a "purely random" nature.
3. Wear phase: h(t) increases. Failures in this phase are caused by aging, wear, fatigue etc.
In most cases, the phase of early failures in industrial systems is artificially shortened [2]. As a consequence, the system reaches the failure behaviour of the Service Life Phase sooner. For reliability and safety analysis, a constant failure rate can be assumed for this phase.

$$h(t) = \lambda$$

All further considerations and calculations in this paper are based on the assumption of a constant failure rate and are therefore limited to the Service Life Phase. In the field of functional safety, the constant failure rate λ is divided into two parts:
• λD: Dangerous failure rate
• λS: Safe failure rate
Both types of failure rates can be further divided into detected and undetected failures. A failure is called “detected” when it is recognized, in relation to hardware, by diagnostic tests, periodic testing, operator action or during normal operation [4]. Accordingly, undetected failures are those that, in relation to hardware, remain unrecognized by diagnostic tests, periodic testing, operator action or during normal operation [4]. In order to distinguish these types, the default notation for the failure rate includes a second index: “D” for detected and “U” for undetected failures. As a result, the failure rate λ consists of four components: λDD, λDU, λSD and λSU. This division is shown in Fig.4.

3.4 MTTF
One of the most basic and important parameters in the safety-related sector is the expected value of the lifetime of a component or a system. This parameter is called MTTF (Mean Time To Failure) and indicates the mean time until a failure occurs.
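The division into λDD, λDU, λSD and λSU and the MTTF can be illustrated with a short Python sketch. It rests on assumptions that the paper does not state explicitly: the total rate is split by a safe fraction S, one diagnostic coverage DC (see Section 3.7) is applied to both the dangerous and the safe branch, and rates are given per hour. For a constant failure rate the lifetime is exponentially distributed, so MTTF = 1/λ. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    """The four parts of a constant failure rate, all in 1/h."""
    dd: float  # dangerous detected
    du: float  # dangerous undetected
    sd: float  # safe detected
    su: float  # safe undetected

    @property
    def dangerous(self) -> float:
        return self.dd + self.du

    @property
    def safe(self) -> float:
        return self.sd + self.su

    @property
    def total(self) -> float:
        return self.dangerous + self.safe


def split_failure_rate(lam: float, safe_fraction: float, dc: float) -> FailureRates:
    """Split lam with an assumed safe fraction S and one DC for both branches."""
    lam_s = safe_fraction * lam
    lam_d = lam - lam_s
    return FailureRates(dd=dc * lam_d, du=(1 - dc) * lam_d,
                        sd=dc * lam_s, su=(1 - dc) * lam_s)


def mttf_hours(lam: float) -> float:
    """MTTF of an exponentially distributed lifetime (constant failure rate)."""
    return 1.0 / lam


# Hypothetical component: lambda = 5e-8 per hour, S = 0.5, DC = 0.8.
rates = split_failure_rate(lam=5e-8, safe_fraction=0.5, dc=0.8)
print(rates)
print(f"MTTF = {mttf_hours(rates.total):.3e} h")
```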

[Figure: the failure rate divided into four parts: Safe Undetected, Safe Detected, Dangerous Undetected and Dangerous Detected, with λD = λDD + λDU and λS = λSD + λSU]
Fig.4: Failure rate

3.5 PFH and PFD
In safety technology, the tolerable risk limit for a system must not be exceeded. The existing risk is quantified by determining the probability of failure. In the field of functional safety [4], two calculated values are distinguished:
• PFD, the probability of failure on demand, for systems operating in low demand mode
• PFH, the probability of a dangerous failure per hour, for systems operating in high demand or continuous mode
Low demand mode is defined as operation in which demands occur no more than once per year and no more than twice per proof test interval.



All other cases are considered high demand mode. Here, demands occur more than once per year, or the demand rate is higher than twice the proof test frequency.

3.6 CCF
Common cause failures are failures which occur simultaneously on all channels of a system and have a common cause [2]. In other words, a common cause failure is a failure of all units of a system caused by a single event. In [8] the following sources of common cause failures are mentioned:
• Material properties or design defects that cause the same error in a module
• Errors during installation which cause the same failures in all components
• Repair consequences
• Environmental conditions such as vibration, humidity, radiation, etc.

3.7 Proof-Test and DC
The maintenance of technical systems requires measures to detect errors. A faulty technical system or component can only be repaired if the presence of a failure is detected. For this purpose, proof tests and diagnostic measures are provided in safety-critical applications. A proof test is a periodic test to detect failures in a safety-related system so that, if necessary, the system can be restored to an "as new" state [4]. Proof tests are always performed at equal time intervals (test intervals), which are denoted by T1. Diagnostic tests are used to reveal dangerous errors. As a result, the failure rate of dangerous undetected errors is reduced. The diagnostic coverage factor (DC) is used to quantify this.

3.8 Hardware Fault Tolerance
The composition of a MooN system allows for some fault tolerance. It is therefore common for a safety-related system to be built as a MooN architecture. The MooN architecture may be seen as a parallel system of N channels, of which M fault-free channels are required for the system to work [9]. The standard [4] defines fault tolerance generally as the ability of a functional unit to perform a function even in the presence of failures.
HFT (hardware fault tolerance) is defined in [7] as the number of failures up to which, in any occurring combination, the system will not stop working correctly. This means for MooN architectures:

$$HFT_{MooN} = N - M$$

A 1oo2 system can tolerate one failure: when one of the channels fails, the system will not stop working correctly. Only when the second channel also fails will the system stop working correctly. Therefore the HFT is:

$$HFT_{1oo2} = 1$$

A 2oo4 system has an HFT of 2. The system can tolerate two faulty channels. A failure of the third channel will lead to the system’s failure:

$$HFT_{2oo4} = 2$$

As no more channels can be required than exist (M ≤ N), the HFT can never become negative.
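The relation between a MooN designation and its HFT can be sketched in a few lines of Python. The parser and function names are hypothetical and serve only to illustrate the rule that the HFT is the number of channels N minus the number of required channels M.

```python
import re


def parse_moon(architecture: str) -> tuple[int, int]:
    """Parse a designation such as '2oo3' into (M, N)."""
    match = re.fullmatch(r"(\d+)oo(\d+)", architecture)
    if match is None:
        raise ValueError(f"not a MooN architecture: {architecture!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if not 1 <= m <= n:
        raise ValueError("MooN requires 1 <= M <= N")
    return m, n


def hardware_fault_tolerance(architecture: str) -> int:
    """HFT = N - M: the number of channel failures the system tolerates."""
    m, n = parse_moon(architecture)
    return n - m


for arch in ("1oo1", "1oo2", "2oo3", "1oo3", "2oo4"):
    print(arch, hardware_fault_tolerance(arch))
```

Because the parser rejects designations with M greater than N, the computed HFT is never negative.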

4 Supported Standards
Several standards specify quantitative measures to determine the risk potential. Which standard has to be chosen depends on the system and its purpose. SILCaS supports the following standards:
• IEC 61508
• DIN EN ISO 13849
• IEC 62061

4.1 IEC 61508
The standard IEC/EN 61508, developed by the International Electrotechnical Commission, serves as a basis for other standards and defines the principles for the entire life cycle of safety-related systems. The severity of incidents is divided into four classes:
• Catastrophic: Several fatalities
• Critical: A single fatality, several severe injuries or illness of crew members
• Marginal: A single severe injury, illness of crew members, several minor injuries or minor illnesses of crew members
• Negligible: Minor injuries or minor illnesses
Four risk classes are defined by this standard. They are denoted by the Roman numerals I to IV, where I is the most severe and IV the least severe risk. Table 1 shows the relation between the risk class, the frequency and the severity of incidents. Nevertheless, the actual table may differ for other applications or definitions.



Frequency / Severity   Catastrophic   Critical   Marginal   Negligible
Frequent               I              I          I          II
Probable               I              I          II         III
Occasional             I              II         III        III
Remote                 II             III        III        IV
Improbable             III            III        IV         IV
Incredible             IV             IV         IV         IV

Table 1: Risk classification according to IEC 61508

4.2 DIN EN ISO 13849
The standard DIN EN ISO 13849 applies to the safety of machinery. It serves as a guideline for the design and integration of safety-related parts of control systems. It is applicable to all types of machines (electrical, hydraulic, pneumatic, mechanical etc.). The risk is estimated by comparing the required performance level to the achieved level.

4.3 IEC 62061
The standard IEC 62061 applies to electrical, electronic and programmable electronic control systems for machine safety. It is used alongside EN ISO 12100 and ISO 14121 (EN 1050). The requirements of IEC 62061 are targeted at electronic control systems. The standard recommends methods to determine the safety integrity level (SIL). A safety-related control for machines may be realised according to IEC 62061 as well as ISO 13849.
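The risk classification of Table 1 in Section 4.1 is a plain two-dimensional lookup. The following Python sketch encodes the table as given in this paper; the names are hypothetical, and, as noted in Section 4.1, the actual table may differ for other applications.

```python
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")

# Rows of Table 1: frequency -> risk class for each severity column.
RISK_TABLE = {
    "Frequent":   ("I", "I", "I", "II"),
    "Probable":   ("I", "I", "II", "III"),
    "Occasional": ("I", "II", "III", "III"),
    "Remote":     ("II", "III", "III", "IV"),
    "Improbable": ("III", "III", "IV", "IV"),
    "Incredible": ("IV", "IV", "IV", "IV"),
}


def risk_class(frequency: str, severity: str) -> str:
    """Return the risk class (I to IV) for a frequency and a severity."""
    row = RISK_TABLE[frequency]
    return row[SEVERITIES.index(severity)]


print(risk_class("Occasional", "Critical"))
```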

5 SILCaS
The software SILCaS can be used during the specification stage of the development process (see the V-Model). It serves as a supporting tool for the analysis of dangers and risks without requiring the user to have advanced mathematical knowledge. SILCaS supplies calculations for the standards mentioned above. This enables the user to design, configure and calculate complex system architectures in a fast and convenient way with a Graphical User Interface (see Fig.5). SILCaS uses three different types of components:
• Certified components
• Checked components
• Local components
The first type is certified components. It is impossible to alter the properties of such a component. Component values are kept up to date as long as a project file is in use. After opening and before printing, the database is checked for updates on these components, so a print-out will always be up to date at the moment of printing. Every relevant user action is recorded in a protocol. This includes a short description of the action, the user and the time. Protocol records can only be removed by using the “Undo” function or by beginning a new project from scratch. Otherwise, a user action remains in the protocol without any possibility of removing it. To record the user who is currently working on the project, SILCaS uses a user management system. Read access to the database is only granted to registered customers. Checked components are also supplied by this database, but their certification may be pending.
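The protocol of user actions described above could be modelled as an append-only log whose only removal operation is an undo of the latest entry. The following Python sketch is purely illustrative: the class and method names are hypothetical, and nothing is known from this paper about how SILCaS actually stores its protocol.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProtocolEntry:
    """One recorded user action: what was done, by whom and when."""
    description: str
    user: str
    timestamp: datetime


@dataclass
class ActionProtocol:
    """Append-only log; entries leave it only through undo()."""
    _entries: list = field(default_factory=list)

    def record(self, description: str, user: str) -> ProtocolEntry:
        entry = ProtocolEntry(description, user, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def undo(self) -> ProtocolEntry:
        """Remove and return the most recent entry."""
        if not self._entries:
            raise IndexError("nothing to undo")
        return self._entries.pop()

    @property
    def entries(self) -> tuple:
        """Read-only snapshot of the recorded actions."""
        return tuple(self._entries)


log = ActionProtocol()
log.record("Created sensor component", user="alice")
log.record("Changed architecture to 2oo3", user="alice")
log.undo()  # removes the architecture change again
print([entry.description for entry in log.entries])
```

Exposing the entries only as a tuple snapshot means callers cannot delete records directly, which mirrors the restriction described in the text.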

Fig.5: SILCaS-GUI

The results of the calculation may be printed to a PDF document. This document is intended to speed up certification processes. It contains all calculation results, the protocol of user actions, a screenshot and information about the licence holder. This collection of information helps to protect the document against fraud. Even if someone submitted a similar-looking document, or a genuine but tampered one, the fraud would be easily recognisable. The protocol of user actions enables the authorising body to repeat the calculation within minutes if it is also using SILCaS. There are many legitimate situations in which the user may leave the path of predefined and certified components, for example by using the local components described below. This adds flexibility to SILCaS. The price for this is that the results of these calculations are excluded from an easy certification process. Any of these “uncertified” actions leaves a visible trace in the document, which alerts the authorising body to investigate these alterations further. Local components do not require access to the database. They are blank, so the user needs to supply the data. The correctness of the user inputs must therefore be checked by other means.



SILCaS supports several languages, which benefits not only the user but also local authorities.

6 Example
The user chooses a category and drags a new component from the tree on the left (see Fig.6). The component’s values are edited next.

Fig.6: First component created

The user renames the component, changes its architecture to 2oo3 and changes some other values. The sensor is now configured (see Fig.7).

Fig.7: Component’s properties

For all components in this example, the MTTR and MRT are set to 8 hours, and the proof test interval is 10 years. The failure rate of the sensor is set to $3 \cdot 10^{-5}$. The diagnostic coverage is 0.9 and the S-factor is 0.35. The beta-factors are 0.04 and 0.02. This results in a $PFD_{avg}$ of $9.9706 \cdot 10^{-3}$.

To get a SIF, the user needs a compact controller and an actuator. The user drags a compact controller and edits its values. The controller has a 1oo1 architecture and a failure rate of $5 \cdot 10^{-8}$, DC = 0.8 and S = 0.5. The $PFD_{avg}$ is $2.192 \cdot 10^{-4}$.
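The controller value can be reproduced with the simplified 1oo1 equation from IEC 61508-6, in which the average PFD is the dangerous failure rate multiplied by the channel equivalent mean down time. The sketch below is a cross-check under stated assumptions, not a description of the SILCaS implementation: the S-factor is read as the safe fraction of the total failure rate, rates are per hour, and a year has 8760 hours. With the controller data it yields approximately 2.192 · 10^-4.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days per year


def pfd_avg_1oo1(lam: float, safe_fraction: float, dc: float,
                 proof_test_interval_h: float,
                 mttr_h: float, mrt_h: float) -> float:
    """Average PFD of a single channel (simplified IEC 61508-6 equation)."""
    lam_d = (1 - safe_fraction) * lam   # dangerous failure rate
    lam_dd = dc * lam_d                 # dangerous detected
    lam_du = lam_d - lam_dd             # dangerous undetected
    # Channel equivalent mean down time: undetected failures wait on average
    # half a proof test interval plus the repair time, detected ones only MTTR.
    t_ce = ((lam_du / lam_d) * (proof_test_interval_h / 2 + mrt_h)
            + (lam_dd / lam_d) * mttr_h)
    return lam_d * t_ce


controller_pfd = pfd_avg_1oo1(lam=5e-8, safe_fraction=0.5, dc=0.8,
                              proof_test_interval_h=10 * HOURS_PER_YEAR,
                              mttr_h=8, mrt_h=8)
print(f"PFDavg (controller, 1oo1) = {controller_pfd:.4e}")
```

The redundant architectures in this example (2oo3, 1oo2) additionally involve the beta-factors and are not covered by this single-channel sketch.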

The user decides to use two different actuators:
The first actuator has a 1oo1 architecture and a failure rate of $7 \cdot 10^{-6}$, DC = 0.6 and S = 0.5.
The second actuator has a 1oo1 architecture and a failure rate of $6 \cdot 10^{-6}$, DC = 0.7 and S = 0.5 (see Fig.8).

Fig.8: Editing the actuator’s properties

The actuators now need to be connected to form a 1oo2 system. The program will ask for the “Common Cause Failure” values, which are set to 0.02 and 0.01 (see Fig.9).

Fig.9: Connecting the actuators

The actuators are now connected. The combined $PFD_{avg}$ is $7.0223 \cdot 10^{-3}$. The next and last step is to connect all components, from the sensor to the actuators, in a sequence. The user can now see the values for the SIF (see Fig.10).

Fig.10: Complete SIF created

The $PFD_{avg}$ for the SIF is $1.7212 \cdot 10^{-2}$, which only allows a Safety Integrity Level of 1. It may happen that the required safety level is not met. The table shows that, in this case, the $PFD_{avg}$ is the limiting factor. The user may now experiment with new data or new components. The user notices that the sensor contributes most to the failures and therefore replaces the 2oo3 sensor with a 1oo3 sensor. This is done by copying the SIF and then dissolving it. The user may now replace the sensor or edit its values. In this case, the user changes only the architecture from 2oo3 to 1oo3. After editing, the user connects the components in the same way as before (see Fig.11).

Fig.11: Second SIF created

With the new architecture, the values meet a higher safety standard (see Fig.12).

Fig.12: Table of values

The table shows the following properties of the components:
• Architecture
• Proof test interval
• Average probability of failure on demand
• Probability of failure per hour
• The share of this component’s PFD of the SIF’s PFD
• Mean time to failure
• Safe failure fraction
• The best Safety Integrity Level (SIL) achievable by PFD
• The best SIL achievable by PFH
• The best SIL achievable by hardware fault tolerance
• The achieved SIL, which is the minimum of the three values above
All changes are logged in the PDF file. The final results are presented in a table in the PDF (see Fig.13).

Fig.13: View of the printed document
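The figures of the first SIF in this example can be cross-checked with a short Python sketch. It assumes that the subsystems are connected in series, so that their average PFD values add up as in IEC 61508-6, and it uses the low demand SIL bands of IEC 61508-1. Whether SILCaS computes the SIF value in exactly this way is an assumption, and the names are hypothetical. The sketch covers only the SIL achievable by PFD, which is one of the three values whose minimum gives the achieved SIL.

```python
# PFDavg bands for low demand mode: SIL -> (lower bound, upper bound).
SIL_BANDS_PFD = {
    4: (1e-5, 1e-4),
    3: (1e-4, 1e-3),
    2: (1e-3, 1e-2),
    1: (1e-2, 1e-1),
}


def sil_from_pfd(pfd_avg: float) -> int:
    """Return the SIL whose band contains pfd_avg, or 0 if no band is reached."""
    if pfd_avg < 1e-5:
        return 4  # better than the SIL 4 band; SIL 4 is the highest level
    for sil, (low, high) in SIL_BANDS_PFD.items():
        if low <= pfd_avg < high:
            return sil
    return 0


# Subsystem values quoted in this section.
subsystem_pfd = {
    "sensor (2oo3)": 9.9706e-3,
    "controller (1oo1)": 2.192e-4,
    "actuators (1oo2)": 7.0223e-3,
}

sif_pfd = sum(subsystem_pfd.values())  # series connection: values add up
shares = {name: pfd / sif_pfd for name, pfd in subsystem_pfd.items()}

print(f"PFDavg of the SIF: {sif_pfd:.4e}, SIL by PFD: {sil_from_pfd(sif_pfd)}")
for name, share in shares.items():
    print(f"  {name}: {share:.1%} of the SIF's PFD")
```

The shares show why the sensor is the natural candidate for improvement: it accounts for more than half of the SIF's PFD.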

7 Conclusion
The goal of simplifying the design process for the user can be considered met. Nevertheless, some mechanisms that prevent the user from creating an incorrect design might be confusing. A number of warning dialogues try to avoid this confusion and tell the user what might have gone wrong. As the software is not yet widely used, it is not clear whether the goal of standardising the calculation of safety parameters has been met. The future will show whether the application of standards is now less prone to error or fraud. The software will be extended to include further standards and other calculation tools such as Markov models or fault tree analysis.

References:
[1] S. Montenegro, Sichere und fehlertolerante Steuerungen, Entwicklung sicherheitsrelevanter Systeme, Hanser, 1999
[2] J. Börcsök, Elektronische Sicherheitssysteme, Hardwarekonzepte, Modelle und Berechnung, Aufl. 2, Heidelberg: Hüthig GmbH & Co KG, 2007
[3] B. Bertsche, G. Lechner, Zuverlässigkeit im Fahrzeug- und Maschinenbau, Ermittlung von Bauteil- und System-Zuverlässigkeiten, Aufl. 3, Heidelberg: Springer, 1963
[4] IEC/EN 61508, Funktionale Sicherheit sicherheitsbezogener elektrischer/elektronischer/programmierbarer elektronischer Systeme, deutsche Fassung, VDE Verlag GmbH, 2001
[5] A. Meyna, B. Pauly, Zuverlässigkeitstechnik, Quantitative Bewertungsverfahren, Aufl. 2, Hanser, 2010
[6] D.J. Smith, Reliability, Maintainability and Risk, Practical Methods for Engineers including Reliability Centred Maintenance and Safety-Related Systems, 7th ed., Hungary: Butterworth-Heinemann, 2007
[7] P. Wratil, M. Kievet, Sicherheitstechnik für Komponenten und Systeme, Heidelberg: Hüthig, 2007
[8] M. Rausand, A. Hoyland, System Reliability Theory, Wiley-Interscience, 2004
[9] J. Börcsök, Funktionale Sicherheit, Grundzüge sicherheitstechnischer Systeme, Aufl. 2, Heidelberg: Hüthig GmbH & Co KG, 2008

