
DOTS-LCCI Dependable Off-The-Shelf based Middleware Systems for

Large-scale Complex Critical Infrastructures

Program of Relevant National Interest (PRIN) Research Project Nr. 2008LWRBHF

Project funded by the Italian Ministry for University and Research (MIUR)

Deliverable no.: 3.1

Deliverable Title: Modeling and Evaluation: State-of-the-art

Organisation name of lead Contractor for this Deliverable: University of Firenze (UniFI)

Author(s): A. Bondavalli, P. Lollini (editors), A. Bovenzi, M. Colajanni, L. Coppolino, C. Esposito, M. Ficco, C. di Martino, L. Montecchi, R. Natella, A. Pecchia

Participant(s): UniFI, UniMORE, UniPARTHENOPE, UniNA

Work package contributing to the deliverable: 3

Task contributing to the deliverable: 3.1

Version: 2.0

Total Number of Pages: 100


DELIVERABLE D3.1 MODELING AND EVALUATION: STATE-OF-THE-ART

Table of Versions

• 0.1 (01 December 2010) – A. Bondavalli, P. Lollini. Initial document structure.
• 0.2 (16 December 2010) – M. Colajanni, L. Coppolino, P. Lollini. ToC reviewed and finalized. Reviewers: All.
• 0.3 (21 January 2011) – UniFI, UniMORE, UniPARTHENOPE. Version including most of the expected partners' contributions. Pending items: Executive Summary to be completed (UniFI); intro to Chapter 3 missing (UniMORE); Section 3.1.4 missing (UniMORE); intro to Section 3.2 missing (UniMORE); Conclusions missing (UniFI).
• 0.5 (01 February 2011) – UniMORE, UniFI. Version including the expected contributions by UniMORE, with updated references. Pending items: Executive Summary and Conclusions to be done/completed (UniFI); contributions offered by UniNA still to be provided and integrated.
• 0.6 (04 February 2011) – UniNA. Version including UniNA contributions: SAN (Catello di Martino), Fault Injection (Roberto Natella), FFDA (Antonio Pecchia) and On-line Monitoring (Antonio Bovenzi). Reviewer: Christian Esposito.
• 0.7 (16 March 2011) – All. Version addressing the major comments discussed during the face-to-face meeting in S. Vito di Cadore. Main modifications: Section 3.1.1 moved to the beginning of the deliverable and slightly revised; content of Section 3.1.2 revised; Section 3.2.1 removed; content of Section 3.2.2 revised; new section inserted within Chapter 4 dealing with works that combine different modeling formalisms. Reviewer: Paolo Lollini.
• 1.0 (24 March 2011) – Paolo Lollini. First complete review. Reviewers: All.
• 1.1 (28 March 2011) – C. Esposito, S. Russo, C. Di Martino. Section 2.1.2 on "State-based models" extended with a more detailed description of the SAN formalism; Section 2.4 on "Deriving dependability models from engineering models" extended with the description of a workflow for automating the creation of dependability models for assessing the dependability of WSNs; new Section 4.5 inserted within Chapter 4, discussing a multi-formalism framework for the automated generation of dependability models. Reviewer: Paolo Lollini.
• 2.0 (31 March 2011) – Paolo Lollini. Final version. Reviewers: All.


Table of Contents

ACRONYMS
EXECUTIVE SUMMARY
1. INTRODUCTION
  1.1 Dependability and Security Metrics
    1.1.1 Quality of a Metric
    1.1.2 Metrics Domains
    1.1.3 Examples
2. MODEL-BASED APPROACHES
  2.1 Formalisms for modeling dependability
    2.1.1 Combinatorial models
    2.1.2 State-based models
  2.2 Model construction and solution approaches
    2.2.1 Compositional approaches
    2.2.2 Decomposition and aggregation approaches
  2.3 Dependability modeling and solution tools
  2.4 Deriving dependability models from engineering models
3. EXPERIMENTAL MEASUREMENT TECHNIQUES
  3.1 Dependability and Security Benchmarking of SCADA-based LCCIs
  3.2 Data filtering and analysis
    3.2.1 Interpolation techniques
    3.2.2 Smoothing techniques
  3.3 Anomaly detection
    3.3.1 Classification-based Anomaly Detection Techniques
    3.3.2 Nearest Neighbor-based Anomaly Detection
    3.3.3 Clustering-based Anomaly Detection
    3.3.4 Statistical Anomaly Detection
    3.3.5 Information Theoretic Anomaly Detection
  3.4 Evaluation methods for Intrusion detection in LCCIs
    3.4.1 Evaluation Methods
    3.4.2 Evaluation Metrics
    3.4.3 Evaluation Dataset
    3.4.4 Pentesting tools
    3.4.5 Existing Solutions for LCCI IDSs
  3.5 Field Failure Data Analysis
  3.6 On-line Monitoring
  3.7 Fault Injection
4. COMBINING DIFFERENT APPROACHES
  4.1 Works combining different modeling formalisms
  4.2 Relationships between modeling and experimentation
  4.3 Works combining modeling and simulation
  4.4 A holistic evaluation framework
  4.5 A multi-formalism framework for the automated generation of dependability models
5. CONCLUSIONS
REFERENCES

List of Figures

Figure 1. Dependability and security tree
Figure 2. Typical Security/Risk Assessment Cycle
Figure 3. Reliability and Unreliability
Figure 4. Data filtering techniques classification
Figure 5. Piecewise constant interpolation
Figure 6. Simple regression interpolation
Figure 7. Polynomial interpolation
Figure 8. Spline interpolation
Figure 9. EWMA smoothing
Figure 10. ARIMA smoothing
Figure 11. DWT smoothing
Figure 12. Dependability evaluation based on modeling and experimentation
Figure 14. Steps performed by the multi-formalism framework for the assessment of WSNs


Acronyms

AADL Architecture Analysis and Design Language
AR Auto-Regressive
ARIMA Auto-Regressive Integrated Moving Average
ARMA Auto-Regressive Moving Average
BPEL Business Process Execution Language
CDF Cumulative Distribution Function
CTMC Continuous-Time Markov Chain
DARPA Defense Advanced Research Projects Agency
DFT Dynamic Fault Tree / Discrete Fourier Transform
DOS Denial of Service
DRBD Dynamic Reliability Block Diagram
DSPN Deterministic and Stochastic Petri Net
DTMC Discrete-Time Markov Chain
DWT Discrete Wavelet Transform
EI Electrical Infrastructure
EM Expectation Maximization
EWMA Exponentially Weighted Moving Average
FFDA Field Failure Data Analysis
FPN Fluid Petri Net
FT Fault Tree
GPRS General Packet Radio Service
GSPN Generalized Stochastic Petri Net
GTPN Generalized Timed Petri Net
HIDS Host Intrusion Detection System
HMI Human-Machine Interface
IDS Intrusion Detection System
II Information Infrastructure
LAN Local Area Network
LCCI Large-scale Complex Critical Infrastructure
LOF Local Outlier Factor
MA Moving Average
MDE Model Driven Engineering
MLE Maximum Likelihood Estimate
MNTE Mean Number of Transactions Executed
MRSPN Markov Regenerative Stochastic Petri Net
MTBC Mean Time Between Compromise
MTBF Mean Time Between Failures
MTTC Mean Time To Compromise
MTTCF Mean Time To Catastrophic Failure
MTTD Mean Time To Diagnose / Mean Time To Detection
MTTF Mean Time To Fail
MTTR Mean Time To Repair
NIDS Network Intrusion Detection System
OTS Off-The-Shelf
PMS Phased Mission System
PN Petri Net
QoS Quality of Service
RBD Reliability Block Diagram
RBF Radial Basis Function
RG Reliability Graph
RTU Remote Terminal Unit
SAN Stochastic Activity Network
SCADA Supervisory Control And Data Acquisition
SIEM Security Information and Event Management
SMA Simple Moving Average
SMSPN Semi-Markovian Stochastic Petri Net
SOM Self-Organizing Map
SPN Stochastic Petri Net
SRN Stochastic Reward Net
SVM Support Vector Machine
SWN Stochastic Well-formed Net
TPN Timed Petri Net
TTF Time To Fail
TTR Time To Repair
UML Unified Modeling Language
UMTS Universal Mobile Telecommunications System
VAWPN Variable Arc Weighting Petri Net
WSN Wireless Sensor Network


Executive Summary

The DOTS-LCCI research project aims to define novel middleware technologies, models, and methods to assure and assess the resiliency level of current and future Off-The-Shelf (OTS)-based Large-scale Complex Critical Infrastructures (LCCIs), to diagnose faults in real time, and to tolerate them by means of dynamic reconfiguration. Assuring the resiliency level of LCCIs is crucial to reduce, with known probabilities, the occurrence of catastrophic failures and, consequently, to adopt proper diagnosis and reconfiguration strategies. Project efforts will progress along three main directions:

i) Distributed architectures for LCCIs, their components (OTS and legacy), and their resiliency requirements will be studied, in order to define algorithms and middleware architectures for improving dependability attributes of future LCCIs;

ii) Strategies for on-line diagnosis and reconfiguration will be studied and defined, specifically tailored for OTS-based LCCIs, according to the resiliency assurance requirements;

iii) Tools and techniques for modeling and evaluating LCCIs will be devised.

This document is Deliverable D3.1 of the DOTS-LCCI project. It concerns the state-of-the-art of modeling and evaluation techniques for LCCIs, taking into account OTS-based systems, legacy systems and SCADA systems. Deliverable D3.1 constitutes the starting point for the definition and design of innovative tools and techniques for evaluating LCCIs, which is the core topic that will be addressed within D3.2. The document initially provides an overview of the deliverable's content, and discusses the dependability and security metrics that are targeted by the modeling and evaluation techniques surveyed in the rest of the document. The main contribution of the deliverable is the analysis of the state-of-the-art of modeling and evaluation techniques, based on more than 300 references. Specifically, we focus on:

• Model-based evaluation techniques, providing an overview of i) the different formalisms for modeling dependability and the supporting tools, ii) the available approaches that facilitate the model construction and solution process, and iii) the approaches based on automatic derivation of dependability models from engineering models.

• Experimental-based evaluation techniques, covering in particular: i) dependability and security benchmarking, ii) data filtering and analysis techniques, iii) intrusion detection techniques, iv) Field Failure Data Analysis techniques, v) on-line monitoring techniques, and vi) fault injection.

• Works combining different evaluation techniques, focusing on the interactions between i) different modeling formalisms, ii) modeling and experimental techniques, and iii) modeling and simulation techniques.

Project partners – the Research Units at the Universities of Naples "Federico II", Florence, Modena and Naples "Parthenope" – have brought into this document their expertise on commonly used evaluation techniques applied to the analysis of real LCCIs, based on their long-running collaborations with relevant industrial companies. The goal of this deliverable is therefore to serve as a reference basis for the future activities within the project, and in particular for the definition and design of innovative tools and techniques for evaluating LCCIs, which is the core activity of Task 3.2.


1. Introduction

This document presents the state-of-the-art of modeling and evaluation techniques, which constitutes the basis for the definition and design of innovative tools and techniques for evaluating Large-scale Complex Critical Infrastructures (LCCIs). The quantitative evaluation of dependability-related attributes aims at probabilistically estimating the adequacy of a system with respect to its requirements, given its specification. The quantitative evaluation within the DOTS-LCCI project will be based on two main evaluation approaches, model-based and experimental measurement-based, which will be discussed in Chapters 2 and 3, respectively. Simulation techniques based on specific LCCI simulators are not addressed within this document, as they fall outside the main stream of the project.

The two evaluation approaches show different characteristics, which determine the suitability of each method for the analysis of a specific system aspect. The model-based approach can be used to analyze the system behavior at various levels of abstraction and for system assessment in all phases of the system life cycle. Moreover, it can be used to perform a sensitivity analysis that allows identifying system bottlenecks, highlighting problems in the design, identifying the critical parameters to which the system is highly sensitive, guiding the experimental activities, and providing answers to "what-if" questions. Experimental measurement is an attractive option for the quantitative assessment of an existing system or prototype. This method allows monitoring the real execution of a system to obtain highly accurate measurements of the metrics of interest. However, it may turn out to be quite expensive, e.g., when the interest is in very rare events, and the obtained results are often difficult to generalize. In this case, appropriate techniques based on active measurements and controlled experiments (e.g., fault injection) can be adopted.

The characteristics of current LCCIs, like their largeness, complexity and heterogeneity, often including a dynamic mixture of components built by different parties and for different purposes, call for a composite evaluation framework where the synergies and complementarities among different evaluation methods can be fruitfully exploited. In this perspective, model-based approaches can be profitably used as support for experimentation, and vice versa. An overview of the works combining different modeling formalisms and evaluation approaches is presented in Chapter 4.

For a better understanding of the following chapters, and as a gluing element of the whole deliverable, in the next section we summarize the main dependability and security metrics that are addressed by the different evaluation techniques, also discussing the qualities that a good metric should have and the metrics domains.

1.1 Dependability and Security Metrics

The first step for any evaluation process is the identification of the metrics to be quantitatively assessed. This section discusses the dependability and security metrics that are targeted by the quantitative evaluation methods proposed in the rest of this document, so it is the common starting point of the deliverable. First of all, we need to define what is meant by dependability and security.


Dependability is the ability to avoid service failures that are more frequent and more severe than is acceptable. Dependability is an integrating concept that encompasses the following attributes:

• availability: readiness for correct service.
• reliability: continuity of correct service.
• safety: absence of catastrophic consequences on the user(s) and the environment.
• integrity: absence of improper system alterations.
• maintainability: ability to undergo modifications and repairs.

When addressing security, an additional attribute has great prominence, confidentiality, i.e., the absence of unauthorized disclosure of information. Security is a composite of the attributes of confidentiality, integrity, and availability, requiring the concurrent existence of 1) availability for authorized actions only, 2) confidentiality, and 3) integrity with “improper” meaning “unauthorized.”

Figure 1. Dependability and security tree

A systematic exposition of the concepts of dependability and security [Avizienis 2000, Laprie 2004, Nicol 2004] structures them into three parts: the threats to, the attributes of, and the means by which dependability and security are attained, as shown in Figure 1. The service delivered by a system is its behavior as perceived by its users (human or not), which interact with it at the service interface. The function of a system is what the system is intended for and what it is called to deliver with continuity. Correct service is delivered when the service implements the system function in accordance with the system specification. A system failure is an event that occurs when the delivered service deviates from correct service; a failure is thus a transition from correct service to incorrect service, i.e., to not implementing the system function. A transition from incorrect service to correct service is service restoration. The time interval during which incorrect service is delivered is a service outage. An error is that part of the system state that may cause a subsequent failure: a failure occurs when an error reaches the service interface and alters the service. A fault is the adjudged or hypothesized cause of an error. A fault is active when it produces an error; otherwise it is dormant.

1.1.1 Quality of a Metric

The aim of metrics is to establish comparable knowledge about a system. In our case, we wish to gather all the information needed to describe the general state of dependability and security of our system, i.e., both its vulnerability to threats from the outside and the reliability and trustworthiness of its internal components. The establishment of such knowledge is the prerequisite for any directed action to remediate the state of the system. However, not every metric is helpful per se. We will now discuss what constitutes the quality of a metric.

When we look at typical risk management and security assessment scenarios, we often encounter a procedure consisting of roughly the same four steps: Assessment, Reporting, Analysis or Prioritization, and finally Reaction or Mitigation (see Figure 2 and [Jaquith 2007]).

Figure 2. Typical Security/Risk Assessment Cycle

In the example of a human security expert team analyzing a system failure, we would expect to see exactly this procedure. In a first step, the various critical aspects of the system that might have caused the problem are measured and surveyed by the team. The data is then aggregated into a first report, highlighting the risks and threats. This report is analyzed to identify the source of the problem, and finally the necessary steps are taken to eliminate it. Another example might be a common Intrusion Detection System (IDS): the system is constantly scanned for known vulnerabilities, a report of intrusion detection events is generated, the events are classified and prioritized, and then the appropriate reaction is initiated.

Both of these examples present a straightforward way to deal with issues as they come up. However, in the descriptions above, the monitoring system returns to a state of ignorance after each cycle. The failure has been repaired, but what we do not know is whether the system is now in a better, or more dependable/secure, state than before. In this regard, we need metrics that, at the end of each cycle, can establish an absolute difference in quality or value with respect to any previous cycle.


[Jaquith 2007] names the following quality criteria for good security metrics. As we see from the examples above, the most important property of a good security metric is that it is continually monitored; in the optimal case it is generated automatically or mechanically, and proactively forwarded to the monitoring system. If a metric is monitored over a certain period of time, it must be consistently monitored: the process and parameters used to retrieve the information must not change over time, and identical methods must be used across all systems thus monitored. This allows the metric to be used for benchmarking, i.e., to compare both between moments in time and between similar systems. Again, automated methods of data collection are preferable over manual ones.

To allow continuous and consistent monitoring, data collection must be not only technically feasible but also sufficiently cheap to perform. This is what makes repeated collection affordable in the first place, and in all cases it reduces the impact that the monitoring process has on the system, as well as its inherent cost. Automated monitoring also reduces the risk of subjectivity. Good metrics therefore are expressed in cardinal numbers or as percentages, as opposed to qualitative or semantically undefined identifiers (rankings, "high/low", traffic lights).

Finally, a metric must have a clear semantic meaning in the context in which it is being used. This requires that it is expressed using one or more units of measurement. Usually, a relation between units is preferable to a single unit: for example, the metric "number of system failures" is less meaningful than the relation "number of system failures per machine". Contextually specific means, at the same time, that the metric must allow the user to understand exactly which part of the system is described or affected; a counter-example is an "average component time to failure", which may only prompt the operator to act on all components at the same time, or to do nothing. Lastly, in order to place all metrics into a higher common context, they should all express or refer to a business value or cost. This may be a direct financial cost, or any unit that can be directly converted into costs once the necessary parameters are known. Time is one unit immediately related to cost through given operating or personnel costs: system downtime, for example, can also be expressed as losses. If the downtime cost of a system is continually monitored, a metric such as "total downtime of component X per day" is directly helpful to both technical and management staff.
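To make these criteria concrete, a minimal sketch follows (the component names, outage records and cost figure are hypothetical; a real deployment would extract them automatically from monitoring logs). It computes a cardinal, contextually specific metric – downtime per component per day – and converts it into a cost, so the same number speaks to both technical and management staff.

```python
from collections import defaultdict

# Hypothetical outage records: (component, day, outage_minutes).
# In practice these would be harvested automatically from monitoring logs,
# so the metric stays consistently and continuously measured.
outages = [
    ("rtu-01", "2011-03-01", 12.0),
    ("rtu-01", "2011-03-01", 8.5),
    ("scada-fe", "2011-03-01", 30.0),
    ("rtu-01", "2011-03-02", 4.0),
]

DOWNTIME_COST_PER_MINUTE = 40.0  # assumed conversion parameter (Financial Domain)

def downtime_per_component_day(records):
    """Aggregate outage minutes per (component, day): a cardinal,
    contextually specific metric rather than a vague ranking."""
    totals = defaultdict(float)
    for component, day, minutes in records:
        totals[(component, day)] += minutes
    return totals

for (component, day), minutes in sorted(downtime_per_component_day(outages).items()):
    cost = minutes * DOWNTIME_COST_PER_MINUTE
    print(f"{day} {component}: {minutes:.1f} min down (~EUR {cost:.0f})")
```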

1.1.2 Metrics Domains

As we have seen, the terms dependability and security refer to the protection both of the whole perimeter of a system and of its internal workings. In this section we establish which domains of the system have to be monitored to gain an overview of the existing risks (see Table 1 and [Jaquith 2007]).

As all metrics have to be placed in a financial context, the necessary conversion parameters come from the Financial Domain. Typical metrics here are operating and downtime costs, and transaction values. Using these metrics, we can value the reliability and security of the actual system, which is the concern of the Technical Domain. This domain represents the correctness, timeliness and performance of every component of the system. Typical metrics include the mean time to failure, false positive and false negative rates in anomaly detection, and the number of transactions processed by a component.

The next aspect to assess is the User Domain. The human factor is probably the most difficult to control in a system. How many people have access to each component? How reliable are they? How well have they been trained? What damage can be done?

Finally, the security of a critical infrastructure must adapt to future changes. This constitutes the Development Domain: how well do a system, its users and its components adapt to future requirements? How many communications use standard protocols? How often is software updated? How many users are trained to new requirements?

Table 1: Metrics Domains

• Technical Metrics
• Financial Metrics
• User Metrics
• Development Metrics

1.1.3 Examples

In this section we introduce some of the main dependability and security metrics that have been targeted by the quantitative evaluation approaches presented in the past literature.

Availability - The availability of a system as a function of time, A(t), is the probability that the system is operational at the instant of time t. If the limit of this function exists as t goes to infinity, it expresses the expected fraction of time that the system is available to perform useful computations. Activities such as preventive maintenance and repair reduce the time that the system is available to the user. Availability is typically used as a figure of merit in systems in which service can be delayed or denied for short periods without serious consequences [Siewiorek 1992, Shooman 2002, Brown 2000].

Reliability - The reliability of a system as a function of time, R(t), is the conditional probability that the system has survived the interval [0, t], given that it was operational at time t = 0. Reliability is used to describe systems in which repair cannot take place, systems in which the computer is serving a critical function and cannot be lost even for the duration of a repair, or systems where the repair is prohibitively expensive [Siewiorek 1992, Shooman 2002, Brown 2000]. The reliability function R(t) has the following properties:

• It is a monotonically decreasing function.

• For t = 0, R(0) = 1 (the component is assumed to work correctly at the beginning).
• For t → ∞, R(t) → 0 (the component will eventually fail).


Figure 3. Reliability and Unreliability

Unreliability - Unreliability, also called the probability of failure, F(t), is related to Reliability by the law:

R(t) + F(t) = 1

(at every instant t the system has either failed or continues to work). F(t) is the Cumulative Distribution Function (CDF) of the Time To Fail (TTF) random variable, and it represents the probability that the system fails in the time interval (0, t). A possible trend for both R(t) and F(t) is shown in Figure 3. From the properties of a CDF, it follows that the probability of a failure happening in a time interval (t1, t2) is:

Pr(t1 < t_failure ≤ t2) = R(t1) − R(t2).

The mean value of unreliability is the MTTF (Mean Time To Fail), while the mean time between two consecutive failures is the MTBF (Mean Time Between Failures), which can obviously be considered only for repairable components.

Time to Repair – The time to repair (TTR) is the amount of time needed to restore the system's correct service delivery after a failure has occurred. Usually the parameter of interest is the TTR's mean or expected value, the Mean Time To Repair (MTTR).

Maintainability – The Maintainability CDF, M(t), is defined as the probability of performing a successful repair action within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status after a failure occurs.

Safety – The Safety S(t) is a further parameter, related to catastrophic failures that lead to a total outage of service with potential risks for human beings and the environment. It represents the probability that a catastrophic failure does not happen in the interval (0, t). Its mean value is the Mean Time To Catastrophic Failure (MTTCF).

It is important to underline the difference between Availability and Reliability: while the former is the probability of a punctual state (the system is available at time t), the latter is a probability over a period of time (the system is working throughout the interval (0, t)).

A commonly used parameter to measure Availability is the Steady-State Availability, the limit of the instantaneous availability function as time approaches infinity:

A = lim A(t) for t → ∞.

Under the hypothesis that repair completely regenerates the system functionality, this parameter can be calculated as:

A = MTBF / (MTBF + MTTR).
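As a worked illustration of the formulas above, here is a small sketch assuming the simplest case of an exponentially distributed TTF (the MTTF and MTTR values are invented):

```python
import math

MTTF = 1000.0          # assumed mean time to fail, hours
MTTR = 2.0             # assumed mean time to repair, hours
lam = 1.0 / MTTF       # failure rate of the exponential TTF

def R(t):
    """Reliability: probability of surviving the interval (0, t)."""
    return math.exp(-lam * t)

def F(t):
    """Unreliability: CDF of the TTF, F(t) = 1 - R(t)."""
    return 1.0 - R(t)

t1, t2 = 100.0, 200.0
print(f"R({t1:.0f}) = {R(t1):.4f}, F({t1:.0f}) = {F(t1):.4f}")       # R + F = 1
print(f"Pr(failure in ({t1:.0f}, {t2:.0f}]) = {R(t1) - R(t2):.4f}")  # R(t1) - R(t2)

# Steady-state availability, taking MTBF ~ MTTF for a quickly repaired unit:
A = MTTF / (MTTF + MTTR)
print(f"Steady-state availability A = {A:.5f}")  # ~0.99800
```

Note how the numbers anticipate the remark below: availability can be very high even for modest component reliability, as long as the MTTR is kept low.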

Actually, instead of the MTTR it would be more accurate to specify a Mean Down Time (MDT), the mean period of time the system is out of service, which would include the following stages:

• the time it takes to successfully diagnose the cause of the failure (Mean Time To Diagnose, MTTD);
• the time for the effective repair (Mean Time To Repair, MTTR);
• the time it takes to replace the failed components with functioning ones.

Sometimes, for the sake of simplicity, the latter is included in the MTTR. We should note that Reliability is not the feature directly experienced by final users, who actually experience Availability. In a complex system a high level of availability can be achieved even when the reliability of components is low, provided that the MTTR is kept low.

Concerning security, the following definitions were proposed by [Neves 2006]:

• Security hazard – An unintentional or intentional attack to system security, which makes the system exposed to potential security breaches;

• Security compromise – A security breach which manifests in the system. It is the consequence of a security hazard.

The paper also defines a portfolio of security measures, extending to the security domain measures already defined in the field of dependability. Some basic measures are defined in the following (a minimal estimation sketch is given after the latency definitions below):

• Mean Number of Transactions Executed (MNTE) - The mean number of transactions executed in a secure way, before the occurrence of a security compromise;

• Mean Time To Compromise (MTTC) - The mean time elapsed before the occurrence of a security compromise;

• Mean Time Between Compromise (MTBC) - The mean time between the occurrence of security compromises;

• Mean Time To Detection (MTTD) - The mean time elapsed before the detection of a hazard/compromise;

• Mean Time To Removal (MTTR) - The mean time elapsed before the removal of a hazard/compromise.

A fundamental measure is latency. It is possible to distinguish between three different kinds of latency, namely:

Page 17: DOTS-LCCI Deliverable D3.1 v2 0 Final

Program of National Research Interest (PRIN 2008) - Project “DOTS-LCCI”

Deliverable D3.1 “Modeling and Evaluation: State-of-the-art” 17

• Hazard activation latency - The amount of time an undetected hazard stays latent, before it is activated (i.e., originates a security compromise);

• Hazard/compromise detection latency - The amount of time a hazard/compromise, which is present in the system, stays undetected;

• Hazard/compromise removal latency - The amount of time an undetected hazard/compromise persists in the system, before it is eliminated.
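The following minimal sketch shows how such measures could be estimated from a log of security events (the timestamps are invented and the estimators are simple sample means; [Neves 2006] does not prescribe any particular implementation):

```python
# Hypothetical timeline of one system, in hours since the start of observation.
compromises = [120.0, 400.0, 910.0]   # instants of security compromises
detections  = [150.0, 420.0, 960.0]   # instants at which each one was detected

def mean(xs):
    return sum(xs) / len(xs)

# MTTC: mean time elapsed before a compromise occurs
# (here estimated from inter-occurrence intervals, including the first).
intervals = [compromises[0]] + [b - a for a, b in zip(compromises, compromises[1:])]
mttc = mean(intervals)

# MTBC: mean time between consecutive compromises.
mtbc = mean([b - a for a, b in zip(compromises, compromises[1:])])

# MTTD / detection latency: mean time a compromise stays undetected.
mttd = mean([d - c for c, d in zip(compromises, detections)])

print(f"MTTC ~ {mttc:.1f} h, MTBC ~ {mtbc:.1f} h, MTTD ~ {mttd:.1f} h")
```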


2. MODEL-BASED APPROACHES

A model is an abstraction of a system "that highlights the important features of the system organization and provides ways of quantifying its properties neglecting all those details that are relevant for the actual implementation, but that are marginal for the objective of the study" (see [Balbo 2001]). Model-based evaluation is usually cheaper than experimental evaluation and can be used in all the phases of the system lifecycle. During the design phase, models allow comparing alternative architectural solutions, selecting the most suitable one, and highlighting problems within the design ("early" validation). Once design decisions are made, models allow predicting the overall behavior of the system. Finally, for an already existing system, which is quite often the case in the critical infrastructure field, models allow an "a posteriori" dependability analysis to understand and learn about specific aspects, to detect possible design weak points or bottlenecks, to perform a late validation of the dependability requirements, and to suggest sound solutions for future releases or modifications of the system.

There are very different types of models, and the selection of the proper model to be used depends on different factors, like the complexity of the system, the specific aspects to be studied, the attributes to be evaluated, the required accuracy, the metrics of interest, as well as the resources available for the study. Depending on the underlying stochastic process, models can be solved analytically or, when the analytical approaches are not applicable, by simulation.

In this chapter we present the state-of-the-art on model-based evaluation approaches for dependability assessment, discussing in particular the following topics: the available formalisms for modeling dependability (Section 2.1), the approaches that facilitate the model construction and solution process (Section 2.2), the supporting modeling and solution tools (Section 2.3), and finally the approaches based on automatic derivation of dependability models from engineering models (Section 2.4).

2.1 Formalisms for modeling dependability

A system designer has at his or her disposal a wide range of analytical modeling techniques to choose from. Each of these techniques has its own strengths and weaknesses in terms of accessibility, ease of construction, efficiency and accuracy of solution algorithms, and availability of supporting software tools. The most appropriate type of model depends upon the complexity of the system, the questions to be studied, the accuracy required, and the resources available to the study. Analytical models can be broadly classified into non-state space (combinatorial) models and state space models.

2.1.1 Combinatorial models

Combinatorial models are among the simplest approaches to dependability modeling of critical systems. In such models the measures of interest for the overall system are evaluated from the measures of the individual components, using combinatorial and/or probability formulas. Reliability Block Diagrams (RBDs), Fault Trees (FTs) and Reliability Graphs (RGs) are combinatorial formalisms commonly used to study the dependability of systems. They are concise, easy to understand, and have efficient solution methods. However, realistic features such as interrelated behavior of components, imperfect coverage, nonzero reconfiguration delays, and combination with performance cannot be captured by these models. These limitations led to the development of new formalisms, such as Dynamic Fault Trees (DFTs) and Dynamic Reliability Block Diagrams (DRBDs), to model reliability interactions among components or subsystems. A brief overview of traditional non-state space models can be found in [Nicol 2004], while some of their "dynamic" extensions are outlined in [Distefano 2008].
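As a small illustration of how combinatorial models are evaluated, the sketch below solves a series-parallel RBD by combining component reliabilities with probability formulas (the structure and the reliability values are invented, and the components are assumed independent – exactly the assumption whose violation motivates the dynamic extensions above):

```python
from functools import reduce

def series(*rs):
    """Series RBD: all blocks must work, R = product of R_i (independence assumed)."""
    return reduce(lambda acc, r: acc * r, rs, 1.0)

def parallel(*rs):
    """Parallel RBD: at least one block must work, R = 1 - product of (1 - R_i)."""
    return 1.0 - reduce(lambda acc, r: acc * (1.0 - r), rs, 1.0)

# Hypothetical SCADA path: a sensor, in series with two redundant RTUs
# in parallel, in series with the control station.
R_sensor, R_rtu, R_station = 0.95, 0.90, 0.99
R_system = series(R_sensor, parallel(R_rtu, R_rtu), R_station)
print(f"System reliability: {R_system:.4f}")  # 0.95 * (1 - 0.1^2) * 0.99 ~ 0.9311
```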

2.1.2 State-based models

State-based models are able to capture various functional and stochastic dependencies among components and therefore produce more detailed and richer system models. Discrete-Time Markov Chains (DTMCs), Continuous-Time Markov Chains (CTMCs), and Markovian models in general are commonly used for dependability modeling of computing systems (e.g., see [Trivedi 2001] for full details). They are able to capture a wide range of interactions between system components, while still allowing the analytical evaluation of various measures related to dependability and performance (performability) when reward structures are associated with them. Unfortunately, not all existing systems and their features can be captured properly by Markov processes; in some cases more general processes (e.g., semi-Markov, Markov regenerative or even non-Markovian processes) must be used. When dealing with such processes, the analytical solution techniques for their evaluation, if they exist at all, may become complex and costly. If analytic solution methods do not exist, discrete-event simulation may be used to evaluate the models, therefore obtaining only estimates of the measures of interest. Simulation may however be time consuming because of the rare event problem: events of interest occur so rarely with respect to the system's lifetime that very lengthy simulation runs are necessary to obtain reliable results. Alternatively, one can approximate an underlying non-Markovian process with a Markov process, and thus represent a non-exponential transition with an appropriate network of exponential ones (phase-type approach). The price to pay with this approach is a significant increase in the number of states of the resulting Markov model. The work in [Trivedi 1996] reviews the existing state space models (as well as combinatorial ones) and discusses the benefits and the limitations of each.

To facilitate the generation of state-space models based on Markov chains and their extensions, higher-level modeling formalisms like Stochastic Petri Nets (SPNs) are commonly used. These formalisms allow a more compact model representation because they support concurrency and modularization. Several classes of Stochastic Petri Nets exist, having different characteristics and different modeling power. In [Ciardo 1994], the authors explore and discuss a hierarchy of SPN classes where modeling power is traded for an increasingly efficient solution. Stochastic Petri Nets and their extensions have been widely used in dependability modeling of LCCIs as well. In [Krings 2003], Generalized Stochastic Petri Nets (GSPNs) have been used to model cascading failures in electric power systems, taking into account the effects of power line overloading. The authors of [Ten 2008] have used the GSPN formalism to assess the vulnerability to cyber attacks of a SCADA system controlling an electric power system; the scenario considered in that work includes different substations, protected by password policies and firewalls, having access to the main control station. Other widely used modeling formalisms are Deterministic and Stochastic Petri Nets (DSPNs), Semi-Markovian Stochastic Petri Nets (SMSPNs), Timed Petri Nets (TPNs), Generalized Timed Petri Nets (GTPNs), Stochastic Reward Nets [Bobbio 1998], Stochastic Activity Networks (SANs) [Sanders 2001] and Markov Regenerative Stochastic Petri Nets (MRSPNs) [Choi 1994]. Some classes of Stochastic Petri Nets allow for a more detailed modeling of LCCIs through the interaction with external tools (e.g., see [Chiaradonna 2007] or [Bondavalli 2009]). In these cases, however, the analytical solution is usually precluded and the model must be solved using discrete-event simulation.

SANs have a custom graphical representation. Circles represent places in which tokens may reside. Places may contain zero, one or several tokens, mimicking the state of the modeled system. Extended places may also contain user-defined data types such as structs, arrays or matrices. The number of tokens in each place can change when activities complete. Activities are represented by lines, either thin ones (so-called instantaneous activities) or thick ones (so-called timed activities). A timed activity interposes a delay between the instant when the activity becomes enabled through some change in the system state and the instant when the firing actually occurs. As for Stochastic Petri Nets, an activity is enabled if there is a token in all the places connected to it and no instantaneous activities are enabled. Since timed activities represent operations in the modeled system, events must be defined to denote the start and finish of these operations. The start of an operation is signaled by an activation of an activity. An activation of an activity occurs if 1) the activity becomes enabled, or 2) the activity completes and is still enabled. Some time after an activity is activated, it will either complete or be aborted: the activity completes if it remains enabled throughout its activity time; otherwise it is aborted. The amount of time to complete a timed activity (the firing of an activity) can follow a specific stochastic distribution, such as the Exponential or the Weibull. Both the distribution type and its parameters can depend on the global marking of the network at the activation time of the activity. Activity times are assumed to be mutually independent random variables.

Cases can be associated with activities (represented by circles on the right side of an activity). They permit the realization of two types of spatial uncertainty. Uncertainty about which activities are enabled in a certain state is realized by cases associated with intervening instantaneous activities. Uncertainty about the next state assumed upon completion of a timed activity is realized by cases associated with that activity. In other words, cases allow one branch of the model to be executed instead of another, with a given probability (specified as case probabilities). These probabilities can depend on the markings of the input and output places of the activity at its completion time.

In the SAN formalism, gates are introduced to permit greater flexibility in defining enabling and completion rules. If no gates are linked before and after an activity, the default rule is to decrement the marking of all the places in input to the activity and to increment the marking of all the places after the activity. An activity may have an input gate (red triangles) connected to it, with logic that describes the system state under which the activity is enabled to complete (input gate predicate). The logic in an input gate is a Boolean function of the system state, i.e., it is expressed as a Boolean function of the marking of one or several places. A SAN may also use an output gate (black triangles) following an activity; the logic within the output gate describes the modification to the system state that occurs as a result of the firing. The use of gates permits greater flexibility by embedding custom C++ code into the model, hence allowing functions defined in external libraries to be invoked and facilitating the integration of analytical models with external tools/libraries.

SAN primitives also enable the adoption of a divide-et-impera approach to the construction of complex models by means of the Join and Replicate operators, which combine several distinct models by means of shared places, i.e., places (states of the model) shared between different models. The Join operator combines two or more different SANs together, merging certain places to permit communication. The Replicate operator combines two or more identical SANs together, holding certain places common among the replicas.

SANs allow custom metrics to be defined by means of reward variables, i.e., rules and data structures that accumulate a reward when specific conditions are met. Reward variables are a way of measuring performance- or dependability-related characteristics of a model. Reward may be accumulated in two different ways: a model may be in a certain state or states for some period of time (a rate reward is accumulated), or an activity may complete (an impulse reward is accumulated). A reward variable is the sum of the impulse and rate reward structures over a certain time.
2.2 Model construction and solution approaches The main problem in using state-based models to realistically represent the behavior of a LCCI is the explosion in the number of states (often referred to as state space explosion problem). Significant progress has been made in addressing the challenges raised by the large size of models, at the model construction and model solution levels, using a combination of techniques that can be categorized with respect to their purpose (largeness avoidance, largeness tolerance). For a comprehensive survey on this topic see [Nicol 2004], [Lollini 2007] and [Kaâniche 2008]. Largeness avoidance techniques try to reduce the size of the models using many different approaches. Largeness tolerance techniques try to optimize the generation and processing of the models through the use of a) systematic rules to support the elaboration of the models and b) space and time efficient algorithms to optimize the state space generation, storage and exploration. It is important to note that these categories of techniques are complementary and both are needed, at the model construction and model solution levels, when detailed and large dependability models need to be generated and processed to evaluate metrics characterizing the resilience of real life systems. In the following subsections we focus on the model construction phase and we present two complementary techniques that can be applied to cope with model largeness, grouping them in compositional approaches (Section 2.2.1) and decomposition/aggregation approaches (Section 2.2.2). A specific discussion on the usage of graphical formalisms to deal with model complexity can be found in deliverable D1.1 (see [PRIN-DOTS-LCCI-D1.1 2010], Section 2.5.2).

Page 22: DOTS-LCCI Deliverable D3.1 v2 0 Final

Program of National Research Interest (PRIN 2008) - Project “DOTS-LCCI”

Deliverable D3.1 “Modeling and Evaluation: State-of-the-art” 22

2.2.1 Compositional approaches

This class groups those techniques that build the system model as a modular composition of simpler sub-models, which are then solved as a whole. Most of the works in this class define the rules used to construct and interconnect the sub-models, providing an easy way to describe the behavior of systems with a high degree of dependency between subcomponents. These dependencies can be exploited to manage model complexity, for example by creating smaller, equivalent representations.

Research on process algebra [Milner 1989] has inspired efforts to introduce compositionality into Petri nets. Composition of Petri Nets (PNs) consists in constructing PN models from a set of building blocks by applying suitable operators for place and/or transition superposition. Composition approaches have been explored for different classes of stochastic Petri nets. For example, [Buchholz 1995] explored composition in the context of Stochastic Petri Nets (SPNs), and [Donatelli 1996] proposed a systematic compositional approach to the construction of parallel hardware-software models using Generalized Stochastic Petri Nets (GSPNs). Composition operators have been defined in [Rojas 1996, Ballarini 2000] for the generation of Stochastic Well-formed Nets (SWNs) from their components. These operators preserve the functional structure of the model and support several types of communication between components. This approach is intended to support the modeling of distributed and parallel systems where both synchronous and asynchronous communications are required; however, it addresses only the class of systems that can be modeled by SWNs.

Another example of composition operators is found in the context of Stochastic Activity Networks (SANs). In [Meyer 1993], two composition operators are defined (Join and Replicate) to compose system models based on SANs. The Join operator takes as input a set of sub-models and some shared places belonging to different sub-models, and provides as output a new model that comprises all the elements of the joined sub-models (places, arcs, activities), with each group of shared places merged into a single one. The Replicate operator combines multiple identical copies of a sub-model, called replicas, sharing some selected places. [Obal 1998] introduces a graph composition approach that extends the replicate/join formalism and combines models by sharing a portion of the state of each sub-model, reducing the total state-space size. In contrast to the join/replicate formalism, which requires the use of a special operation, graph composition detects all the symmetries exposed at the composition level and uses them to reduce the underlying state space.

These composition techniques are very helpful to cope with model complexity, in particular when the models exhibit symmetries. However, they are not sufficient when the modeled systems exhibit various dependencies that need to be explicitly described in the dependability models. These dependencies may result from functional or structural interactions between the components, or from interactions due to global system strategies (e.g., fault tolerance, maintenance). Various modeling approaches have been proposed to facilitate the construction of large dependability models taking such dependencies into account. A first example is the block modeling approach defined in [Kanoun 2000], which provides a generic framework for the dependability modeling of hardware and software fault-tolerant systems based on GSPNs. The proposed approach is modular: generic GSPN submodels called block nets are defined to describe the behavior of the system components and of the interactions between them. The system model is obtained by composition of these GSPNs. In [Fota 1999] an incremental and iterative model construction approach has been proposed: at the initial iteration, the behavior of the system is described taking into account the failures and recovery actions of only one selected component, assuming that the others are in an operational nominal state. Dependencies between components are taken into account progressively at the following iterations: at each iteration a new component is added, and the GSPN model is updated by taking into account the impact of the additional assumptions on the behavior of the components already included in the model.
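To make the place-sharing idea behind the Join operator concrete, the following is a minimal sketch in Python under strong simplifying assumptions: nets are reduced to dictionaries of place markings plus (input places, output places) transition pairs, which is a toy representation and not the actual SAN tooling; the producer/consumer sub-models and the shared place name are invented for illustration.

def join(submodels, shared):
    """Fuse sub-models; places listed in `shared` are merged into a single place."""
    places, transitions = {}, []
    for places_i, transitions_i in submodels:
        for p, tokens in places_i.items():
            if p in shared and p in places:
                # Shared place already present: fuse it, keeping one marking.
                assert places[p] == tokens, "shared places must agree on the initial marking"
            else:
                places[p] = tokens
        transitions.extend(transitions_i)
    return places, transitions

# Two sub-models communicating only through the shared place "buffer".
producer = ({"idle": 1, "buffer": 0}, [(["idle"], ["buffer"])])
consumer = ({"busy": 0, "buffer": 0}, [(["buffer"], ["busy"])])
print(join([producer, consumer], shared={"buffer"}))

The composed net contains a single "buffer" place through which the two sub-models interact, mirroring how Join merges shared places while keeping all other elements of the operands.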

2.2.2 Decomposition and aggregation approaches

The modeling approaches discussed in this section belong to the largeness avoidance techniques, which try to circumvent the generation of large models using model decomposition and aggregation of the partial results. The basic idea is to decompose the overall model into simpler and more tractable sub-models; the measures obtained from the solution of the sub-models are then aggregated to compute, usually by approximation, those concerning the overall model. A decomposition and aggregation theory for the steady-state analysis of general continuous-time Markov chains has been proposed in [Courtois 1977]. The quality of the approximation is related to the degree of coupling between the blocks into which the Markov chain matrix is decomposed. In [Bobbio 1986] the authors present an extension of this technique specifically addressed to the transient analysis of large stiff Markov chains, where stiffness is caused by the simultaneous presence of “fast” and “slow” rates in the transition rate matrix. Time-scale based decomposition approaches have been applied to non-Markovian stochastic systems in [Haddad 2004], and to GSPN models of systems containing activities whose durations differ by several orders of magnitude in [Ammar 1989]. In [Ammar 1989], for example, the GSPN model is decomposed into a hierarchical sequence of aggregated sub-nets, each of which is characterized by a certain time scale. These smaller sub-nets are then solved in isolation, and their solutions are combined to obtain the solution of the whole system. The aggregation at each level is done by assuming that the transitions included in the lower level are immediate transitions. At each level of the hierarchy, the current marking of an aggregated sub-net determines the number of tokens in the sub-net at the lower level, which is then analyzed to determine the rate of the transitions in the aggregated sub-net. The decomposition approach in [Daly 2001] is based on a new set of connection formalisms that reduce the state-space size and solution time by identifying submodels that are not affected by the rest of the model, and solving them separately. The result from each solved submodel is then used in the solution of the rest of the model. The authors develop four abstractions that can be used to build connection models; they involve passing a continuous-time random process, a discrete-time random process, a random variable, or an average value between the models. When these abstractions are applied, each submodel should have a smaller state space and fewer time scales than the complete model.


The decomposition technique is also relevant for the modeling of Phased Mission Systems (PMS), i.e., systems characterized by a sequence of phases in which the system configuration can change during operation. The existence of phases is a consequence of: i) diverse tasks to be performed, and ii) diverse environmental conditions, in different periods of the system lifetime. In the literature, several approaches have been proposed for the analytical dependability modeling of PMS, all based on a hierarchical structure of the models [Alam 1986, Dugan 1991, Somani 1994]. In [Mura 1999], the model of a PMS is seen as composed of two logically separate Petri nets: the System Net, representing the system (its components, their interactions and their failure/repair behavior) as a GSPN, and the Phase Net, a Deterministic and Stochastic Petri Net, representing the control part and describing the phase changes. In the System Net, a single model is built for the whole mission, characterized by a set of phases, without detailing the behavior of the system inside each phase. This allows easy modeling of a variety of mission scenarios by sequencing the phases in appropriate ways. The parameter values to be used in the System Net model are obtained by solving the Phase Net models. This approach has been generalized in [Mura 2001] on the basis of Markov Regenerative Stochastic Petri Nets. The key point is that the state space of the Markov regenerative process is never generated and handled as a whole; rather, the various subordinate intra-phase processes are separately generated and solved. As a consequence, the computational complexity of the analytical solution is reduced to that needed for the separate solution of the different phases, as demonstrated in [Bondavalli 2006]. Other approaches are based on layered and multi-level modeling methods, where the modeled system is structured into different levels corresponding to different abstraction layers, with a model associated to each level. Different techniques based on this idea have been developed; see, e.g., [Donatelli 1996, Bondavalli 2001, Kaâniche 2003]. In [Lollini 2005], for example, this modeling methodology is applied to the class of control and resource management systems. To cope with their increasing complexity, such systems are typically developed in a hierarchical fashion: the functionalities of the whole system are partitioned among a number of subsystems working at different levels of a hierarchy. At each level, a subsystem has knowledge and control of the portion of the system under its control (lower levels), while it acts just as an actuator with respect to the higher-level subsystems. To improve dependability, fault tolerance measures may be taken at each level, typically introducing interface checks to cope with erroneous inputs and/or outputs, and internal checks to cope with faults during the internal computation. The characteristics of such an architecture, which is common in LCCI layouts, have been exploited to derive a modeling methodology that is not only aimed at building models in a compositional way, but also includes capabilities to reduce their solution complexity. In [Lollini 2009], the authors proposed a decomposition and aggregation approach that operates at the system level, rather than at the model level. Using this approach, entities (or sub-systems) are created that can work in isolation or can interact with each other through a set of dependency relations.
The relations state how the behavior of each entity affects the others. The structure, together with the notion of a phased mission, allows one to solve each submodel in isolation, and then pass results between submodels as needed. Such a formulation is not domain-specific, and it reduces the complexity of solving models that can be expressed in this framework. This generic decomposition/aggregation approach has been applied to study a GPRS mobile telephone infrastructure, taking into account the congestion due to service outages and its subsequent impact on user-perceived quality of service.
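The phase-by-phase solution idea described above can be illustrated with a minimal sketch, assuming each phase is reduced to a small CTMC given by its generator matrix and a deterministic duration, and that all phases share the same state space; the matrices, rates, and durations are illustrative assumptions, not values from the cited works.

import numpy as np
from scipy.linalg import expm

def solve_mission(p0, phases):
    """phases: list of (Q, duration); returns the state probabilities after each phase."""
    p = np.asarray(p0, dtype=float)
    history = []
    for Q, d in phases:
        p = p @ expm(Q * d)   # transient solution of the intra-phase process only
        history.append(p.copy())
    return history

# Two states (up, down) with a phase-specific failure rate (per hour).
Q1 = np.array([[-1e-3, 1e-3], [0.0, 0.0]])   # phase 1: benign environment
Q2 = np.array([[-5e-3, 5e-3], [0.0, 0.0]])   # phase 2: harsher environment
for probs in solve_mission([1.0, 0.0], [(Q1, 10.0), (Q2, 2.0)]):
    print(probs)

Only one small intra-phase process is solved at a time, and the final probabilities of a phase become the initial probabilities of the next one; the published PMS approaches additionally handle mappings between different per-phase state spaces, which this sketch omits.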

2.3 Dependability modeling and solution tools

Several software tools developed over the last thirty years address dependability and performability modeling and evaluation. Surveys of the problems related to techniques and tools for dependability and performance evaluation can be found, for example, in [Haverkort 1996, Reibman 1991, Sanders 1999]. Tools can be grouped into two main classes:

• Single-formalism/multi-solution tools, which are built around a single formalism and one or more solution techniques. They are very useful inside a specific domain, but their major limitation is that all parts of a model must be built in the single formalism supported by the tool. A representative set of such tools is based on the Stochastic Petri Net formalism and its extensions: they all provide analytic/numerical solution of a generated state-level representation and, in some cases, support simulation-based solution as well. This set includes DSPNexpress [Lindemann 1999], GreatSPN [Chiola 1995], SURF-2 [Béounes 1993], DEEM [Bondavalli 2004], TimeNET [German 1995], UltraSAN [Sanders 1995].

• Multi-formalism/multi-solution tools, which support multiple modeling formalisms, multiple model solution methods, and several ways to combine models expressed in different formalisms. To a certain extent, tools within this category provide support for the decomposition/aggregation approaches mentioned above, and they can be distinguished with respect to the level of integration between formalisms and solution methods that they provide. Some tools try to unify several different single-formalism modeling tools into a unique software environment, providing a unified interface for the specification of models and for the reporting of results. Some examples in this category are IMSE [Pooley 1991], IDEAS [Fricks 1997], FREUD [van Moorsel 1998], DRAWNET++ [Vittorini 2002]. Other tools are developed as support for more comprehensive approaches, where new modeling methodologies, formalisms, and composition operators are defined to allow the integration of multiple formalisms within a unique comprehensive tool. Though more difficult than building a software environment out of existing tools, this approach has the potential to integrate much more closely models expressed in different modeling formalisms. The main tools following this approach are SHARPE [Trivedi 2002], SMART [Ciardo 1996], DEDS [Bause 1998], POEMS [Adve 2000] and MÖBIUS [Courtney 2009].

Depending on the kind of model, the type of measure to be evaluated, and the software tool, the model solution is obtained by computing the measure with one of the following classes of techniques:

• Closed form results, which yield exact measures but can be obtained for only a limited class of models.


• Direct analytical techniques, like matrix inversion, which still yield exact measures but can be obtained for only a limited class of models.

• Iterative numerical techniques (note that no general guarantees of convergence of iterative methods exist for some problems, and that determining a suitable error bound for termination is not easy); a minimal sketch of this class is given after this list.

• Simulation techniques, which provide an estimate with a confidence interval for the result, but may be costly in terms of run time.
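As an illustration of the iterative class mentioned in the list above, the following is a minimal sketch, assuming a small CTMC given by its generator matrix: the chain is uniformized and its steady-state vector is obtained by power iteration (the rates, tolerance, and iteration cap are illustrative assumptions).

import numpy as np

def steady_state(Q, tol=1e-10, max_iter=100_000):
    Q = np.asarray(Q, dtype=float)
    lam = max(-Q.diagonal()) * 1.05        # uniformization rate (> fastest exit rate)
    P = np.eye(Q.shape[0]) + Q / lam       # stochastic matrix of the uniformized DTMC
    pi = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(max_iter):
        new = pi @ P
        if np.abs(new - pi).max() < tol:   # heuristic termination test
            return new
        pi = new
    raise RuntimeError("no convergence within max_iter")

Q = np.array([[-0.2, 0.2],                 # failure rate of a repairable component
              [0.5, -0.5]])                # repair rate
print(steady_state(Q))                     # approximately [0.714, 0.286]

The stopping rule illustrates the caveat above: the iteration terminates on a successive-difference test, which is a heuristic rather than a rigorous error bound.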

2.4 Deriving dependability models from engineering models

The emergence of model-based engineering methodologies and the elaboration of automated model transformation techniques have opened new possibilities in the modeling of large-scale complex systems and infrastructures. Model-based engineering refers to the systematic use of models as primary artefacts throughout the engineering lifecycle [Schmidt 2006]. Precise, albeit informal or semi-formal, engineering languages (like UML, BPEL, AADL, etc.) are used to provide a high-level model of the system; model transformation techniques are then used to automatically generate several “artefacts”: source code, translations into other modeling languages, analysis models, etc. These initiatives and technologies influenced model-based assessment as well, since they offered an efficient and integrated approach to derive dependability analysis models from engineering models. Dependability modeling and evaluation requires specific support for the specification and description of non-functional aspects of the system (like reliability and safety), which are not properly covered by the common engineering languages (as these focus primarily on functional aspects). Recently, significant effort has been spent in the definition of standard languages that support the high-level specification of non-functional properties of systems, e.g., the UML profile for QoS and fault tolerance [OMG 2008], or the Error Model Annex for AADL [SAE 2006]. However, there are as yet no comprehensive high-level languages that support MDE-based dependability evaluation, and properly addressing dependability concerns in this context is still a challenge (e.g., see [Montecchi 2011a] for further details). Different approaches for the automated derivation of dependability models have appeared in the literature, often using ad-hoc language extensions:

• Direct modeling of dependability-related behavior: System designers use the extended engineering language to directly describe failure and repair/recovery processes (e.g., occurrence of different failure modes, error propagation) and also the corresponding properties of components (e.g., error rates, propagation probabilities). A good example is the usage of the AADL Error Model Annex: the behavior of the components can be described in the presence of internal faults and repair events, as well as in the presence of external propagations. The dependability evaluation toolset constructs the analysis models by mapping the dependability-related behavior to the analysis formalism, and then computes system-level dependability measures. A stepwise approach for GSPN-based dependability modeling on the basis of AADL is presented in [Rugina 2007]. As another example, in [Ganesh 2002] UML is used as a language to describe error propagation and module substitution, which is then mapped to dynamic fault trees.


• Modular construction of system-level models using predefined generic sub-models: Dependability experts construct analysis sub-models that represent the generic structure of both the failure/recovery processes of the different types of components and the error propagation among them. System designers use the language extensions just to identify the component types and assign local dependability parameters to hardware and software artefacts in the engineering model. These dependability parameters (typically available from component handbooks or from component-level evaluation) are used to parameterize the generic sub-models. The dependability model construction tools (1) apply pattern matching and model transformation to assemble the relevant parameterized sub-models in a modular way on the basis of the architecture design, and then (2) invoke solution algorithms to solve the system-level model. In a UML-based approach [Bondavalli 2001b], language extensions are defined as a UML profile (stereotypes and tagged values), analysis sub-models are assigned to architectural components and relations, and then composed as a system-level Stochastic Reward Net (SRN). Modular model construction is supported by automated tools [Magyar 2009]. In the case of web-service-based process models [Gönczy 2006], web service language extensions are utilized; the services are mapped to DSPN sub-models, and integrated in a Multiple Phased System model. A Model Driven Engineering (MDE) transformation workflow for the quantitative evaluation of dependability-related metrics has recently been presented in [Montecchi 2011b]. The approach is integrated in a more comprehensive modeling framework currently being developed within the CHESS project, which combines the MDE philosophy with component-based development techniques.

• Integration of various aspects from different models: In complex, dynamic distributed systems, the dependability model must be constructed from several engineering models that capture various aspects of the system at different hierarchy levels. Typically, user, application, architecture and network levels are distinguished. For example, in the case of large, critical mobile systems and infrastructures [Bondavalli 2011], the construction of the dependability model for computing user-level dependability attributes is based on (1) the workflow model of the user activities, (2) the topology models of the network connections in the various phases of the user activities (also constructed automatically from user mobility traces), and (3) the application-service-resource dependency models. Unsurprisingly, a complex evaluation tool-chain is required to integrate the different mappings, abstractions, and model transformation steps [Kovács 2008].

The automated derivation of dependability analysis models from the engineering models (created during the model-based development process) has the advantage that – besides the application of certain model extensions – there is no need to learn and use specific dependability analysis formalisms, and modeling effort can be saved. The goal is to relieve engineers from focusing on modeling details by enabling the automated and transparent generation of parametric failure and performance models. This is definitely a benefit if dependability analysis necessitates the creation of state-based dependability models in complex systems, as these models require higher learning and modeling effort than traditional combinatorial methods. As an example, the framework proposed in [Cinque 2007] provides a workflow for automating the creation of dependability models for assessing the dependability of Wireless Sensor Networks (WSNs) [Akyldiz 2002], based on the use of a behavioral simulator and Stochastic Activity Networks. The former is used to specify and configure the target system and to study its fault-free behavior. The results are used to generate and populate analytical models with realistic values for their parameters. To this aim, the framework is equipped with a library of parametric model templates, i.e., model skeletons that can be specialized automatically, depending on the specific system to engineer. Model templates are defined once by a domain expert, and system engineers do not need to be aware of them. Note that parameters are dynamic over time, and they need to be recomputed upon failures and recovery events (e.g., a node fails and the topology is modified, forcing changes in the traffic patterns and hence in node power consumption figures). The interaction between the behavioral simulator and the SAN models is managed by the framework, which is in charge of re-computing model parameters that change over time.


3. EXPERIMENTAL MEASUREMENT TECHNIQUES

Experimental measurements represent a key approach for the quantitative evaluation of Large Complex Critical Infrastructures (LCCIs). They can provide real data from a class of systems that can be difficult to model or to simulate due to their high complexity. In this chapter we first address dependability and security benchmarking approaches (Section 3.1), which aim at providing generic, repeatable and widely accepted methods for characterizing and quantifying the system (or component) behavior in the presence of faults, and for comparing the dependability and security of alternative solutions. As the collected data are highly heterogeneous and often noisy, we then present some existing solutions for filtering (Section 3.2), and some algorithms for anomaly detection (Section 3.3). The focus then moves to the evaluation methods for intrusion detection systems (Section 3.4), and to the Field Failure Data Analysis (FFDA) methodology for the dependability evaluation of computer systems (Section 3.5), which allows one to collect, manipulate, and analyze failure data. In the last part of the chapter we first discuss the different strategies proposed in the literature for the on-line monitoring of the health of a system (Section 3.6), and finally we provide an overview of fault injection techniques to evaluate (and possibly increase) the robustness and the behavior in the presence of faults of computer-based systems (Section 3.7).

3.1 Dependability and Security Benchmarking of SCADA-based LCCIs

LCCIs, such as transport infrastructures (e.g., networks of seaports and airports) and power grids, play a key role in several fundamental human activities, and represent the next generation of monitor and control systems. They make extensive use of Information and Communications Technology (e.g., communication networks, computing systems, and sensing hardware) to provide support for advanced control and monitoring facilities [Bologna 2003]. LCCIs are characterized by the federation of several heterogeneous systems via a middleware platform. This represents a novel perspective on how next-generation Supervisory Control And Data Acquisition (SCADA) systems are realized. The function of a SCADA system is to monitor, operate and control, from a central location, remote systems located over a large geographic area. Remote monitoring and control provide data that can be used to significantly enhance operational efficiency, reduce downtime, increase security and counter terrorism threats. The operations and processes being monitored and controlled can be industrial, infrastructure or facility based. A SCADA system is typically composed of a central core, also called the Central Room, where system information acquisition and control are concentrated, and a number of Remote Terminal Units (RTUs), equipped with limited computational resources. RTUs communicate with the Central Room by sending to it and receiving from it short real-time control messages. At the Supervisory Station, human operators monitor the system behavior and issue commands through a Human-Machine Interface (HMI), where the system status is represented in a graphical, intuitive format, usually via mimic diagrams that also support commanding. A SCADA system can be further controlled in a centralized way by hierarchically higher control levels.

The goal of benchmarking the dependability of SCADA systems is to provide generic and reproducible ways for characterizing their behavior in the presence of faults. A benchmark represents an agreement that is widely accepted by the computer industry and/or by the user community. This technical agreement should state the system that is benchmarked, the measures, the way and conditions under which these measures are obtained, and their domain of validity. Dependability benchmarks can be used to characterize the dependability of a component or a system, as well as to compare alternative or competitive solutions according to one or several dependability attributes. The benchmark results can be used to identify weak parts of a system that require more attention and perhaps some improvement, by tuning a component to enhance its dependability, or by tuning the system architecture to ensure a suitable dependability level. Different approaches can be used to characterize the dependability of a SCADA component/system; they are discussed in the following.

DBench [DBench 2002] has developed a framework for defining dependability benchmarks for computer systems, with emphasis on OTS-based systems, via experimentation and modeling. The DBench objective is to provide a framework and guidelines for defining dependability benchmarks for computer systems, and to provide means for implementing them. The distinction between benchmarking a product and benchmarking the process of creating a product is a key first dimension of the problem, and the focus on product benchmarking represents a key decision for DBench. A dependability benchmark can be represented by a well-defined set of dependability measures and a detailed specification of all the procedures, methods, tools, and steps required to obtain these measures. Four groups of dimensions define what is expected from a dependability benchmark:

• Categorization dimensions: These dimensions characterize the dependability benchmarks and define a set of different benchmark categories. The categorization dimensions include:

o Benchmark usage – a composite dimension, which identifies the different perspectives for running dependability benchmarks and using the benchmark results;

o Life cycle phase - identifies the phase in the product life that will be addressed in the benchmark;

o Application area – identifies the application area defined with the adequate granularity;

o Target system - defines the target system and/or the target component that will be subject to measures.

• Measure dimension: The selection of the dependability benchmarking measures to be assessed depends on the choices made for the categorization dimensions.

• Experiment dimensions: The experiment dimensions are:

o Operating environment - typical environment for an application area and the way it affects the benchmark;


o Workload - defines a working profile that should be representative of an application area;

o Faultload - defines the set of upsets, stressful conditions and faults that could affect the system;

o Procedures and rules - defines the procedures and the rules to perform the benchmarking.

• Property dimensions: dependability benchmarks must meet some properties to be valid and useful. For example, a benchmark must be repeatable, must not cause undesirable interference with the target system, must be portable, cost-effective, etc.

The benchmark category is itself related to the various categorization dimensions, i.e., the benchmark usage, the life cycle phase, the application area and the target system nature. Different possibilities for these categorization dimensions result in several categories of dependability benchmark. As a consequence, the first step in conducting a benchmark consists in analyzing the system and the various categorization dimensions to determine the benchmark category and the measures of interest. The experimental dimensions (i.e., the operating environment, the workload, the faultload, and the procedures and rules) are defined as a result of the definition of the benchmark category and of the features/measures of interest. If, according to the various categorization dimensions, comprehensive measures are of interest, a modeling step may be required in addition to the experimentation (see also Section 4.2 for the relationships between modeling and experimentation). It is worth noting that, depending on the target system and on the measures of interest, the analysis step could be straightforward and may only consist in selecting the right benchmark category and measures, as well as the workload and faultload, among the existing ones. However, for some target systems the analysis may be more elaborate: it may even consist in analyzing the system behavior in depth and in preparing the modeling phase. In this latter case, the analysis may be time-consuming and may require a significant effort. Nevertheless, the results may be more complete than when only specific measures are considered. A trade-off may be made according to the objectives of the benchmarks, the effort required, and the results expected. [Kanoun 2005] proposes a black-box approach to dependability benchmarking of computing systems. In particular, two dependability benchmarks are discussed: a benchmark for database-centric systems and another one for web-server-based systems. [Thomas 2008] proposes a security benchmarking scheme for software systems. Since software verification is undecidable in general, the authors propose a tool able to prove some subset of the assertions safe; they refer to this partial success as partial verification. They claim that it is feasible to gradually rewrite programs so that more assertions can be proven safe, meaning that there are fewer places in which the property may be violated, and thus fewer potential vulnerabilities. Finally, they give metrics to measure this progress. [Lie 2007] proposes a framework that, if applied to security systems, would produce quantitative measures that can be used to compare and appraise security systems relative to each other.

3.2 Data filtering and analysis

Monitoring of SCADA-based LCCIs generates large amounts of data that can be used to characterize the behavior of system components, to detect anomalies and attacks, and to support management decisions. Unfortunately, previous statistical analyses of collected resource measures show that the data typically exhibit highly variable and non-stationary patterns, which can even be affected by perturbations and noise. Such stochastic behavior prevents algorithms and decision support systems from operating on the raw data generated by the LCCI monitoring systems. When these variable and noisy conditions occur, the data series must be treated with a statistical filter able to remove out-of-scale values and perturbations, and to extract a filtered representation that can be passed to a detection or decision algorithm. Filtering models can be applied on-line or off-line. On-line models work on a limited set of data, and their maximum computational complexity is limited by the time constraints imposed by the application context. Off-line models, on the other hand, can use any set of past data and do not have to comply with time constraints. SCADA-based LCCIs may use both on-line and off-line filtering models, depending on the application context and goal. A very large literature on data filtering models exists. In Figure 4 we report a classification of the main statistical methods that can be used to obtain a filtered data representation for SCADA-based LCCIs. The first main distinction is between:

• Interpolation models, described in Section 3.2.1;

• Smoothing models, described in Section 3.2.2.

3.2.1 Interpolation techniques

Interpolation is a method of constructing new data points from a discrete subset of known data points {x1, ..., xp}. An interpolation function must pass through the selected points of the observed data set. Interpolation methods can be classified into two main classes: linear and non-linear interpolation models.


Figure 4. Data filtering techniques classification

Linear interpolation. Linear interpolation is a method of curve fitting using linear polynomials. Typical examples of linear interpolation are the piecewise constant interpolation and the simple regression models. The simplest interpolation method is piecewise constant interpolation, which assigns to each point of the data set the value of the nearest point belonging to the subset of known data points {x1, ..., xp} [Fu 1982]. Figure 5 shows an example of piecewise constant interpolation based on a set of 7 known points, that is, p=7. The horizontal black lines passing through the known data points compose the estimate produced by the piecewise constant interpolation filter. In one dimension, there are seldom good reasons to choose this simple method for data filtering. However, in higher-dimensional multivariate interpolation, this model can be a favorable choice, appreciated for its speed and simplicity.

Figure 5. Piecewise constant interpolation
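A minimal sketch of piecewise constant interpolation in Python follows, on an invented set of p=7 known points (the data are illustrative and are not those of Figure 5):

import numpy as np

xp = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # known abscissas (p = 7)
yp = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0])   # observed values

def nearest_interp(x, xp, yp):
    """Assign to each query point the value of the nearest known point."""
    idx = np.abs(x[:, None] - xp[None, :]).argmin(axis=1)
    return yp[idx]

x = np.linspace(0.0, 6.0, 13)
print(nearest_interp(x, xp, yp))   # horizontal steps through the known points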


An example of simple regression interpolation is given in Figure 6. The black line estimates the data filtering as straight continuous lines linking all the known data set points {x1, ..., xp}. Linear interpolation is quick and easy, but it is not precise [Fu 1982]. All linear interpolation techniques have low computational costs and can provide acceptable results when the data set is subject to linear trends [Harrell 2001]. On the other hand, when the data set is characterized by non-stationary and highly variable behavior, linear interpolation may be unreliable, especially for data filtering and trend identification. In this context, non-linear interpolation gives better results.

Figure 6. Simple regression interpolation
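A minimal sketch of the straight-line linking described above, reusing the same invented points; NumPy's np.interp computes exactly this piecewise linear interpolant:

import numpy as np

xp = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
yp = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0])

x = np.linspace(0.0, 6.0, 13)
print(np.interp(x, xp, yp))   # straight segments between consecutive known points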

Non-linear interpolation. Non-linear interpolation is a data filtering technique able to model highly curved time series through non-linear polynomials. Linear models fit a straight line or a flat plane to the data samples; usually, however, the true relationship we want to model is curved rather than flat. To filter the data set and fit it, we need non-linear models such as polynomial and spline interpolations. Given some points of the data set, polynomial interpolation techniques filter the data set and estimate its trend through polynomials of degree higher than 1 passing through the known data set points {x1, ..., xp}. Referring to the previous example, the sixth-degree polynomial in Figure 7 goes through all seven points. In general, given p known data points, there is exactly one polynomial of degree at most p-1 passing through all of them. The interpolation error is proportional to the distance between the known data points [Birkhoff 1965]. Polynomial interpolation solves all the problems of simple regression. However, it also has some disadvantages: calculating the interpolating polynomial is computationally expensive compared to simple regression, and polynomial interpolation may exhibit oscillatory artifacts, especially at the end points.


Figure 7. Polynomial interpolation
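A minimal sketch of full polynomial interpolation on the same invented points: with p=7 points there is exactly one interpolating polynomial of degree p-1=6 (the data are illustrative, not those of Figure 7):

import numpy as np

xp = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
yp = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0])

coeffs = np.polyfit(xp, yp, deg=len(xp) - 1)   # exact fit: degree p-1
poly = np.poly1d(coeffs)
print(poly(xp))    # reproduces yp up to rounding
print(poly(5.5))   # interpolated value; may oscillate near the end points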

These disadvantages can be avoided through the spline interpolation model [Poirier 1973, Wolberg 1999], which uses low-degree polynomials in each of the intervals between two consecutive points in {x1, ..., xp}, and chooses the polynomial pieces so that they fit smoothly together. The resulting function is called a spline. Figure 8 shows the result of a cubic spline, where the polynomial pieces are of degree 3.

Figure 8. Spline Interpolation

Like polynomial interpolation, spline interpolation incurs a smaller error than linear interpolation. Moreover, the spline is easier to evaluate than the high-degree polynomials used in polynomial interpolation. For these reasons spline interpolation is commonly used in data filtering and trend extraction contexts [Wolberg 1999, Poirier 1973, Eubank 1999]. Nevertheless, all non-linear techniques have high computational costs and are often inadequate for contexts with short-term real-time requirements.
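A minimal sketch of cubic spline interpolation on the same invented points, assuming SciPy as the library (boundary conditions are left at SciPy's defaults):

import numpy as np
from scipy.interpolate import CubicSpline

xp = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
yp = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0])

spline = CubicSpline(xp, yp)   # degree-3 pieces joined smoothly at the knots
x = np.linspace(0.0, 6.0, 13)
print(spline(x))               # smooth curve without a single high-degree polynomial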

3.2.2 Smoothing techniques

A smoothing technique is a function that aims to capture important patterns in the data set, while leaving out noise [Winer 1964]. Some common smoothing algorithms are the moving average, the autoregressive models, and filtering theory.


Moving average. Moving average techniques smooth out the observed data set and reduce the effect of out-of-scale values. They are fairly easy to compute at runtime and are commonly used as trend indicators [Lilja 2000, Dinda 2000]. The most used moving average techniques are the Simple Moving Average (SMA) and the Exponentially Weighted Moving Average (EWMA), which compute a uniform and a non-uniform weighted mean of the past measures, respectively. As the Simple Moving Average assigns an equal weight to each of the considered past data values, this model tends to introduce a significant delay in time series filtering, especially when the number of considered past values increases. Exponential moving average models are usually applied with the purpose of limiting this delay effect. However, these techniques tend to introduce an excessive delay in the filtered data representation when the number of past measures is large, while they do not eliminate all noise when working on a small set of past samples. The issue of choosing the best past data set size can be addressed when the time series are stable [Andreolini 2008].

Autoregressive models. Autoregressive models include a group of linear smoothing formulas that aim to filter a time series on the basis of the previous raw and filtered samples. A model that depends only on previous filtered samples is called an Auto-Regressive (AR) model, while a model depending only on raw data samples is called a Moving Average (MA) model. A model based on both raw and filtered samples is called an Auto-Regressive Moving Average (ARMA) model. These autoregressive models are adequate for stationary time series [Dinda 2000]. On the other hand, when the data set shows evidence of non-stationarity, it is preferable to use the Auto-Regressive Integrated Moving Average (ARIMA) model, a generalization of the ARMA model [Tran 2004]. ARIMA is characterized by a different initial step, corresponding to the “integrated” part of the model, which is applied to remove the non-stationarity of the time series. The ARIMA model has the advantage that few terms are needed to describe a wide variety of time series processes, fewer than for AR and MA models [Spreen 1979]. ARFIMA and ARCH [Engle 1995, Ling 1997] are further autoregressive techniques that may be used to model time series with long memory or exhibiting time-varying volatility clustering, that is, periods of swings followed by periods of relative calm.

Filtering theory. Filtering theory is useful to reveal trends in time series. Its purpose is to remove from a signal some unwanted component or feature, such as the noise component. Some examples of models based on filtering theory that can be adopted in SCADA-based systems are recursive filtering, the Discrete Wavelet Transform and the Discrete Fourier Transform. Recursive filters re-use one or more of their outputs as an input. If the time series and the component error are Gaussian and uncorrelated, there is an optimal recursive filter, namely the Kalman filter. It is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean of the squared error. This filter is very popular and powerful in several respects: it supports estimation of past, present, and even future states, even when the nature of the modeled time series is unknown [Bishop 2001]. The Discrete Wavelet Transform (DWT) and the Discrete Fourier Transform (DFT) are other representative techniques based on filtering theory. They belong to a computationally efficient family of multi-scale basis functions for the decomposition of a signal into levels or scales and for the extraction of a de-noised data set representation [Percival 2000]. In the DWT, the data set is passed through filters with different cut-off frequencies at different levels, while the DFT decomposes the time series into a sum of periodic harmonics. The main difference is that wavelets are localized in both time and frequency, while the standard Fourier transform is localized only in frequency. Wavelets often give a better representation of the data set's trend and are computationally more efficient than the Discrete Fourier Transform [Graps 1995].

Figure 9, Figure 10 and Figure 11 show the results of applying smoothing techniques to a stochastic and highly variable time series; an example of data filtering and trend estimation is given for each category of smoothing techniques. The black line in Figure 9 represents the trend resulting from the Exponentially Weighted Moving Average computed on the original time series, represented by the grey line. In this example, the EWMA model considers ten past measures. It produces a spiky and reactive representation of the data set, following the variability of the time series. Similar results are achieved by the ARIMA(1,1,1) model in Figure 10. This autoregressive technique tracks the data set and smooths out only the major fringes of variability, thus resulting in a fluctuating representation strongly dependent on the data sample values. On the other hand, the DWT technique in Figure 11 cuts out almost all time series variability, thus resulting in the smoothest representation: it represents well the overall trend of the data set and removes almost all the variability of the time series. It is not meaningful to state which smoothing technique best filters the data and estimates a trend, because the performance of each model must be related to the application context and the time series characteristics [Andreolini 2008].

Figure 9. EWMA smoothing
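A minimal sketch of the two moving-average smoothers on a synthetic noisy series (the window size and smoothing factor alpha are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + rng.normal(scale=2.0, size=200)

def sma(x, w=10):
    """Uniform mean of the last w samples; introduces a delay of about w/2."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

def ewma(x, alpha=0.2):
    """Exponentially weighted mean: recent samples weigh more, limiting the delay."""
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

print(sma(series)[:5])
print(ewma(series)[:5])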


Figure 10. ARIMA smoothing

Figure 11. DWT smoothing
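A minimal sketch of DWT-based de-noising, assuming the PyWavelets (pywt) package; the wavelet, decomposition level, and threshold value are illustrative assumptions:

import numpy as np
import pywt

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=256)) + rng.normal(scale=2.0, size=256)

coeffs = pywt.wavedec(series, "db4", level=4)            # multi-level decomposition
coeffs[1:] = [pywt.threshold(c, value=2.0, mode="soft")  # shrink the detail scales
              for c in coeffs[1:]]
smooth = pywt.waverec(coeffs, "db4")                     # de-noised reconstruction
print(smooth[:5])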

3.3 Anomaly detection

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These nonconforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection, sometimes interchangeably. The importance of anomaly detection is due to the fact that anomalies in data translate to significant, and often critical, actionable information in a wide variety of application domains. Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been specifically developed for certain application domains, while others are more generic. Anomalies are patterns in data that do not conform to a well-defined notion of normal behavior. They might be induced in the data for a variety of reasons, such as malicious activity (for example, cyber-intrusion or terrorist activity) or the breakdown of a system, but all of these reasons share the common characteristic that they are interesting to the analyst. The interestingness, or real-life relevance, of anomalies is a key feature of anomaly detection. Anomaly detection is related to, but distinct from, noise removal [Teng 1990] and noise accommodation [Rousseeuw 1987], both of which deal with unwanted noise in the data. Noise can be defined as a phenomenon in data that is not of interest to the analyst, but acts as a hindrance to data analysis. However, it should be noted that solutions for these related problems are often used for anomaly detection and vice versa, and hence they are discussed in this section as well. At an abstract level, an anomaly is defined as a pattern that does not conform to expected normal behavior. A straightforward anomaly detection approach, therefore, is to define a region representing normal behavior and declare any observation in the data that does not belong to this normal region as an anomaly. However, several factors make this apparently simple approach very challenging:

• Defining a normal region that encompasses every possible normal behavior is very difficult. In addition, the boundary between normal and anomalous behavior is often not precise.

• When anomalies are the result of malicious actions, the malicious adversaries often adapt themselves to make the anomalous observations appear normal, thereby making the task of defining normal behavior more difficult.

• In many domains normal behavior keeps evolving and a current notion of normal behavior might not be sufficiently representative in the future.

• The exact notion of an anomaly is different for different application domains. Thus applying a technique developed in one domain to another is not straightforward.

• Availability of labeled data for training/validation of models used by anomaly detection techniques is usually a major issue.

• Often the data contains noise that tends to be similar to the actual anomalies and hence is difficult to distinguish and remove.

Due to these challenges, most existing anomaly detection techniques solve a specific formulation of the problem. The formulation is induced by various factors such as the nature of the data, the availability of labeled data, the type of anomalies to be detected, and so on. Often, the application domain in which the anomalies need to be detected determines these factors.

Nature of Input Data. A key aspect of any anomaly detection technique is the nature of the input data. The input is generally a collection of data instances (also referred to as objects, records, points, vectors, patterns, events, cases, samples, observations, or entities) [Tan 2005]. Each data instance can be described using a set of attributes (also referred to as variables, characteristics, features, fields, or dimensions). The attributes can be of different types, such as binary, categorical, or continuous. Each data instance might consist of only one attribute (univariate) or multiple attributes (multivariate). In the case of multivariate data instances, all attributes might be of the same type or might be a mixture of different data types. The nature of the attributes determines the applicability of anomaly detection techniques.

Type of Anomaly. An important aspect of an anomaly detection technique is the nature of the desired anomaly. Anomalies can be classified into the following three categories:

• Point Anomalies. If an individual data instance can be considered anomalous with respect to the rest of the data, then the instance is termed a point anomaly.

• Contextual Anomalies. If a data instance is anomalous in a specific context, but not otherwise, then it is termed a contextual anomaly (also referred to as a conditional anomaly [Song 2007]). The notion of a context is induced by the structure in the data set and has to be specified as a part of the problem formulation. Each data instance is defined using the following two sets of attributes:

o Contextual attributes. The contextual attributes are used to determine the context (or neighborhood) for that instance. For example, in spatial data sets, the longitude and latitude of a location are the contextual attributes.

o Behavioral attributes. The behavioral attributes define the non-contextual characteristics of an instance. For example, in a spatial data set describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioral attribute.

A data instance might be a contextual anomaly in a given context, but an identical data instance (in terms of behavioral attributes) could be considered normal in a different context.

• Collective Anomalies. If a collection of related data instances is anomalous with respect to the entire data set, it is termed a collective anomaly. The individual data instances in a collective anomaly may not be anomalies by themselves, but their occurrence together as a collection is anomalous.

It should be noted that while point anomalies can occur in any data set, collective anomalies can occur only in data sets in which data instances are related. In contrast, the occurrence of contextual anomalies depends on the availability of context attributes in the data. A point anomaly or a collective anomaly can also be a contextual anomaly if analyzed with respect to a context. Thus a point anomaly detection problem or a collective anomaly detection problem can be transformed into a contextual anomaly detection problem by incorporating the context information.

Data Labels. The labels associated with a data instance denote whether that instance is normal or anomalous. It should be noted that obtaining labeled data that is accurate as well as representative of all types of behavior is often prohibitively expensive. Typically, getting a labeled set of anomalous data instances that covers all possible types of anomalous behavior is more difficult than getting labels for normal behavior. Based on the extent to which labels are available, anomaly detection techniques can operate in one of the following three modes:

• Supervised Anomaly Detection. Techniques trained in supervised mode assume the availability of a training data set that has labeled instances for the normal as well as the anomaly class. A typical approach in such cases is to build a predictive model for normal vs. anomaly classes. Any unseen data instance is compared against the model to determine which class it belongs to. There are two major issues that arise in supervised anomaly detection. First, the anomalous instances are far fewer than the normal instances in the training data. Second, obtaining accurate and representative labels, especially for the anomaly class, is usually challenging. Other than these two issues, the supervised anomaly detection problem is similar to building predictive models.

• Semisupervised Anomaly Detection. Techniques that operate in a semisupervised mode assume that the training data has labeled instances only for the normal class. Since they do not require labels for the anomaly class, they are more widely applicable than supervised techniques. The typical approach used in such techniques is to build a model for the class corresponding to normal behavior, and use the model to identify anomalies in the test data.

• Unsupervised Anomaly Detection. Techniques that operate in unsupervised mode do not require training data, and are thus most widely applicable. The techniques in this category make the implicit assumption that normal instances are far more frequent than anomalies in the test data. If this assumption is not true, then such techniques suffer from a high false alarm rate.

3.3.1 Classification-based Anomaly Detection Techniques

Classification [Tan 2005, Duda 2000] is used to learn a model (classifier) from a set of labeled data instances (training) and then classify a test instance into one of the classes using the learned model (testing). Classification-based anomaly detection techniques operate in a similar two-phase fashion: the training phase learns a classifier using the available labeled training data, and the testing phase classifies a test instance as normal or anomalous using the classifier. Classification-based anomaly detection techniques operate under the following general assumption: a classifier that can distinguish between normal and anomalous classes can be learned in the given feature space. Based on the labels available for the training phase, classification-based anomaly detection techniques can be grouped into two broad categories: multi-class and one-class anomaly detection techniques. Multi-class classification-based anomaly detection techniques assume that the training data contains labeled instances belonging to multiple normal classes [Stefano 2000, Barbara 2001]. Such anomaly detection techniques train a classifier to distinguish between each normal class and the rest of the classes. A test instance is considered anomalous if it is not classified as normal by any of the classifiers. Some techniques in this subcategory associate a confidence score with the prediction made by the classifier: if none of the classifiers is confident in classifying the test instance as normal, the instance is declared anomalous. One-class classification-based anomaly detection techniques assume that all training instances have only one class label. Such techniques learn a discriminative boundary around the normal instances using a one-class classification algorithm, for example, one-class SVMs [Schölkopf 2001] or one-class Kernel Fisher Discriminants [Roth 2004, Roth 2006]. Any test instance that does not fall within the learned boundary is declared anomalous. A variety of anomaly detection techniques use different classification algorithms to build classifiers:

• Neural networks have been applied to anomaly detection in multi-class as well as one-class settings. A basic multi-class anomaly detection technique using neural networks operates in two steps. First, a neural network is trained on the normal training data to learn the different normal classes. Second, each test instance is provided as an input to the neural network. If the network accepts the test input, it is normal; if the network rejects it, it is an anomaly [Stefano 2000, Odin 2000].

• Bayesian networks have been used for anomaly detection in the multi-class setting. A basic technique for a univariate categorical data set using a naïve Bayesian network estimates the posterior probability of observing a class label from a set of normal class labels and the anomaly class label, given a test data instance. The class label with the largest posterior is chosen as the predicted class for the given test instance. The likelihood of observing the test instance given a class, and the prior on the class probabilities, are estimated from the training data set. Zero probabilities, especially for the anomaly class, are smoothed using Laplace smoothing. The basic technique can be generalized to multivariate categorical data sets by aggregating the per-attribute posterior probabilities for each test instance and using the aggregated value to assign a class label to the test instance.

• Support Vector Machines (SVMs) [Vapnik 1995] have been applied to anomaly detection in the one-class setting. Such techniques use one-class learning techniques for SVMs [Ratsch 2002] and learn a region that contains the training data instances (a boundary). Kernels, such as the radial basis function (RBF) kernel, can be used to learn complex regions. For each test instance, the basic technique determines whether it falls within the learned region: if it does, it is declared normal, otherwise it is declared anomalous (a minimal sketch is given after this list).

• Rule-based anomaly detection techniques learn rules that capture the normal behavior of a system; a test instance that is not covered by any such rule is considered an anomaly. Rule-based techniques have been applied in multi-class as well as one-class settings. A basic multi-class rule-based technique consists of two steps. The first step is to learn rules from the training data using a rule learning algorithm, such as RIPPER, decision trees, and so on. Each rule has an associated confidence value that is proportional to the ratio between the number of training instances correctly classified by the rule and the total number of training instances covered by the rule. The second step is to find, for each test instance, the rule that best captures the test instance. The inverse of the confidence associated with the best rule is the anomaly score of the test instance (the last sketch after this list illustrates this scoring).
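For illustration, a minimal Python sketch of the naïve Bayes approach for univariate categorical data, with Laplace smoothing; the function and variable names are ours, not taken from the cited works:

    from collections import Counter

    def train_naive_bayes(values, labels, alpha=1.0):
        # Estimate class priors and smoothed per-class value likelihoods (MLE + Laplace)
        classes = set(labels)
        priors = {c: labels.count(c) / len(labels) for c in classes}
        counts = {c: Counter(v for v, l in zip(values, labels) if l == c) for c in classes}
        vocab = set(values)
        def likelihood(v, c):
            # Laplace smoothing avoids zero probabilities, e.g., for the anomaly class
            return (counts[c][v] + alpha) / (sum(counts[c].values()) + alpha * len(vocab))
        # Return a classifier choosing the class with the largest posterior
        return lambda v: max(classes, key=lambda c: priors[c] * likelihood(v, c))

    classify = train_naive_bayes(["a", "a", "b", "a", "z", "z"],
                                 ["normal1", "normal1", "normal2",
                                  "normal1", "anomaly", "anomaly"])
    print(classify("a"), classify("z"))   # -> normal1 anomaly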
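A minimal one-class SVM sketch, assuming scikit-learn is available; the RBF kernel and the nu value are illustrative, commonly used choices, not parameters prescribed by the cited works:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_train = rng.normal(0, 1, size=(200, 2))       # assumed normal-only training data
    X_test = np.vstack([rng.normal(0, 1, (5, 2)),   # normal-looking points
                        [[6.0, 6.0]]])              # an obvious outlier

    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
    print(clf.predict(X_test))             # +1 = inside the learned boundary, -1 = anomalous
    print(clf.decision_function(X_test))   # signed distance, usable as an anomaly score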
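Finally, a hedged sketch of rule-based scoring: here a decision tree plays the role of the rule learner (one of the algorithms mentioned above), its leaves act as rules, and the class ratio at the matched leaf acts as the rule confidence; this mapping is our interpretation, not a reference implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def rule_anomaly_scores(X_train, y_train, X_test):
        # Learn "rules" as decision tree leaves over the labeled normal classes
        tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=0).fit(X_train, y_train)
        # predict_proba at a leaf equals the class-label ratio among the training
        # instances covered by that leaf, i.e., the confidence of the matched rule
        confidence = tree.predict_proba(X_test).max(axis=1)
        # Anomaly score = inverse of the confidence of the best-matching rule
        return 1.0 / np.maximum(confidence, 1e-9)   # guard against zero confidence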

3.3.2 Nearest Neighbor-based Anomaly Detection

The concept of nearest neighbor analysis has been used in several anomaly detection techniques. Such techniques are based on the following key assumption: normal data instances occur in dense neighborhoods, while anomalies occur far from their closest neighbors. Nearest neighbor-based anomaly detection techniques require a distance or similarity measure defined between two data instances. For continuous attributes, Euclidean distance is a popular choice, but other measures can be used [Tan 2005, Chapter 2]. For categorical attributes, a simple matching coefficient is often used, but more complex distance measures can also be used [Boriah 2008, Chandola 2008]. For multivariate data instances, distance or similarity is usually computed for each attribute and then combined [Tan 2005]. The measures are typically required to be positive-definite and symmetric, but they are not required to satisfy the triangle inequality. Nearest neighbor-based anomaly detection techniques can be broadly grouped into two categories:

(1) Techniques that use the distance of a data instance to its k-th nearest neighbor as the anomaly score;

(2) Techniques that compute the relative density of each data instance to compute its anomaly score.

Additionally, there are some techniques that use the distance between data instances in a different manner to detect anomalies.

Using Distance to the k-th Nearest Neighbor
A basic nearest neighbor anomaly detection technique is based on the following definition: the anomaly score of a data instance is defined as its distance to its k-th nearest neighbor in a given data set. Researchers have extended the basic technique in three different ways:

• The first set of variants modifies the definition to obtain the anomaly score of a data instance.

• The second set of variants uses different distance/similarity measures to handle different data types.

• The third set of variants focuses on improving the efficiency of the basic technique (whose complexity is O(N²), where N is the data size) in different ways.

[Eskin 2002], [Angiulli 2002] and [Zhang 2006] compute the anomaly score of a data instance as the sum of its distances from its k nearest neighbors. A different way to compute the anomaly score of a data instance is to count the number of nearest neighbors (n) that are not more than a distance d apart from the given data instance [Knorr 1997, Knorr 1998, Knorr 1999, Knorr 2000]. This method can also be viewed as estimating the global density for each data instance, since it involves counting the number of neighbors in a hypersphere of radius d.

While most techniques discussed so far in this category have been proposed to handle continuous attributes, several variants have been proposed to handle other data types. A hypergraph-based technique, called HOT, is proposed by [Wei 2003], in which the authors model the categorical values using a hypergraph and measure the distance between two data instances by analyzing the connectivity of the graph. A distance measure for data containing a mix of categorical and continuous attributes has been proposed for anomaly detection [Otey 2006]. The authors define links between two instances by adding distances for categorical and continuous attributes separately. For categorical attributes, the number of attributes for which the two instances have the same values defines the distance between them. For continuous attributes, a covariance matrix is maintained to capture the dependencies between the continuous values. [Palshikar 2005] adapts the technique proposed in [Knorr 1999a] to continuous sequences, and [Kou 2006] extend the technique proposed in [Ramaswamy 2000] to spatial data.

Several variants of the basic technique have been proposed to improve its efficiency. Some techniques prune the search space by either ignoring instances that cannot be anomalous or by focusing on the instances that are most likely to be anomalous. [Ramaswamy 2000] propose a partition-based technique, which first clusters the instances and computes lower and upper bounds on the distance of an instance from its k-th nearest neighbor for the instances in each partition. This information is then used to identify the partitions that cannot possibly contain the top k anomalies; such partitions are pruned. Anomalies are then computed from the remaining instances (belonging to unpruned partitions) in a final phase. Similar cluster-based pruning has been proposed by [Eskin 2002], [McCallum 2000], [Ghoting 2006], and [Tao 2006]. To prune the search space for nearest neighbors, several techniques partition the attribute space into a hypergrid consisting of hypercubes of fixed sizes. The intuition behind such techniques is that if a hypercube contains many instances, such instances are likely to be normal. Moreover, if, for a given instance, the hypercube containing the instance and its adjoining hypercubes contain very few instances, the given instance is likely to be anomalous.
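A minimal sketch of the basic k-th nearest neighbor score, assuming scikit-learn; a comment notes the sum-of-distances variant of [Eskin 2002]:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def kth_nn_scores(X, k=5):
        # Anomaly score of each instance = distance to its k-th nearest neighbor;
        # the [Eskin 2002] variant would instead return dist[:, 1:].sum(axis=1)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point matches itself
        dist, _ = nn.kneighbors(X)                        # distances sorted ascending
        return dist[:, k]                                 # column k = k-th true neighbor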

Using Relative Density
Density-based anomaly detection techniques estimate the density of the neighborhood of each data instance. An instance that lies in a neighborhood with low density is declared to be anomalous, while an instance that lies in a dense neighborhood is declared to be normal. For a given data instance, the distance to its k-th nearest neighbor is equivalent to the radius of a hypersphere, centered at the given data instance, which contains k other instances. Thus the distance to the k-th nearest neighbor for a given data instance can be viewed as an estimate of the inverse of the density of the instance in the data set, and the basic nearest neighbor-based technique described in the previous subsection can be considered a density-based anomaly detection technique. Density-based techniques perform poorly if the data has regions of varying densities.

To handle the issue of varying densities in the data set, a set of techniques has been proposed to compute the density of instances relative to the density of their neighbors. [Breunig 1999, Breunig 2000] assign to a given data instance an anomaly score known as the Local Outlier Factor (LOF). For any given data instance, the LOF score is equal to the ratio of the average local density of the k nearest neighbors of the instance and the local density of the data instance itself. To find the local density of a data instance, the authors first find the radius of the smallest hypersphere centered at the data instance which contains its k nearest neighbors; the local density is then computed by dividing k by the volume of this hypersphere. For a normal instance lying in a dense region, its local density will be similar to that of its neighbors, while for an anomalous instance, its local density will be lower than that of its nearest neighbors; hence the anomalous instance will get a higher LOF score.
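A hedged sketch of LOF using scikit-learn's implementation of [Breunig 2000]; the data set and the n_neighbors value are illustrative assumptions:

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (100, 2)),   # dense cluster
                   rng.normal(5, 2.0, (100, 2)),   # sparser cluster
                   [[10.0, 10.0]]])                # point far from both

    lof = LocalOutlierFactor(n_neighbors=20)
    lof.fit(X)
    scores = -lof.negative_outlier_factor_   # LOF score; larger means more anomalous
    print(scores[-1], scores[:5])            # the isolated point gets the highest score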

Several researchers have proposed variants of the LOF technique. Some of these variants estimate the local density of an instance in a different way, while others have adapted the original technique to more complex data types. Since the original LOF technique is O(N²) (N being the data size), several techniques have been proposed that improve the efficiency of LOF, such as [Tang 2002], [Hautamaki 2004], [Brito 1997], and [Papadimitriou 2002]. Several variants of LOF have also been proposed to handle different data types: for example, [Sun 2004, Sun 2006] detect spatial anomalies in climate data, [Yu 2006] use a similarity measure instead of a distance to handle categorical attributes, and [Pokrajac 2007] extend LOF to work in an incremental fashion to detect anomalies in video sensor data.

3.3.3 Clustering-based Anomaly Detection

Clustering [Jain 1988, Tan 2005] is used to group similar data instances into clusters. Clustering is primarily an unsupervised technique, though semi-supervised clustering [Basu 2004] has also been explored lately. Even though clustering and anomaly detection appear to be fundamentally different from each other, several clustering-based anomaly detection techniques have been developed. They can be grouped into three categories.

The first category of clustering-based techniques relies on the following assumption: normal data instances belong to a cluster in the data, while anomalies do not belong to any cluster. Techniques based on this assumption apply a known clustering algorithm to the data set and declare any data instance that does not belong to any cluster as anomalous. Several clustering algorithms that do not force every data instance to belong to a cluster, such as DBSCAN [Ester 1996], ROCK [Guha 2000], and SNN clustering [Ertöz 2003], can be used. The FindOut algorithm [Yu 2002] is an extension of the WaveCluster algorithm [Sheikholeslami 1998] in which the detected clusters are removed from the data and the residual instances are declared as anomalies. A disadvantage of such techniques is that they are not optimized to find anomalies, since the main aim of the underlying clustering algorithm is to find clusters.

The second category of clustering-based techniques relies on the following assumption: normal data instances lie close to their closest cluster centroid, while anomalies are far away from their closest cluster centroid. Techniques based on this assumption consist of two steps: in the first step, the data is clustered using a clustering algorithm; in the second step, for each data instance, its distance to its closest cluster centroid is calculated as its anomaly score. A number of anomaly detection techniques that follow this two-step approach have been proposed using different clustering algorithms. [Smith 2002] studied Self-Organizing Maps (SOM), K-means clustering, and Expectation Maximization (EM) to cluster training data and then used the clusters to classify test data. Note that if the anomalies in the data form clusters by themselves, these techniques will not be able to detect them.

To address this issue, a third category of clustering-based techniques has been proposed, which relies on the following assumption: normal data instances belong to large and dense clusters, while anomalies belong to small or sparse clusters. Techniques based on this assumption declare instances belonging to clusters whose size and/or density is below a threshold as anomalous. Several variations of this third category of techniques have been proposed [Pires 2005, Otey 2003, Eskin 2002, Mahoney 2003, Jiang 2001, He 2003]. Several clustering-based techniques have also been proposed to improve the efficiency of these existing techniques, such as [Eskin 2002, Portnoy 2001, Mahoney 2003, He 2003].
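As an illustration of the second category (distance to the closest cluster centroid as the anomaly score), a minimal sketch assuming scikit-learn; K-means is one of the clustering algorithms studied in [Smith 2002]:

    import numpy as np
    from sklearn.cluster import KMeans

    def centroid_distance_scores(X_train, X_test, n_clusters=5):
        # Step 1: cluster the (assumed mostly normal) training data
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train)
        # Step 2: transform() gives distances to all centroids; the minimum per
        # instance is the distance to the closest centroid, i.e., the anomaly score
        return km.transform(X_test).min(axis=1)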

3.3.4 Statistical Anomaly Detection

The underlying principle of any statistical anomaly detection technique is: "An anomaly is an observation which is suspected of being partially or wholly irrelevant because it is not generated by the stochastic model assumed" [Anscombe 1960]. Statistical anomaly detection techniques are based on the following key assumption: normal data instances occur in high-probability regions of a stochastic model, while anomalies occur in the low-probability regions of the stochastic model. Statistical techniques fit a statistical model (usually for normal behavior) to the given data and then apply a statistical inference test to determine whether an unseen instance belongs to this model or not. Instances that have a low probability of being generated from the learned model, based on the applied test statistic, are declared anomalies. Both parametric and nonparametric techniques have been applied to fit a statistical model. While parametric techniques assume knowledge of the underlying distribution and estimate its parameters from the given data [Eskin 2000], nonparametric techniques do not generally assume knowledge of the underlying distribution [Desforges 1998].

Parametric Techniques
As mentioned before, parametric techniques assume that the normal data is generated by a parametric distribution with parameters Θ and probability density function f(x, Θ), where x is an observation. The anomaly score of a test instance (or observation) x is the inverse of the probability density function, f(x, Θ). The parameters Θ are estimated from the given data. Alternatively, a statistical hypothesis test (also referred to as a discordancy test in the statistical outlier detection literature [Barnett 1994]) may be used. The null hypothesis (H0) for such tests is that the data instance x has been generated using the estimated distribution (with parameters Θ). If the statistical test rejects H0, x is declared to be an anomaly. A statistical hypothesis test is associated with a test statistic, which can be used to obtain a probabilistic anomaly score for the data instance x. Based on the type of distribution assumed, parametric techniques can be further categorized as follows:

• Gaussian Model-Based. Such techniques assume that the data is generated from a Gaussian distribution. The parameters are estimated using Maximum Likelihood Estimates (MLE). The distance of a data instance from the estimated mean is the anomaly score for that instance, and a threshold is applied to the anomaly scores to determine the anomalies. Different techniques in this category calculate the distance to the mean and the threshold in different ways (a sketch is given after this list).

• Regression Model-Based. Anomaly detection using regression has been extensively investigated for time-series data [Abraham 1989, Abraham 1979, Fox 1972]. The basic regression model-based anomaly detection technique consists of two steps. In the first step, a regression model is fitted on the data. In the second step, for each test instance, the residual for the test instance is used to determine the anomaly score. The residual is the part of the instance that is not explained by the regression model. The magnitude of the residual can be used as the anomaly score for the test instance, though statistical tests have been proposed to determine anomalies with certain confidence [Anscombe 1960, Beckman 1983, Hawkins 1980, Torr 1993]. The presence of anomalies in the training data can influence the regression parameters, and hence the regression model might not produce accurate results. A popular technique to handle such anomalies while fitting regression models is robust regression [Rousseeuw 1987]: the estimation of regression parameters while accommodating anomalies.
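As an illustration of the Gaussian model-based approach described in the first bullet, a minimal univariate Python sketch; the three-sigma threshold is an assumed, commonly used choice, not a value prescribed by the techniques above:

    import numpy as np

    def gaussian_scores(train, test, threshold=3.0):
        # MLE of the Gaussian parameters from (assumed normal) training data
        mu, sigma = train.mean(), train.std()
        scores = np.abs(test - mu) / sigma     # distance from the mean, in std deviations
        return scores, scores > threshold      # flag instances beyond the threshold

    train = np.random.default_rng(0).normal(10.0, 2.0, 1000)
    print(gaussian_scores(train, np.array([9.5, 25.0])))   # 25.0 is flagged as anomalous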

Nonparametric Techniques
The anomaly detection techniques in this category use nonparametric statistical models, such that the model structure is not defined a priori but is instead determined from the given data. Such techniques typically make fewer assumptions regarding the data, such as smoothness of density, when compared to parametric techniques.

• Histogram-Based. The simplest nonparametric statistical technique is to use histograms to maintain a profile of the normal data. Such techniques are also referred to as frequency-based or counting-based. A basic histogram-based anomaly detection technique for univariate data consists of two steps. The first step involves building a histogram based on the different values taken by the feature in the training data. In the second step, the technique checks whether a test instance falls in any one of the bins of the histogram: if it does, the test instance is normal, otherwise it is anomalous. A variant of the basic technique is to assign an anomaly score to each test instance based on the height (frequency) of the bin in which it falls (a sketch follows this list). The size of the bins used when building the histogram is key for anomaly detection. If the bins are small, many normal test instances will fall in empty or rare bins, resulting in a high false alarm rate. If the bins are large, many anomalous test instances will fall in frequent bins, resulting in a high false negative rate. Thus a key challenge for histogram-based techniques is to determine an optimal bin size that keeps both the false alarm rate and the false negative rate low. For multivariate data, a basic technique is to construct attribute-wise histograms. During testing, for each test instance, the anomaly score for each attribute value of the test instance is calculated as the height of the bin that contains the attribute value. The per-attribute anomaly scores are aggregated to obtain an overall anomaly score for the test instance.

• Kernel Function-Based. A nonparametric technique for probability density estimation is Parzen window estimation [Parzen 1962], which uses kernel functions to approximate the actual density. Anomaly detection techniques based on kernel functions are similar to the parametric methods described earlier; the only difference is the density estimation technique used. [Desforges 1998] proposed a semi-supervised statistical technique to detect anomalies, which uses kernel functions to estimate the probability density function (pdf) of the normal instances. A new instance that lies in the low-probability area of this pdf is declared to be anomalous (a sketch follows this list).
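A minimal sketch of the univariate histogram-based variant that scores by bin frequency; names and the bin count are illustrative assumptions:

    import numpy as np

    def histogram_scores(train, test, bins=20):
        # Build the histogram profile of normal (training) values
        counts, edges = np.histogram(train, bins=bins)
        # Locate the bin of each test value; out-of-range values are clipped
        idx = np.clip(np.digitize(test, edges) - 1, 0, bins - 1)
        rel_freq = counts[idx] / counts.sum()
        return 1.0 - rel_freq   # rare or empty bins yield scores close to 1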
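A corresponding kernel-function sketch, assuming scikit-learn; the Gaussian kernel and bandwidth are illustrative choices, not parameters prescribed by [Desforges 1998]:

    import numpy as np
    from sklearn.neighbors import KernelDensity

    def kde_scores(X_train, X_test, bandwidth=0.5):
        # Parzen-window (kernel) density estimate of the normal data
        kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_train)
        # Low estimated density => high anomaly score (negative log-density)
        return -kde.score_samples(X_test)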


3.3.5 Information Theoretic Anomaly Detection

Information theoretic techniques analyze the information content of a data set using information theoretic measures such as Kolmogorov complexity, entropy, relative entropy, and so on. Such techniques are based on the following key assumption: anomalies in data induce irregularities in the information content of the data set.

Let C(D) denote the complexity of a given data set D. A basic information theoretic technique can be described as follows: given a data set D, find the minimal subset of instances I such that C(D) − C(D − I) is maximum. All instances in the subset thus obtained are deemed anomalous. This is a dual-optimization problem with no single optimum, since two different objectives need to be optimized: minimizing the subset size while maximizing the reduction in the complexity of the data set. The complexity of a data set (C) can be measured in different ways; in particular, Kolmogorov complexity [Li 1993] has been used by several techniques [Arning 1996, Keogh 2004]. Other information theoretic measures, such as entropy, relative uncertainty, and so on, have also been used to measure the complexity of a categorical data set [Lee 2001, He 2005, He 2006, Ando 2007].

An exhaustive approach in which every possible subset of the data set is considered would run in exponential time; several techniques have therefore been proposed that perform an approximate search for the most anomalous subset [He 2006, He 2005, Ando 2007]. Information theoretic techniques have also been used on data sets in which data instances are naturally ordered, for example, sequential data and spatial data. In such cases, the data is broken into substructures (segments for sequences, subgraphs for graphs, etc.), and the anomaly detection technique finds the substructure I such that C(D) − C(D − I) is maximum. A key challenge of such techniques is to find the optimal size of the substructure that would result in detecting anomalies.
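A hedged sketch of the basic idea, using entropy as the complexity measure C and a greedy (approximate) search in the spirit of the approximate techniques cited above; all names are illustrative:

    from collections import Counter
    from math import log2

    def entropy(data):
        # Shannon entropy of a categorical data set, used here as C(D)
        n = len(data)
        return -sum(c / n * log2(c / n) for c in Counter(data).values())

    def greedy_anomalies(data, k=3):
        # Greedily remove the instance whose removal most decreases entropy,
        # approximating the subset I that maximizes C(D) - C(D - I)
        data, removed = list(data), []
        for _ in range(k):
            best = min(range(len(data)), key=lambda i: entropy(data[:i] + data[i + 1:]))
            removed.append(data.pop(best))
        return removed

    print(greedy_anomalies(["a"] * 20 + ["b"] * 18 + ["z"], k=1))   # -> ['z']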

3.4 Evaluation methods for Intrusion detection in LCCIs

An Intrusion Detection System (IDS) is a software or a device used to monitor computer activities and/or network traffic in order to detect malicious activities, such as unauthorized logins, access to sensitive data, or attacks. Intrusion Detection Systems can be distinguished by several features; the most important are the monitored target and the detection model.

Considering the target, an IDS monitoring the host on which it is installed is defined as a Host IDS (HIDS). It has a detailed view of the system activities of the host, of its processes, and of its network traffic, but it consumes computational resources and is unable to correlate coordinated attacks against multiple hosts. A Network IDS (NIDS) is unable to detect malicious processes running on a host, but it analyzes all network traffic monitored by its sensors in order to detect attacks passing through the network. Nowadays, NIDSs are more popular than HIDSs.

If we consider detection techniques, we can classify an IDS as signature-based or anomaly-based. Signature (or misuse) detection is a technique for intrusion detection that relies on a predefined set of attack signatures: by looking for specific attack patterns that are known to appear in the network traffic, a signature-based IDS matches incoming packets to the signatures of known attacks. The principles of anomaly detection have been described in Section 3.3.

3.4.1 Evaluation Methods

Over the past years, many evaluation methods for Intrusion Detection Systems have been suggested by researchers. These methodologies have been developed in the context of computer networks, but their background ideas are still valuable in LCCI domains. This, together with the lack in the literature of evaluation methodologies specific to Intrusion Detection Systems in LCCIs, is why current evaluation methods are adaptations of pre-existing techniques. In order to understand what may be considered an effective methodology, we review the foremost concepts and the main techniques used in carrying out an evaluation.

Two main techniques may be applied when carrying out testing in general: black-box testing and white-box testing. Gadelrab and El Kalam Abou [Gadelrab 2006] state that there are two main methods for evaluating IDSs, each following the principles of one of these two testing approaches; the authors refer to them as evaluation by test (black-box testing) and analytic evaluation (white-box testing).

An example of analytic evaluation is given in the work of Alessandri [Alessandri 2004]. This approach works by comparing the IDS design against certain classes of attack, in order to predict whether or not an attack will be effective. Even if this is a viable approach, it is more suited to IDSs still under development, since it helps to obtain a far more solid design.

The other method, evaluation by testing, can be considered a black-box testing technique. The general principle behind this method is that the IDS is tested against various attacks, preferably in conjunction with background traffic [Gadelrab 2006]. After testing is completed, the IDS is evaluated against certain metrics defined by the designer of the test. Evaluation by testing provides a much more solid basis in terms of evaluation results, since metrics must be defined and accounted for.

3.4.2 Evaluation Metrics

Evaluation by testing highlights the importance of choosing correct metrics, in order to ensure that they are relevant and serve a meaningful purpose for the testing to be carried out on an IDS. In the work of Sommers et al. [Sommers 2005], the authors evaluated IDSs using two main metrics: efficiency and effectiveness. Efficiency is a measure of false positives (alerts raised when there are no attacks), whilst effectiveness is a measure of false negatives (alerts not raised when there are attacks). The authors also monitor CPU utilization, memory usage, and packet loss to gain knowledge about performance. This is similar to the slightly more recent work by Gadelrab and El Kalam Abou [Gadelrab 2006], where the authors split evaluation metrics into two categories: detection-related metrics and resource utilization metrics. Detection-related metrics are used to assess how well particular components of an IDS function, whilst resource utilization relates to the system impact the IDS has whilst running, in other words, performance metrics. Timeliness and availability are two further important metrics for IDSs in LCCIs [Zhu 2008]: timeliness is particularly important in light of the fact that SCADA systems are hard real-time systems, while availability is related to the 24 × 7 operational requirement of SCADA systems.
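A minimal sketch of how the efficiency and effectiveness metrics of [Sommers 2005] can be computed from test results; the input encoding (boolean alert and attack flags per monitored event) is our assumption:

    def ids_metrics(alerts, ground_truth):
        # alerts[i]: did the IDS raise an alert for event i?
        # ground_truth[i]: was event i actually an attack?
        tp = sum(a and g for a, g in zip(alerts, ground_truth))
        fp = sum(a and not g for a, g in zip(alerts, ground_truth))
        fn = sum(not a and g for a, g in zip(alerts, ground_truth))
        tn = sum(not a and not g for a, g in zip(alerts, ground_truth))
        return {
            "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # efficiency
            "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,  # effectiveness
            "detection_rate":      tp / (tp + fn) if tp + fn else 0.0,
        }

    print(ids_metrics([True, False, True, False], [True, False, False, True]))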

3.4.3 Evaluation Dataset

As mentioned before, the general principle behind black-box testing is that the IDS is tested against various attacks, preferably in conjunction with background traffic. This requires a standard dataset containing different kinds of attacks, built starting from a real network or an ad-hoc testbed.

One of the first well-known evaluations comes from MIT Lincoln Labs, sponsored by the Defence Advanced Research Projects Agency (DARPA) and commonly known as the DARPA evaluation. This work is described by Lippmann, Haines, Fried, Korba and Das [Lippmann 2000]. The main goal of the DARPA evaluation was to carry out a non-biased measurement of the performance of various anomaly-based IDSs, along with producing an evaluation dataset which could be used by others in testing their IDSs [McHugh 2000]. To this end, Lincoln Lab set up a test network and, through the use of programs and scripts emulating a large number of workstations, created synthetic background traffic mixed with attack traffic at certain periods of time [Brugger 2007]. The traffic was then captured with a packet capture tool and saved as a data set, which could be played back against the IDS under evaluation to assess its performance. Both inside and outside traffic was captured for this experiment: "inside" refers to network traffic which would be seen within a local area network (LAN), whilst "outside" refers to any traffic outside of this LAN.

While this evaluation represents a significant and monumental undertaking, there is a big problem: the dataset represents the characteristics neither of a SCADA network nor of an LCCI, because both attack and background traffic lack realism in comparison with real-life SCADA network traffic. At present there is no dataset for this kind of system, and, even considering their increasing exposure to attacks, it is difficult to foresee whether a specific dataset will be released in the next years.

3.4.4 Pentesting tools

There are existing tools and programs which can be used for carrying out the evaluation of IDSs. These tools and programs generally focus on the testing of NIDSs. Examples of such works include Nidsbench, created by Anzen Computing (1999), IDSwakeup, created by Aubert (2002), and the Metasploit framework, created by the Metasploit Project (2009). It must be noted that, although these tools and programs may be used to evaluate an IDS, the actual process of the evaluation, along with the evaluation metrics, is entirely up to the end user. In other words, they do not represent a comprehensive evaluation methodology of any kind. However, it is still worth providing an analysis in order to gain an understanding of some techniques which may be employed to test IDSs.

Metasploit
The Metasploit Framework is both a penetration testing system and a development platform for creating security tools and exploits. The framework is used by network security professionals to perform penetration tests, by system administrators to verify patch installations, by product vendors to perform regression testing, and by security researchers world-wide. The framework is open source and consists of tools, libraries, modules, and user interfaces. The basic function of the framework is a module launcher, allowing the user to configure an exploit module and launch it at a target system. If the exploit succeeds, the payload is executed on the target and the user is provided with a shell to interact with the payload. At present, hundreds of exploits and dozens of payload options are available, and for a few years SCADA-specific exploits have been available as well. This means that it is possible to test a SCADA network or an LCCI against some specific attacks and, at the same time, to verify whether the deployed IDSs are able to detect the threat.

3.4.5 Existing Solutions for LCCI IDSs

Non-malicious activities raising false alarms are still the main open issue affecting anomaly detection IDSs. Nevertheless, most existing proposals related to LCCIs are based on anomaly-related approaches.

For example, Cheung et al. [Cheung 2006] claim that process control networks tend to have static topologies, regular traffic patterns, and a limited number of applications and protocols running on them. Backed by this argument, this research group adapted the specification-based approach for intrusion detection to SCADA systems that rely on Modbus/TCP, the most widely used application layer protocol for communications between control stations and field devices in industrial networks. This work integrates a multi-algorithm IDS appliance containing pattern anomaly recognition, Bayes analysis of TCP headers, and stateful protocol monitoring, complemented by customized Snort rules [Snort]; alerts are forwarded to a correlation framework. The authors offer three model-based techniques to characterize the expected acceptable system behavior according to the Modbus/TCP specification and to detect potential attacks that violate these models. The first technique, at the protocol level, is based on building the specifications for individual fields and for groups of dependent fields in the Modbus/TCP requests and responses. The second technique, considering communication patterns, is based on the analysis of the communications among the network components; the detection of violations of the expected communication patterns is done through Snort rules. The third technique, considering service usage patterns, is based on learning models that describe the expected trends in the availability of servers and services.

Another paper [Valdes 2009a] proposes two anomaly detection methods based on adaptive learning, in particular a pattern-based anomaly detection and a flow-based anomaly detection. The first method examines patterns in a stream of observations through a version of competitive learning. A pattern is a vector of feature values relevant to a particular implementation; in particular, the authors consider patterns consisting of source and destination IP addresses and destination port. Patterns are evaluated against an initially empty pattern library: if a pattern matches an existing pattern, the best-matching library pattern "wins"; if the match is not exact, the winning library pattern is slightly adapted in the direction of the new pattern. An attractive characteristic of this method is that it can adaptively learn multiple patterns of normal and abnormal activity; in particular, the system does not require attack-free training data. In the second method, a database of active and historical flow records observed in the PCS is maintained. A flow record is generated or incremented as packets are observed. As flow records are "touched" by packet traffic, they are evaluated against learned historical norms. In addition, there is a periodic global update where the flow records since the last global update are folded into historical statistical profiles. The main targets are anomalies such as the observation of a new flow, significant changes in the rate of a flow, and the absence of an expected flow. The algorithm is presented as a detector of anomalous flows, but the authors claim that it can easily be adapted to specific flows (for example, specific to particular Modbus function codes).

The same authors propose in [Valdes 2009b] a multilayer security architecture that addresses the challenges of PCS monitoring, providing timely and accurate reporting of security-relevant events. They use different tools, both commercial, such as Snort and ArcSight's Security Information Event Management (SIEM) framework [SIEM], and self-developed, such as the WholeNet viewer [Valdes 2006], which provides multiple user-customizable views and animation to visualize network traffic data, and two intrusion detection sensors, namely the EMERALD Bayes sensor [Valdes 2000] and EModbus.

Another multi-layer approach is proposed by Roosta et al. [Roosta 2008]. The authors present the design of a distributed, multi-layer, model-based intrusion detection system for wireless process control systems. They focus on WirelessHART [WirelessHART], an open wireless communication standard designed to address the critical needs of the process industry for reliable, robust, and secure wireless communication. It uses a wireless mesh network, meaning that all the field devices can perform routing. The proposed IDS architecture consists of two components: a central IDS and multiple "field IDSs" distributed in the field among the sensor nodes. The central IDS resides on the network manager and is responsible for monitoring all the packets that arrive from the sensors and are destined for the masters. The field IDSs are deployed on "supernodes", which are sensor nodes with higher communication and computation power and tamper-resistant hardware. The field IDSs are responsible for passively monitoring the communication of the sensor nodes in their neighborhoods to collect trace data; they periodically send monitoring messages to the central IDS, containing information on the traffic patterns of the sensor nodes in their vicinity. The proposed IDS defines normal behavior at different layers of the network stack: at the physical layer, it looks for anomalies in transmission power; at the data-link layer, it detects collisions at the MAC layer caused by transmissions at a wrong frequency or in a wrong time slot.
At a higher layer, it monitors several possible anomalies: correctness of packet routing through the mesh network; traffic load, which is concerned with the amount of information being passed in the network; traffic pattern, that refers to the communication pattern among the nodes in the network; and communication datagrams, which need to respect WirelessHART specifications.


Düssel et al. [Düssel 2010] propose a payload-based anomaly detection that uses n-grams and works in four steps. Inbound transport layer packets are captured from the network by Bro [Paxson 1998], and the TCP payload is extracted and forwarded to the feature extraction stage. Here, byte sequences are mapped into a feature space, which is defined by the set of sequential features extracted from the incoming sequences; vectorial data structures are used in order to allow operating in high-dimensional feature spaces. Then, the similarity between byte sequences is determined by computing the pairwise distance between their respective vectorial representations. The anomaly detector initially learns a global model of "normality", represented by the center of mass of the training data points. At detection time, arriving byte sequences are compared to the previously learned model, and an anomaly score is calculated based on the distance.

The n-gram approach can also be found in the work of Bigham et al. [Bigham 2003]. The authors focus on the data that is passed around an electricity network in order to learn an n-gram model and an invariant model. While the n-gram approach is applied to the first four bytes of each data reading to determine a model of sign, decimal point position, and most significant digits, the latter model is used to determine linear dependencies between different data readings, which are expressed as invariants. In a subsequent work, Jin et al. [Jin 2006] specifically address anomaly detection in electricity infrastructures. The authors extended the set of invariant models with a value range model, which marks a data reading as anomalous if its value exceeds a pre-determined threshold. Furthermore, a bus-zero-sum model is deployed, which tests current inflow and outflow on a bus for equality. Anomaly scores are finally combined in a probabilistic framework to reason about the likelihood of an anomaly given the set of trained models. Clearly, the features used for anomaly detection strongly depend on the specific domain.

In another work concerning electricity infrastructures, Rrushi and Campbell [Rrushi 2008] looked into attacks on implementations of IEC 61850 [IEC61850], the protocol used for communication between electricity substations and power plants. As a specific example, the authors consider a nuclear power plant. They build a probabilistic profile of legitimate data flows, along with the main characteristics of the substation information exchanged between Intelligent Electronic Devices and the IEC 61850 communication services invoked in an electrical substation interfacing with a power plant. For each logical node of IEC 61850, they apply Bayesian Belief Networks (BBN) to enumerate the probability distributions attributed to its associated legitimate data and potential attack data, respectively. Then they use the Möbius tool to build Stochastic Activity Network models to verify the above bindings and to derive detection rules to evidence intrusions. Besides the simulated sensor data and nuclear power plant, the authors also simulate a distributed control system through a host-based network of virtual machines running FreeModbus [FreeMODBUS], a free implementation of the Modbus protocol, on the uClinux operating system [uClinux]. The authors remark that their intrusion detection rules are implementable in electrical substations and that all constructions of attack effects are based on known failure models; hence, their capability to deal with novel attacks is limited.
An example of a signature-based IDS can be found in the paper by Oman and Phillips [Oman 2007]. The authors give a clear description of the implementation of a SCADA power-grid testbed for intrusion detection and event monitoring. Their work produces comprehensive intrusion signatures for unauthorized access to SCADA devices, as well as baseline-setting files for those devices. Details about each SCADA device in the testbed, such as its IP address, telnet port, and the legal commands for the device, are expressed in XML. A Perl program parses the XML profile and creates Snort signatures [Snort] for legal commands on the RTU to monitor normal operations. For complex events whose signatures cannot be automatically generated through the above mechanism, extra steps are taken to produce customized signatures. For example, failed password attempts require pattern matching on the RTU's failed response to a bad login attempt: a packet sniffer is used to determine the response, and a customized signature is created to detect login failures before they are graphed. On the other hand, the system maintains a single settings repository containing one or more baseline-setting files for each device, to monitor setting changes made either at the local terminal or over the network.

This work also has other contributions: it protects the baseline data from unauthorized access and modification; it allows revision control that enables device settings to be compared over time; and it uses a Perl Expect script that runs every five minutes to log onto the devices and verify whether the issued commands succeed. The automated gathering and comparison of device settings over time is very useful to SCADA operators, who nowadays rely on personal notes and reminders about device settings. The current version of the prototype automates intrusion detection and settings retrieval for RTUs only. Special attention needs to be paid to the security of the revision control and of the uptime monitoring/polling, which may be subject to vulnerabilities of their own and become a vector for Denial of Service (DoS) attacks.

3.5 Field Failure Data Analysis

Field Failure Data Analysis (FFDA) is a well-known methodology for the dependability evaluation of computer systems. It consists in the analysis of spontaneous occurrences of failures in a system during its operational phase, without forcing or inducing artificial failures. Event logs represent one of the most adopted sources of failure data. Logs are a collection of files where applications and system modules register their normal and anomalous activity, in the form of events. For this reason, "logs are the first place where system administrators go when alerted to a problem, since they are one of the few mechanisms for gaining visibility of the behavior of the system" [Oliner 2007].

FFDA results allow understanding the effects of faults on the behavior of the system under study, and they provide valuable information for the elaboration and validation of analytical models. Qualitative and quantitative analysis of the failure, error, and fault types observed in the field yields feedback to the development process and contributes to improving the production process [Chillarege 1993].

FFDA has shown its benefits over a wide range of systems during the last three decades. A non-exhaustive list includes operating systems [Simache 2005] and [Kalyanakrishnam 1999b], control systems and mobile devices [Laplace 1999] and [Cinque 2006], supercomputers [Oliner 2007] and [Liang 2006], and large-scale applications [Sahoo 2004], [Oppenheimer 2003], [Schroeder 2006], and [Lim 2008]. These studies made it possible to improve successive generations of systems [Murphy 2000] and contributed to gaining a significant understanding of the failure modes of computer systems, driving fundamental research. Seminal contributions in the 80s uncovered the relationship between workload and failures [Castillo 1980] and [Iyer 1982], suggesting that the impact of system load be accounted for in dependability models. Other studies in the 90s revealed a progressive shift in the failure cause, from hardware to software faults [Gray 1990], thus motivating more research on software fault tolerance. More recently, FFDA has contributed to fault trend analysis and failure prediction, useful to define novel mitigation actions in complex large-scale systems [Liang 2006] and [Fu 2007].

Performing an FFDA of a computer system means collecting, manipulating, and analyzing failure data. Data collection defines the methods and the techniques to gather failure data during operations. Collected data are manipulated in order to remove useless and/or redundant information. Finally, analysis is performed to derive quantitative results and measures. Examples are the classification of failure types and locations, the identification of dependability bottlenecks, and the estimation of the Time To Failure (TTF) and Time To Recover (TTR) statistical distributions. TTF and TTR allow measuring reliability, availability, and maintainability, as well as building dependability models.

Analyzed data mainly consist of the log files available in the system under study. Log files are conceived as human-readable text files for developers and system administrators to gain visibility into the system behavior and to take actions in the face of failures. Through a simple programming interface, applications write events (i.e., lines of text in the log) according to the developer's needs. An event typically contains a timestamp and a description, along with the application or system module that reported the event. From the logs it is thus possible to extract useful information on the failures which occurred in the system. Well-known examples of event logging systems are the UNIX syslog [Lonvick 2001] and Microsoft's event logger. The former is widely adopted on Unix-based systems: it provides a programming interface and a service, i.e., syslogd, to store events in either local or remote human-readable text log files. The latter shares the same principles: the logging system is implemented as a service on Windows operating systems to collect local or remote events produced by applications; in this case, events are stored in a binary file, readable by external tools.

Data manipulation is usually done manually, by means of ad-hoc algorithms and techniques to eliminate useless data (e.g., housekeeping events reporting non-error conditions [Hansen 1992]), to disambiguate events, and to coalesce correlated events. In particular, with respect to dependability evaluation, significant efforts are needed to identify system reboots and failure occurrences, which can be used to estimate, for example, the system availability or the statistical distribution of the TTF. A commonly used approach to identify a reboot signal from the log is to locate specific event patterns (e.g., [Simache 2005]). On the other hand, the identification of the entries related to the same failure occurrence is more challenging. This task usually requires a preliminary log inspection (e.g., to figure out log event severities and error-specific keywords within the logged text) as well as procedures to cluster a related set of alerts into a single alert per failure [Oliner 2007]. An example is represented by the tupling coalescence scheme [Hansen 1992], which uses the heuristic of the tuple, i.e., a collection of events which are close in time.
The heuristic is based on the observation that error events, if due to the same underlying fault, are likely to be reported close in time. As the fault propagates through the system, several hardware and software detectors can be triggered, resulting in multiple events being reported in the log.
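A minimal sketch of time-based tupling in the spirit of [Hansen 1992]; the 60-second window is an illustrative choice, not a value prescribed by the cited work:

    def coalesce(events, window=60.0):
        # Group log events into tuples: an event closer than `window` seconds to the
        # previous event is attributed to the same underlying fault
        tuples, current = [], []
        for ts, msg in sorted(events):          # events as (timestamp, message) pairs
            if current and ts - current[-1][0] > window:
                tuples.append(current)
                current = []
            current.append((ts, msg))
        if current:
            tuples.append(current)
        return tuples

    events = [(0.0, "disk error"), (12.5, "i/o retry"), (300.0, "link down")]
    print(len(coalesce(events)))   # -> 2 tuples (the first two events are coalesced)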


Tools
The increasing utilization of FFDA for dependability analysis has encouraged the realization of software packages integrating a wide range of state-of-the-art FFDA techniques. These packages and tools aim at easing, if not automating, the data collection, manipulation, and analysis tasks. An example is MEASURE+ [Tang 1993], which generates appropriate dependability models and measures (including Markov and semi-Markov models, k-out-of-n availability models, failure distribution and hazard functions, and correlation parameters) based on failure data collected from real systems and converted into a specific format. MEADEP [Tang 1998] is a more advanced tool which consists of four software modules: a data preprocessor for converting data in various formats to the MEADEP format, a data analyzer for graphical data presentation and parameter estimation, a graphical modeling interface for building block diagrams (including the exponential block, the Weibull block, and the k-out-of-n block) and Markov reward chains, and a model-solution module for availability/reliability calculations with graphical parametric analysis. Analyze NOW [Thakur 1996] is a set of tools tailored to the FFDA of networks of workstations; it embodies tools for the automated data collection from all the workstations and tools for automating the data analysis task. In [Vaarandi 2002] and [Rouillard 2004] a tool for on-line log analysis is presented. It defines a set of rules to model and correlate log events at runtime, leading to a faster recognition of problems; the definition of rules, however, relies on the log contents and on the analysts' skills.

Limitations of log files
Literature in the area of FFDA shows that event logs provide useful and detailed insight into the dependability behavior of real-world systems. Nevertheless, several works also recognize the inadequacy of event logs for dependability evaluation. A study on Unix workstations and servers [Simache 2005] recognizes that logs may be incomplete or ambiguous, and it describes an approach for combining different data sources to improve availability estimations. In [Buckley 1995] the authors provide evidence that several issues, such as missing events, inconsistent information, and bogus timestamps, can affect logs. They provide recommendations to create better event logs, such as recognizing the presence of different users and considering the handling of events as a core requirement of the system. In [Kalyanakrishnam 1999b] a study on a networked Windows NT system shows that many reboots, i.e., about 50%, do not show any specific reason, thus enforcing the need for better logging techniques. A study on supercomputers [Oliner 2007] shows that logs may lack useful information for enabling effective failure detection and diagnosis. It also suggests that it would be useful to include operational context information (i.e., the time at which the log was produced, such as scheduled downtime, production time, and so on) along with log entries, to better contextualize collected data and draw proper conclusions. Recent studies (e.g., [Cotroneo 2006] and [Silva 2008]) pointed out the inadequacy of logs to provide evidence of software faults, which are among the main causes of system failures and can be activated in the field by complex environmental conditions [Gray 1986]. In [Cotroneo 2006] it is shown that, even if the JVM is equipped with a sophisticated exception handling mechanism, the built-in error detection mechanisms are not capable of detecting a considerable amount of failures (45.03%). The problem is that software faults may escape any low-level check and remain completely unreported. In general, the coverage of current logging mechanisms with respect to failures due to software faults is about 40% [Cinque 2010]. For example, in C/C++ programs, bad pointer manipulations can originate a process crash before any useful information is logged, and an infinite loop caused by bad variable management may lead to a hang, without leaving any trace in the logs.

Recent contributions address the inefficiency issues of log files. A proposal for a new generation of log files is provided in [Salfner 2004], where recommendations are introduced to improve log expressiveness by enriching the log format; a metric is also proposed to measure the information entropy of log files, in order to compare different solutions. Another proposal is the IBM Common Event Infrastructure [IBM], introduced mainly to save the time needed for root cause analysis: it offers a consistent, unified set of APIs and an infrastructure for the creation, transmission, persistence, and distribution of log events, according to a well-defined format. The authors in [Cinque 2009a] propose to enrich traditional logging strategies by defining a set of rules, to be followed at design time, specifically conceived to improve the quality of logged failure data and to ease the coalescence of redundant or equivalent data.

3.6 On-line Monitoring

With the term "online monitoring" we refer to a series of techniques used to monitor the health of a system in order to i) perform run-time analyses (e.g., anomaly detection, failure detection) and ii) measure its dependability attributes, such as availability, reliability, and maintainability, under real working conditions. Online monitoring is crucial to collect real data from complex and critical systems, providing the experimental measurements needed for the quantitative analysis of Large-scale Complex Critical Infrastructures. This activity is a key aspect in improving the reliability and maintainability of critical and complex infrastructures during normal operation, because it is propaedeutic to detecting fault activations and performance bottlenecks and to quickly identifying the components responsible for a problem. For these reasons, it is fundamental to choose the right tools and metrics in order to satisfy the strict requirements of LCCIs. In this section we present some of the most interesting monitoring approaches, techniques, and existing tools.

Existing monitoring techniques are generally classified into direct and indirect approaches. Direct approaches try to understand the health of the system by directly querying its components and receiving the data they are able to provide (e.g., logs, SNMP). Indirect approaches, instead, attempt to infer the system state by monitoring its behavior from an external point of view, namely, by recording and analyzing the interactions with the environment and with other systems.


Log files probably represent the most used source of information to analyze system behavior. Logs are generally designed as text files interpreted by developers and administrators with the aim of acquiring knowledge about the behavior of the system. This is especially true when dealing with large, complex systems consisting of heterogeneous software components; in such systems, logs are often the only source of information on the health status of the monitored system. A non-exhaustive list of studies based on logs is provided in the FFDA section.

Unfortunately, several studies have highlighted the inadequacy of logs for the assessment of reliability. Logs are heterogeneous and imprecise [Buckley 1995] and [Simache 2005], and may provide ambiguous information [Kalyanakrishnam 1999]. This is a consequence of the lack of a systematic approach for the production of logs, which currently depends on the skills and competencies of developers [Kalyanakrishnam 1999]. For instance, developers can easily forget to record events of interest regarding some errors, or could record events with ambiguous descriptions. Recent studies address issues related to the heterogeneity of log formats; however, the incompleteness and ambiguity of logs remain unsolved. Logs are therefore heterogeneous not only in their formats, but also in their content and semantics, since similar situations can be described in different ways by different developers. Moreover, crucial decisions on how to produce and collect logs are taken only in the later stages of the software life cycle (e.g., during the development of the code). For these reasons, it is reasonable to state that current logging systems are not designed to support reliability assessment and anomaly detection.

Therefore, we need to exploit different kinds of data in order to better understand a system's behavior, and to define actions that can improve the maintainability of the system as a whole. As demonstrated by recent studies [Carrozza 2010] and [Kiciman 2005], information obtained indirectly by monitoring the status of the system is also useful for understanding the possible causes of malfunctions. For example, in [Agarwala 2006] traces of system calls are used to detect performance failures.

Several strategies for the online monitoring of complex distributed systems have been proposed in the literature. The most representative ones are summarized below.

Pinpoint [Kiciman 2005] analyzes client requests to reconstruct the control paths they follow through the system, and exploits the large number of such requests to detect anomalous behaviors. Chopstix [Bhatia 2008] collects operating-system events and stack traces to identify faults in applications. Ganglia [Massie 2004] monitors cluster and Grid resources (e.g., CPU, memory), in a setting where scalability is the primary requirement given the large number of nodes involved; to scale to hundreds or even thousands of nodes, it relies on hierarchical aggregation of data. CoMon [Park 2006] monitors the resource usage of PlanetLab nodes, helping to understand unwanted interactions among them, while remaining largely agnostic of the applications running on those nodes.


In [Agarwala 2006] the authors propose a method for monitoring multi-tier applications in enterprise environments. Their technique derives performance metrics by monitoring system calls, and has proven able to identify the bottlenecks of such systems. New monitoring techniques based on both direct and indirect approaches are proposed in [Carrozza 2008]. As for the indirect approaches, the idea consists in inferring the behavior of the system by means of statistical analyses built on data collected through monitors placed at the Operating System level.

Several technical problems make it difficult to monitor LCCIs with such techniques. First, an effective technique often requires a detailed analysis of resource usage that goes beyond simple measures such as average CPU load, network bandwidth, or the number of completed tasks [Blueprint 2003]. Second, the overhead of the available tools is far from negligible (8-30%), making their use impractical in real scenarios. Third, the great diversity of subsystem components means that simple, common interfaces for assessing and analyzing system behavior are not available. Finally, the source code of the monitored components is not always available; therefore, more advanced techniques that insert hooks into the executable program have to be considered [Lenglet 2004].
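
As a complement to the approaches above, the following minimal sketch shows the basic idea of indirect monitoring by interposition: the monitor wraps a component's entry points and derives timing metrics purely from the observed interactions, without any cooperation from the component. The wrapped function and the latency threshold are illustrative; note that the wrapper itself adds overhead, which is exactly the concern raised above for real tools.

    # Minimal sketch of indirect monitoring by interposition: latency is
    # inferred from observed interactions, not reported by the component.
    import functools
    import time

    def monitored(threshold_s):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                t0 = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    elapsed = time.perf_counter() - t0
                    if elapsed > threshold_s:
                        print(f"ANOMALY: {fn.__name__} took {elapsed:.3f}s")
            return wrapper
        return decorator

    @monitored(threshold_s=0.05)
    def handle_request():
        time.sleep(0.08)  # stand-in for the monitored component's work

    handle_request()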

In summary, the monitoring techniques presented above do not resolve all of these issues; their limitations prevent them from fully satisfying the stringent requirements of LCCIs, and they are therefore not completely applicable in this context.

3.7 Fault Injection

It is by now common knowledge that obtaining a fault-free computer-based system is unfeasible, as frequently demonstrated by the many failures of systems in operation, including mission-critical and specifically hardened systems (e.g., the Mars Climate Orbiter mission failure).

Given the unfeasibility of producing fault-free systems, dealing with faults becomes a very important aspect of the development, testing, and validation of software systems and components. Developers and researchers need techniques and tools to: understand how a given component deals with internal or external faults; assess the efficacy of fault tolerance mechanisms; assess risk in the face of faults; and compare systems using fault handling as a decision factor. A valuable approach for such purposes is fault injection: the artificial introduction of faults into a system in order to assess its behavior in their presence. This strategy provides insights into the effects of unpredictable interactions between different components of the system and, more importantly, into fault propagation. Faults should be injected into the system according to the faults that are most likely to occur during the development or operational phases, and which the system is expected to tolerate. The characterization of faults is therefore a crucial aspect of fault injection: in order to obtain meaningful results, the injected faults should be as similar to real faults as possible. Fault injection techniques can be broadly classified as follows:


• Hardware Fault Injection: it consists in the injection of faults into the physical components of a system, such as faults affecting an electronic circuit. This kind of fault injection can be further classified into hardware-implemented and software-implemented injection, which respectively introduce real physical faults (e.g., by interfering with the voltage within a circuit) or reproduce the effects of faults on software (e.g., by modifying the state of a program).

• Software Fault Injection: it consists in the injection of software faults (i.e., bugs) into a program, that is, defects that could occur during the design or coding phase. They can be injected by mutating the program code, or by modifying the program state at run-time.

Software faults in particular are recognized as a major cause of computer-based system failures ([Gray 1990, Lee 1995, Sullivan 1991, Oppenheimer 2003]), as software is becoming increasingly complex (more so than hardware) and represents an ever-increasing portion of the overall hardware-software system. Given the relevance of software faults over other fault types in computer-based systems, we will address only the works related to this type of fault.

Objectives of fault injection

Fault injection is a key experimental technique to evaluate (and, depending on how it is used, increase) the robustness of computer-based systems and their behavior in the presence of faults. Fault injection applies a kind of "what-if" reasoning, answering the question "what if this fault (or a similar one) were present in this system or component?". It consists in reproducing faults within a given target in order to observe and measure aspects such as fault tolerance, failure mode criticality, error propagation, and so on. One fundamental aspect that makes fault injection attractive to researchers is that it makes it feasible to observe, within a reasonable time, the behavior of the system when a fault is activated. Since the activation of real faults is a rare event (otherwise the faults would have been removed), fault injection brings a very useful acceleration factor to fault activation. The foremost aspects that make fault injection a valuable tool for researchers and developers are:

• Verification and validation of fault handling mechanisms, acting as a kind of test for the specific parts of the system related to fault tolerance and error recovery. According to this notion, faults play the same role as inputs in traditional testing. Many works have validated systems and components using fault injection (e.g., [Kalakech 2004], [Madeira 2002], [Ng 2001], [Ng 1996], [Ng 1999b], [Tsai 2000], [Avresky 1996], [Ceccarelli 2009], among others).

• Its ability to exercise the system against unforeseen stress conditions in order to analyze failure modes, assess risk, and compare systems. By injecting a fault into a system, one is asking a what-if question about it: "what if a dormant fault similar to this one were present?". This is a valid question, since it is generally accepted that all non-trivial software systems and components contain faults. Fault injection is indeed recognized as a valuable technique for system comparison and has been used in many works (e.g., [Arlat 1990], [Arlat 1993], [Durães 2002], [Durães 2003], [Skarin 2008]).


• Going a step further than plain system comparison, several benchmarks use fault injection as the basis for observing and comparing systems. Examples are [Vieira 2003] and [Durães 2004], dependability benchmarks for OLTP systems and web servers, and [Brown 2000] for availability benchmarking.

Injection of software faults

Software faults are among the least understood classes of faults affecting computer systems. The following aspects make the injection of software faults a difficult task: what exactly a software fault is (the nature of the fault), how to emulate one, how to describe a fault in a manner suitable for injection, and which faults are the most relevant and representative for injection.

Characterization of software faults

Described in a simple form, a software fault is a defect existing in the program: the program contains wrong instruction sequences or data. This may have been caused by the programmer (a bug). Cases where the programmer followed a wrong specification (in which case the fault originated in the specification phase or even in the requirements phase) are at the edge of the notion of software fault (specification or requirements faults); they are less common and outside the scope of this document.

Several works have contributed to the knowledge about software faults through extensive field data on faults discovered in the operational phase. The seminal work of [Gray 1985] presents a survey on failures of Tandem fault-tolerant computer systems based on 166 failure reports. This work was later extended in [Gray 1990] using a much larger database of field data composed of 515 failure reports (62% of which were in fact software related). [Fenton 2000] presented a quantitative analysis of software faults and showed interesting properties, such as: a small number of modules contain most of the faults of the system; a small number of modules contain most of the faults that cause failures; and fault densities at corresponding phases of testing of consecutive releases of the same software system remain roughly the same. [Durães 2006] presents a field study based on open-source software that used several hundred patch releases to characterize and understand the nature of faults and to determine which fault types are the most common. [Grottke 2010] presents a field study based on software for space missions; that work covered over five hundred faults and characterized the proportions between Bohrbugs (faults that manifest consistently) and Mandelbugs (faults that are not systematically reproducible).

Fault injection/emulation

The injection of faults is closely tied to the notion of what exactly a software fault is. Several works have followed different approaches: data errors, interface errors, and code changes.

• Data errors. This approach consists in injecting errors into the data of the target program. It is in fact a somewhat indirect form of fault injection, as what is injected is not the fault itself, but only one of its possible effects. The representativity of this type of injection is more difficult to assert, as the relationship between the fault and the possible data corruption must also be established. The following fault injection tools use the data-corruption technique: FIAT [Segall 1988], FERRARI [Kanawati 1995], DOCTOR [Han 1995], FTAPE [Tsai 1995], Xception [Carreira 1998], GOOFI [Aidemark 2001], and GOOFI-2 [Skarin 2010]. A minimal sketch of the approach is given below.
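
A minimal sketch of the data-error approach, in the spirit of the tools listed above: a bit of the program state is flipped and the behavior of the code consuming the corrupted value is observed. The target buffer, the checksum used as an oracle, and the fault location are illustrative assumptions.

    # Minimal sketch of data-error injection: flip one bit of program state
    # and observe whether the fault is activated (output differs from the
    # fault-free "golden" run).
    import random

    def flip_bit(value, bit):
        """Return value with the given bit inverted (a transient data error)."""
        return value ^ (1 << bit)

    def checksum(buffer):
        return sum(buffer) % 256

    buffer = list(range(16))
    golden = checksum(buffer)          # fault-free reference output

    random.seed(42)
    idx, bit = random.randrange(len(buffer)), random.randrange(8)
    buffer[idx] = flip_bit(buffer[idx], bit)

    outcome = checksum(buffer)
    print("fault activated" if outcome != golden else "fault dormant",
          f"(bit {bit} of element {idx})")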

• Interface errors. This approach is another form of error injection, where the error is specifically injected at the interface between modules (e.g., system components, or functional units within a program). This usually translates to parameter corruption in functions and APIs, and is considered a form of robustness testing. The injected errors can take many forms, from simple data corruption to syntactically valid but semantically incorrect information (e.g., [Nassu 2008]). As with data errors, the representativity of the errors injected at the interfaces is not clear, and there is some empirical evidence supporting the idea that injecting interface errors and changing the target code produce different effects in the target ([Moraes 2006]). The following fault injection tools use API parameter corruption techniques: [Dingman 1995], BALLISTA [Koopman 2000], [Kropp 1998], RIDDLE [Ghosh 1998] and [Ghosh 1999], MAFALDA [Fabre 2000] and [Fabre 1999], DTS [Tsai 2000], and Jaca [Martins 2002]. Although the relationship between real software faults and this kind of data corruption is not clear or direct, the technique has proven to be a valuable tool for robustness testing; a minimal sketch follows.
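
A minimal sketch of interface-error injection in the robustness-testing style described above: the API under test is exercised with corrupted parameter values and its failure mode is recorded. The parse_port function is a hypothetical stand-in for a real API, and the corrupt-input list is illustrative.

    # Minimal sketch of interface-error injection (robustness testing):
    # corrupted parameters are passed to an API and outcomes are classified.
    def parse_port(value):
        """Component under test: expects a string holding a TCP port number."""
        port = int(value)
        if not 0 < port < 65536:
            raise ValueError("port out of range")
        return port

    CORRUPT_INPUTS = [None, "", "-1", "70000", "80; rm -rf /", 3.14]

    for bad in CORRUPT_INPUTS:
        try:
            result = parse_port(bad)
            verdict = f"accepted -> {result!r} (possible robustness weakness)"
        except (ValueError, TypeError) as exc:
            verdict = f"rejected cleanly ({type(exc).__name__})"
        except Exception as exc:  # anything else is a robustness failure
            verdict = f"ROBUSTNESS FAILURE: {type(exc).__name__}"
        print(f"input={bad!r}: {verdict}")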

• Code changes. Changing the code of the target to reproduce the sequence of instructions related to the intended fault is naturally the closest thing to having the fault there in the first place. However, this is not easily achieved, as it requires knowing exactly where in the target code the change should be applied, and exactly which new instructions should be placed there. Several works have followed this notion, although with some limitations: [Ng 1996] uses simple code changes not specifically related to realistic faults. The tools FINE [Kao 1993] and DEFINE [Kao 1995] also use code changes, although their fault model is very simple and its representativity unclear. [Madeira 2000] showed that the Xception tool [Carreira 1998] can be used to inject simple code changes in running processes. The G-SWFIT technique presented in [Durães 2006] identifies suitable locations in the target code and then changes their instructions to emulate specific faults. The relationship of the changed code to the intended fault type is asserted through a set of operators, which are used both to identify suitable locations for fault injection and to modify the instructions so as to emulate the intended fault. Although G-SWFIT was proposed for the x86/IA32 platform, the basis of the technique can be ported to any platform. A sketch of the code-change idea is given below.
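
The following sketch illustrates the code-change idea at source level (rather than at executable level, as G-SWFIT does): a mutation operator resembling the "missing if construct" fault type reported as common in [Durães 2006] is applied by removing a guard from the target code. The target snippet and the operator implementation are illustrative.

    # Minimal sketch of fault injection by code change: an if-guard is
    # removed from the target function, emulating a "missing if" software
    # fault. Works at source level via Python's ast module.
    import ast
    import textwrap

    TARGET = """
        def withdraw(balance, amount):
            if amount <= balance:
                balance -= amount
            return balance
    """

    class DropGuard(ast.NodeTransformer):
        """Replace an if statement with its body (the guard goes missing)."""
        def visit_If(self, node):
            return [self.visit(child) for child in node.body]

    tree = ast.parse(textwrap.dedent(TARGET))
    mutant = ast.fix_missing_locations(DropGuard().visit(tree))

    scope = {}
    exec(compile(mutant, "<mutant>", "exec"), scope)
    print(scope["withdraw"](100, 250))  # overdraft now allowed: prints -150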

In this document we follow the notion that a software fault is a portion of the code that is wrong (simply wrong, missing, or surplus). Injecting (emulating) a fault thus requires knowing exactly what to change, and where, within the target code.

Fault description for fault injection

One important issue when injecting faults through code changes (i.e., following the notion that a fault is wrong code) is the need to have the faults classified and described in a detailed and precise manner, such that an automated tool can reproduce (emulate) the wrong code in the target. At the same time, the description must not be so fine-grained that each fault becomes its own type. The early work of [Perry 1985] offers a classification of software faults; although it goes in the direction of the notion that a fault is wrong code, it offers a huge set of types, many of them too vague for automated fault injection/emulation. That work was later extended in [Perry 1993], stressing the useful notion of fault morphology (i.e., what the fault is, in terms of code) and separating it from other aspects such as causes (why the fault was introduced). [Sullivan 1991] presents an overview of faults collected from the MVS system and offers a classification scheme particularly helpful for fault injection experiments, since faults are described at a level of detail very close to the programming level. That work was later extended in [Sullivan 1992], where the notion of defect type is introduced. This notion points to a high-level classification of faults including function, data, assignment, and interface, relating faults to their context in the high-level source code. It was later extended in [Chillarege 1992], which proposed the Orthogonal Defect Classification. Although aimed at providing feedback during the development phase, this classification further refined the description of software faults towards usability in fault injection. The work presented in [Durães 2006] further extended this level of description and offered a classification scheme precise enough for automated fault emulation, i.e., for reproducing the intended fault in the target. It also provided information on which types of faults are most likely to exist in the operational phase, and it proposed a technique (G-SWFIT) to emulate those faults directly in the executable code (i.e., without requiring the original source code).

Representative faults

Many of the scenarios that use fault injection require the injected faults to be representative. Perhaps the foremost example is dependability benchmarking, where the faults must be representative of the faults that may really exist in the class of systems targeted by the benchmark. As a first approximation, representativity translates to knowing which types of faults are more common, and thus more likely to exist in the target. Several works offer insight into this aspect (e.g., [Sullivan 1992, Chillarege 1992, Durães 2006]). However, fault type ranking alone is not enough, as other aspects are also important, such as the size of the module (as first pointed out in [Moraes 2006]), the location within the module, and so on. This topic is currently an open research field.

Attack Injection

An attack is a malicious, intentional fault introduced into a system (a software, computing, or communication system) with the intent of exploiting a vulnerability (i.e., a voluntary or involuntary weakness) in the system. A vulnerability successfully exploited by an attack can hence lead to one or more errors [Powell 2003, Neves 2006]. Different works propose attack injection solutions. In particular, [Fonseca 2009] addresses the security of web applications by applying a procedure inspired by the fault injection technique: the authors present a methodology for testing important security mechanisms applied to web applications. The methodology is based on the injection of realistic vulnerabilities and the subsequent controlled exploitation of those vulnerabilities to attack the system. This provides a practical environment that can be used to test countermeasure mechanisms (such as IDSs, web application vulnerability scanners, firewalls, etc.), to train and evaluate security teams, and to estimate security measures (like the number of vulnerabilities present in the code), among other uses. [Kiezun 2009] presents an automatic technique for creating inputs that expose web vulnerabilities. The technique generates sample inputs, symbolically tracks taints through execution (including through database accesses), and mutates the inputs to produce concrete exploits: it generates a set of concrete inputs, executes the program under test with each input, and dynamically observes whether data flows from an input to a sensitive sink. If an input reaches a sensitive sink, the technique modifies the input using a library of attack patterns, in an attempt to pass malicious data through the program.
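
The following is a highly simplified sketch of the idea in [Kiezun 2009]: run the program on a benign input, check whether the input value flows to a sensitive sink, and if it does, mutate the input with a library of attack patterns to build candidate exploits. Real tools use symbolic execution and proper taint tracking; here taint is checked naively by substring matching, and all names are illustrative.

    # Highly simplified sketch of taint-driven attack-input generation:
    # benign probe -> does the input reach the sink? -> mutate with patterns.
    ATTACK_PATTERNS = ["' OR '1'='1", "<script>alert(1)</script>"]

    def build_query(user_input):
        """Program under test: concatenates input into SQL (the sink)."""
        return "SELECT * FROM users WHERE name = '" + user_input + "'"

    def reaches_sink(program, probe="alice"):
        return probe in program(probe)   # naive dynamic taint check

    def generate_exploits(program):
        if not reaches_sink(program):
            return []
        return [(p, program(p)) for p in ATTACK_PATTERNS]

    for pattern, query in generate_exploits(build_query):
        print(f"candidate exploit {pattern!r} -> {query}")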


4. COMBINING DIFFERENT APPROACHES

To master the complexity that arises in the modeling and evaluation of large-scale, complex systems, some approaches combine different evaluation techniques, exploiting their synergies and complementarities. When modeling complex systems, a common approach is to combine different modeling formalisms, possibly supported by different solution techniques, to obtain a more accurate representation of the different parts of the system having different characteristics. Works combining different modeling formalisms are discussed in Section 4.1. It is also well established and widely recognized that modeling and experimentation complement each other, at least at the conceptual level, but the two approaches are not frequently combined in the literature to evaluate real-life systems. The relationships between modeling and experimentation are further discussed in Section 4.2. Other works in the literature focus on the combination of analytical and simulation approaches; they are discussed in Section 4.3. Finally, in Sections 4.4 and 4.5 we discuss two composite modeling and evaluation frameworks that exploit the synergies and complementarities of different evaluation approaches to mitigate system complexity.

4.1 Works combining different modeling formalisms

When modeling large and complex systems, it may be necessary to combine different modeling formalisms in order to accurately represent the details of different parts of the system, which may expose different characteristics. In fact, as the system under study grows in complexity and heterogeneity, a single modeling formalism almost always proves inadequate. This is especially true for LCCIs, which are usually composed of heterogeneous subsystems and are subdivided into different physically or logically separated layers. Current research in the area of dependability modeling tends to exploit the best of the different approaches by combining them in some hierarchical way. Multi-formalism modeling allows the formalism to be adapted to the nature and level of abstraction of the subsystem to be modeled, while providing the modeler with a single cohesive view of the entire system (see [Ipser 1990], [Praehofer 1990] and [Fishwick 1993]). Modularity and compositionality ease modeling and also allow for the reuse of components. Model complexity is tackled by a heterogeneous combination of multi-formalism modeling techniques and related multi-solution analysis. Resorting to a hierarchical approach brings benefits in several respects, among which: i) facilitating the construction of models; ii) speeding up their solution; iii) favoring scalability; and iv) mastering complexity by handling smaller models that hide, at one hierarchical level, some modeling details of the lower one.

Examples of applications of multi-formalism approaches can be found in the modeling of hybrid systems, i.e., systems that exhibit both continuous and discrete dynamic behavior. Safety analysis, for example, usually requires accounting for critical continuous variables that may exceed acceptable limits. Thus, even though safety is considered an attribute of dependability, it often requires autonomous and specific modeling techniques. Two main modeling approaches have recently been proposed to deal with hybrid systems: Hybrid Automata and Fluid Petri Nets. Fluid Petri Nets (FPN) [Gribaudo 2002] are an extension of standard Petri Nets where, beyond the places that contain a discrete number of tokens, a new kind of place is added that contains a continuous quantity (fluid). The fluid flows along fluid arcs according to an instantaneous flow rate. The discrete part of the FPN regulates the flow of the fluid through the continuous part, and the enabling conditions of a transition depend only on the discrete part. Hence, this extension is suitable for modeling and analyzing hybrid systems. LCCIs are almost always hybrid systems as well: part of their state depends on continuous variables, which are related to the service provided by the infrastructure and change in a continuous fashion; another part describes the operational state of the infrastructure and is inherently discrete.

The authors of [Lu 2002], for example, consider Electric Power Systems as composed of two layers: a "physical layer", concerning the underlying physics of electric power transmission and distribution, and an "information layer", concerning the system's control, management, and power accounting issues. The state of the physical layer is continuous by nature, since it is driven by continuous physical quantities, e.g., current intensity or voltage. By contrast, the information layer is discrete, and it is thus characterized by a finite number of states (e.g., "working", "degraded", "failed", "overloaded"). In order to take advantage of such differences, the authors combine two different modeling formalisms, both based on Petri Nets: Stochastic Petri Nets (SPN) are used to model the information layer, while the physical layer is modeled using Variable Arc Weighting Petri Nets (VAWPN), a continuous variant of Petri Nets similar to FPN. A similar multi-formalism approach has been adopted in [Beccuti 2009] in the context of the Electrical Power System. In this work the authors present an approach to model and quantify the interdependencies between the Electrical Infrastructure (EI) and the Information Infrastructure (II) that implements the EI control and monitoring system. The quantification is achieved through the integration of two models: one concentrating on the structure of the power grid and its physical quantities, and one concentrating on the behavior of the control system. The model of the control system (the II) is constructed using the Stochastic Well-formed Nets (SWN) formalism and is centered on the protocols involved in the scenario. The model of the Electrical Infrastructure is based on the Stochastic Activity Networks (SAN) formalism, a variant of Stochastic Petri Nets that is able to represent a continuous state space through the "Extended Place" primitive.

The decoupling between the discrete and the continuous state of the LCCI is just one of the possible decomposition approaches that can be used to combine different modeling formalisms. The authors of [Flammini 2009], for example, propose a multi-formalism framework for the modeling of interconnected LCCIs, where each infrastructure is decomposed into three vertical layers, each addressing a specific aspect of the infrastructure, namely the failure modeling layer, the recovery modeling layer, and the operational capacity modeling layer. Additionally, three horizontal layers are used to decompose input and output interfaces with respect to the internal behavior of the infrastructure. The interaction between the models is then provided by composition operators, which can be used to define either intra-infrastructure or inter-infrastructure interactions. In the latter case, the operators define interdependencies between infrastructures through the interface models.
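
To make the FPN idea concrete, the following minimal sketch simulates a single fluid place whose inflow is enabled only while a discrete marking (a working pump) is present; the continuous dynamics are integrated with a simple Euler scheme. Rates, capacity, and the failure/repair instants are illustrative assumptions.

    # Minimal sketch of the Fluid Petri Net idea: the discrete marking gates
    # the flow into a continuous (fluid) place.
    def simulate(t_end=10.0, dt=0.01, flow_rate=2.0, capacity=12.0):
        fluid = 0.0        # continuous place (e.g., quantity delivered)
        pump_up = 1        # discrete marking: 1 token = pump working
        t = 0.0
        while t < t_end:
            if abs(t - 3.0) < dt / 2:   # discrete transition: failure at t=3
                pump_up = 0
            if abs(t - 6.0) < dt / 2:   # discrete transition: repair at t=6
                pump_up = 1
            if pump_up:                 # fluid arc enabled by discrete part
                fluid = min(capacity, fluid + flow_rate * dt)
            t += dt
        return fluid

    print(f"fluid level at t = 10: {simulate():.2f}")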


4.2 Relationships between modeling and experimentation

The possible interactions between modeling and experimentation for dependability analysis are depicted in Figure 12 (see [DBench] and [HIDENETS D4.1.2]).

[Diagram: the Modeling, Experimentation, and Analysis activities around the Target System, connected by links A-E; Workload and Faultload are inputs to Experimentation, which produces Experimental Measures & Features that are combined with modeling results into Comprehensive Measures.]

Figure 12. Dependability evaluation based on modeling and experimentation

Modeling as a support for experimentation (links A, B, C, and D). Here experimentation is guided, at least partially, by modeling. The constructed dependability model is processed and a sensitivity analysis with respect to numerical parameter values is performed, in order to identify the most significant parameters of the model (i.e., those associated with the most salient features of the target system) that need to be evaluated accurately by experimentation. In this case, modeling helps in selecting the features and measures of interest to be evaluated experimentally, as well as the right inputs to be provided for experimentation (e.g., the workload and faultload to apply). It is worth mentioning that sensitivity analysis also allows the identification of those parameters that need to be evaluated based on field data collected during system operation.

Experimentation as a support for modeling (links A, B, and E). In this case, not only are the experimentally assessed measures used as parameters in the models, but the features identified during the experimentation may also impact the semantics of the dependability model. Experimentation thus supports model validation and refinement. For example, this case includes a) the calibration of the coverage parameters of the initial dependability model, and b) the validation and possible refinement of such models. The construction of analysis models on the basis of measurements performed on a running prototype or a full deployment is a very interesting research area. The most comprehensive method was developed for performance and performability analysis: software performance models of distributed applications are extracted from traces recorded during execution [Israr 2007]. A similar approach is the recording of error propagation traces induced by fault injection experiments [Arlat 1993] to support the construction of error propagation models [Chillarege 1989]. Other works (e.g., [Arlat 1990]) derive high-level behavioral models using experimental measurements obtained from fault injection experiments, while in other papers (e.g., [Coccoli 2002] and [Ten 2008]) values provided by field data are used to set up the parameters of analytical models. Further examples of the combined use of field measurement and modeling can be found in [Coccoli 2002] and [Kalyanakrishnam 1999a].

Another very interesting research area is the definition of a generalized approach for evaluating and comparing different systems and components, which does not yet exist. In this context, a framework for dependability benchmarking based on modeling and experimentation has been defined within the DBench project [DBench] (see Section 3.1). The goal of benchmarking the dependability of computer systems is to provide generic and reproducible ways of characterizing their behavior in the presence of faults. DBench developed a set of benchmarks based on experimentation only, and one benchmark based on modeling and experimentation for on-line transactional systems (see [Buchacker 2003] and [Kanoun 2004] for more details). The two final measures evaluated by the latter benchmark are the stationary system availability and the total cost of failures. These measures are obtained by combining measures from experimentation on the target system (e.g., the percentages of the various failure modes) with information from outside the benchmark experimentation (e.g., the failure rate, the repair rate, and the cost of each failure mode).
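
As an illustration of the first interaction (modeling guiding experimentation), the following minimal sketch performs a crude sensitivity analysis on a toy availability model in order to decide which parameter deserves accurate experimental measurement. The model and its nominal values are illustrative assumptions, not taken from the cited works.

    # Minimal sketch: a sensitivity analysis identifies the parameter whose
    # uncertainty matters most, i.e., the one to measure experimentally.
    def availability(mttf, mttr, coverage):
        """Toy steady-state availability with imperfect failure coverage."""
        return coverage * mttf / (mttf + mttr)

    nominal = {"mttf": 1000.0, "mttr": 2.0, "coverage": 0.95}
    base = availability(**nominal)

    for name, value in nominal.items():
        perturbed = dict(nominal, **{name: value * 1.10})  # +10% on one parameter
        delta = availability(**perturbed) - base
        print(f"{name:>8}: dA = {delta:+.5f}")
    # The parameter with the largest |dA| (here, coverage) is the best
    # candidate for accurate experimental evaluation, e.g., by fault injection.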

4.3 Works combining modeling and simulation

In the literature, several attempts have been made to couple modeling and simulation activities, most of them using low-level simulations to provide more accurate parameters to higher-level analytic models (e.g., see [Klemm 2001]). In [Bondavalli 2009, Bondavalli 2011] a highway scenario has been modeled, and the impact of user mobility on the QoS of UMTS communication has been evaluated by combining a Stochastic Activity Network (SAN) model with a mobility simulator. The behavior of the users and of the UMTS network has been modeled using the SAN formalism, while the mobility of the users within the scenario has been accurately represented by the mobility simulator. A specific SAN submodel was in charge of progressively reading the traces produced as output by the simulator, and of synchronizing the state of the two models. The integration of the output produced by an ad-hoc mobility simulator into the modeling process itself made it possible to capture more complex and detailed mobility dynamics that may heavily affect the analyzed QoS indicators.

The work in [Bondavalli 2011] includes another example of the combination of analytical and simulation approaches. The focus is on a specific application, called Distributed Black-Box (DBB, similar to avionics black boxes; see [Killijian 2009]), which provides a virtual mechanism to periodically record historical data about the state of participating vehicles and their environment, to be replayed in the event of an accident. Simulation is first used to characterize the distribution of some connectivity parameters in vehicular communication scenarios: it is shown that, under certain assumptions, the car-to-car and car-to-infrastructure encounter processes can be described by a Poisson process. These parameters are then incorporated into a Generalized Stochastic Petri Net (GSPN) model to assess the impact of permanent failures on the availability of the data.
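
A minimal sketch of the trace-based coupling described above: the mobility simulator writes a trace offline, and a dedicated submodel replays it to keep the analytical model's state synchronized with the simulated mobility. The trace format, the handover callback, and the state variables are illustrative assumptions about how such a coupling can be wired, not the actual interface of the cited framework.

    # Minimal sketch of trace-driven synchronization between a simulator
    # and an analytical model (trace columns: time, user id, cell id).
    import io

    TRACE = io.StringIO("""0.0 u1 cellA
    4.5 u1 cellB
    9.0 u1 cellC
    """)

    def replay(trace, on_handover):
        current = {}
        for line in trace:
            fields = line.split()
            if not fields:
                continue                 # skip blank lines in the trace
            t, user, cell = fields
            if current.get(user) not in (None, cell):
                on_handover(float(t), user, current[user], cell)
            current[user] = cell

    def update_model_state(t, user, old_cell, new_cell):
        # in the real framework this would fire a model transition;
        # here we just log the synchronization event
        print(f"t={t}: {user} handover {old_cell} -> {new_cell}")

    replay(TRACE, update_model_state)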


A major research line of the European project CRUTIAL [CRUTIAL] focused on the development of a model-based methodology for the dependability and security analysis of power grid information infrastructures. Within this context, a modeling framework for the analysis of interdependencies in LCCIs has been developed, with a particular focus on electric power systems (see [Chiaradonna 2007, Beccuti 2009, Chiaradonna 2011]). Complex power flow relations must be taken into account for a correct and detailed modeling of the interdependencies between the components of the electric grid. In the CRUTIAL approach, the overall model representing the organization and topology of the power grid is built using the Stochastic Activity Network (SAN) formalism, while the effects of environment variations (e.g., component failures) on the complete power grid are modeled through external mathematical functions solving linear optimization problems. A comprehensive discussion of the challenges and viable approaches for the evaluation of critical infrastructures, and of electric power systems in particular, can be found in [Chiaradonna 2008].

4.4 A holistic evaluation framework

The European project HIDENETS [HIDENETS 2006] addressed the provisioning of available and resilient distributed applications and mobile services in highly dynamic environments characterized by unreliable communications and components, mostly in the field of car-to-car and car-to-infrastructure communications. One of its main achievements was the definition of a holistic evaluation framework (see [HIDENETS D4.1.2] and [Bondavalli 2011]) in which the synergies and complementarities of different evaluation approaches can be fruitfully exploited. In the quantitative assessment of complex systems like LCCIs, and those targeted by the HIDENETS project in particular, no single evaluation technique (analytical modeling, simulation, or experimental measurement) is capable of tackling the whole problem, i.e., the dependability evaluation of end-to-end scenarios. To master complexity, the holistic approach defines a "common strategy" in which different evaluation techniques are applied to the different components and sub-systems, thus exploiting their potential interactions. The idea underlying the holistic approach follows a "divide and conquer" philosophy: the original problem is decomposed into simpler sub-problems that can be solved using appropriate evaluation techniques; the solution of the original problem is then obtained from the partial solutions of the sub-problems, exploiting their interactions.


Figure 13: Example of possible interactions among the approaches

Some of the possible interactions among different evaluation techniques are the following (see Figure 13):

• Cross validation. A partial solution validates some assumptions introduced to solve another sub-problem, or validates another partial solution (e.g., a simulation model can be used to verify that the duration of an event in an analytical model is exponentially distributed; a minimal sketch of this check is given after the list).

• Cross fertilization. A partial solution (or a part of it) obtained by applying a solution technique to a sub-problem is used as input to solve another sub-problem possibly using a different technique (e.g., a critical parameter in an analytical model is obtained using experimental evaluation).

• Problem refinement. A partial solution gives some additional knowledge that leads to a problem refinement (e.g., the architecture of a component changes because it is recognized to be a system bottleneck).
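
A minimal sketch of the cross-validation interaction referenced above: samples of an event duration produced by a simulation model are checked against the exponential assumption of an analytical model, by comparing the empirical CDF with the fitted exponential CDF through a crude one-sample Kolmogorov-Smirnov-style distance. The sample generator stands in for the simulation model, and the acceptance threshold is the usual 5% KS approximation, used loosely here since the rate is estimated from the same data.

    # Minimal sketch of cross validation: are the simulated durations
    # compatible with the exponential assumption of the analytical model?
    import math
    import random

    random.seed(1)
    samples = sorted(random.expovariate(0.5) for _ in range(1000))

    rate = 1.0 / (sum(samples) / len(samples))   # MLE of the exponential rate
    n = len(samples)
    ks = max(abs((i + 1) / n - (1.0 - math.exp(-rate * x)))
             for i, x in enumerate(samples))

    threshold = 1.36 / math.sqrt(n)              # approximate 5% critical value
    print(f"fitted rate = {rate:.3f}, KS distance = {ks:.3f}")
    print("exponential assumption " +
          ("plausible" if ks < threshold else "questionable"))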

Clearly, the system decomposition is not unique, as different decompositions can be identified corresponding to different levels of abstraction. The higher the level of detail required to capture the system behavior, the higher the complexity of the models to be built and solved. The choice of a particular system decomposition is therefore of primary importance, and it is always a trade-off between the faithfulness of the representation of the real system behavior (with respect to the measures of interest) and the capability to solve the corresponding models. Some examples illustrating the combination of different techniques within the holistic framework are presented in [Bondavalli 2011]. An abstraction-based system decomposition has been adopted in that paper, which statically focuses on the various levels of abstraction that can be used to represent a system (user level, application level, architecture level, and communication level). Each level captures a specific aspect of the overall system behavior and "communicates" with the other levels through well-specified interfaces. Such interfaces mainly define the input each level requires from other abstraction levels, as well as the output it provides. The feasibility of the holistic approach for the analysis of a complete end-to-end scenario is first illustrated by presenting two examples where mobility simulation is used in combination with stochastic analytical modeling, and then through the development and implementation of an evaluation workflow integrating several tools and model transformation steps.

4.5 A multi-formalism framework for the automated generation of dependability models

The work in [Cinque 2007, Di Martino 2009] proposes a framework for the automated generation of performance and dependability models for the assessment of Wireless Sensor Networks (WSNs). The framework adopts both behavioral and analytical models: it uses the AVRORA behavioral simulator, together with a tool for injecting faults into wireless sensor network nodes [Cinque 2009b], and the Stochastic Activity Networks (SAN) formalism for the analytical models. The behavioral simulator is used to specify and configure the target system and to study its fault-free behavior. The interaction between the behavioral simulator and the SAN models is managed by the framework.

Figure 14. Steps performed by the multi-formalism framework for the assessment of WSNs

The framework operates in five steps, shown in Figure 14.


In step 1, the user provides the inputs needed to specify the target system and to configure the experimental scenario. This means specifying: i) the number and type of nodes; ii) the network topology; iii) the workload of the nodes (i.e., the user application); iv) the radio communication technology; v) the adopted routing algorithm; and vi) the sensing hardware technology of each node. These inputs are used to set up the behavioral simulator, and they are stored as user preferences in the framework. In this step the user also selects the set of non-functional properties to evaluate.

Step 2 concerns the behavioral simulation of the target system and the execution of the fault injection experiments. The objective is to gather realistic values for the parameters needed to populate the analytical models. Model parameters can be static or dynamic. Static parameters relate to aspects that do not change during the simulation of the analytical models, such as the position of the nodes. Dynamic parameters change over time, their evolution depends on the current configuration of the system (e.g., number of failed nodes, node transmission rate, packet loss rate), and they need to be re-computed upon each change during the simulation of the analytical model (step 4). The behavioral simulator is used to characterize the fault-free behavior of the WSN in terms of energy consumption of nodes, packet loss rates, workload, and adopted routing algorithm, with the final aim of providing realistic values for the analytical model parameters. Fault injection is used to study in detail the behavior of the actual software, including the main application and the OS code running on a WSN node, under realistic low-level faults such as bit flips. The results of fault injection make it possible to characterize the software of the nodes in terms of sensitivity to specific faults and failure modes, as well as in terms of the time needed for a fault to cause the failure of the node, referred to here as fault activation latency.

In step 3, the Model Generator automatically produces the analytical models starting from a predefined library of model templates, which are defined once by a domain expert and stored in the knowledge base. The number and type of models to be generated depend on the user preferences. For instance, N node models are generated for a WSN composed of N nodes. Each node model is then specialized depending on the topology (which specifies the neighbors of each node) and on the hardware platform (which impacts node failures). Initial values for the model parameters are configured starting from the results of the behavioral simulation (e.g., the packet loss rate of each link, and the energy/workload profile of each node) and from a set of predefined parameters (e.g., the failure rates of hardware components) provided once by domain experts and stored in the knowledge base. To this aim, the framework is equipped with a library of parametric model templates, i.e., model skeletons that can be specialized automatically depending on the specific system to engineer; system engineers do not need to be aware of them.

Step 4 concerns the simulation of the generated analytical models. To update dynamic parameters, the models are programmed to notify changes to the Changes Manager component. As an example, consider a node X in a WSN, and assume that a neighboring node Y starts sending more packets to X. This change (handled by the Changes Manager) results in an increase of the energy consumed by node X. As a result, at a given point in time, node X stops working due to battery exhaustion. The failure is notified to the Changes Manager, which re-computes the routing tree according to the chosen routing algorithm, avoiding the need to re-run the behavioral simulation.

Finally, in step 5, the required metrics are evaluated and the results are delivered to the user, allowing design choices to be revisited. It is worth noting that users interact with the framework only in steps 1 and 5, where they work within their own knowledge domain, i.e., system specification and simulation results. Details on the analytical models and on parameter computation are encapsulated by the framework and kept hidden from users.
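
The following minimal sketch illustrates the step-4 interaction between the analytical models and the Changes Manager: a model notifies a node failure, and the manager recomputes a dynamic parameter (here, the routing tree) without re-running the behavioral simulation. The topology, the BFS routing policy, and the class interface are illustrative assumptions, not the actual API of the cited framework.

    # Minimal sketch of dynamic parameter recomputation by a Changes
    # Manager: on a node failure, the routing tree is rebuilt on the fly.
    class ChangesManager:
        def __init__(self, links):
            self.links = set(links)      # bidirectional links between nodes
            self.failed = set()

        def notify_failure(self, node):
            """Called by a node model when the node fails."""
            self.failed.add(node)
            return self.routing_tree(root="sink")

        def routing_tree(self, root):
            """Recompute parent pointers with a BFS that skips failed nodes."""
            parent, frontier = {root: None}, [root]
            while frontier:
                nxt = []
                for u in frontier:
                    for a, b in self.links:
                        v = b if a == u else a if b == u else None
                        if v is not None and v not in parent and v not in self.failed:
                            parent[v] = u
                            nxt.append(v)
                frontier = nxt
            return parent

    cm = ChangesManager({("sink", "X"), ("X", "Y"), ("sink", "Z"), ("Z", "Y")})
    print(cm.routing_tree("sink"))   # Y initially routes through X or Z
    print(cm.notify_failure("X"))    # after X fails, Y must route through Z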


5. CONCLUSIONS

The main goal of this deliverable was to review existing modeling and evaluation approaches for LCCIs that can be used for the quantitative evaluation of dependability and security metrics. In presenting the state of knowledge, we first discussed the different types of models traditionally used to support dependability analysis and evaluation activities, emphasizing in particular the state-based modeling approaches that are well suited to the challenges explored in the context of LCCIs. We presented a detailed review of modeling approaches aimed at mastering the largeness of state-space models at both the construction and the solution level, of the available tools that can support the dependability evaluation activity, and of the automatic derivation of low-level dependability analysis models from high-level engineering languages. The focus then moved to experimental measurement techniques: we surveyed works dealing with dependability and security benchmarking, data filtering and data analysis techniques, quantitative evaluation approaches for intrusion detection, and fault injection techniques. The last part of the state of knowledge focused on approaches combining different and complementary modeling and evaluation techniques to address the complexity of the targeted systems. In particular, we considered works coupling different modeling formalisms and works combining modeling with simulation or experimental techniques, finally presenting some composite modeling and evaluation frameworks in which the synergies and complementarities of the different evaluation approaches can be fruitfully exploited. This document constitutes the starting point for the definition and design of innovative tools and techniques for evaluating LCCIs, which is the core topic that will be addressed in D3.2.


REFERENCES

[Abraham 1979] Abraham, B. and Box, G. E. P. 1979. Bayesian analysis of some outlier problems in time series. Biometrika 66, 2, 229–236.

[Abraham 1989] Abraham, B. and Chuang, A. 1989. Outlier detection and time series modeling. Technometrics 31, 2, 241–248.

[Adve 2000] V. S. Adve, R. Bagrodia, J. C. Browne, E. Deelman, A. Dube, E. Houstis, J. Rice, R. Sakellariou, D. Sundaram-Stukel, P. J. Teller, and M. K. Vernon. POEMS: End-to-end performance design of large parallel adaptive computational systems. IEEE Transactions on Software Engineering, Special Section of invited papers from the WOSP '98 Workshop, 26(11):1027–1048, November 2000.

[Agarwala 2006] S. Agarwala and K. Schwan. SysProf: Online distributed behavior diagnosis through fine-grain system monitoring. In ICDCS 2006, IEEE, July 2006.

[Aidemark 2001] J. Aidemark, J. Vinter, P. Folkesson, and J. Karlsson, "GOOFI: Generic Object-Oriented Fault Injection Tool", in Proceedings of the International Conference on Dependable Systems and Networks, DSN-2001, Göteborg, Sweden, 2001, pp. 71-76.

[Akyildiz 2002] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393–422, March 2002. [Online]. Available: http://dx.doi.org/10.1016/S1389-1286(01)00302-4

[Alam 1986] M. Alam and U. M. Al-Saggaf, "Quantitative Reliability Evaluation of Repairable Phased-Mission Systems using Markov Approach," IEEE Transactions on Reliability, vol. 35, pp. 498-503, 1986.

[Alessandri 2004] Alessandri, D. (2004), Attack-Class-Based Analysis of Intrusion Detection Systems, Master's thesis, University of Newcastle upon Tyne, Newcastle, UK.

[Ammar 1989] H. H. Ammar and S. M. Rezaul Islam, “Time scale decomposition of a class of generalized stochastic Petri net models,” IEEE Transactions on Software Engineering, vol. 15, no. 6, pp. 809-820, 1989.

[Ando 2007] Ando, S. 2007. Clustering needles in a haystack: An information theoretic analysis of minority and outlier detection. In Proceedings of the 7th International Conference on Data Mining. 13–22.

[Andreolini 2008] M. Andreolini, S. Casolari, and M. Colajanni, "Models and framework for supporting run-time decisions in Web-based systems", ACM Transactions on the Web, 2008.

[Angiulli 2002] Angiulli, F. and Pizzuti, C. 2002. Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery. Springer-Verlag, 15–26.

[Anscombe 1960] Anscombe, F. J. and Guttman, I. 1960. Rejection of outliers. Technometrics 2, 2, 123–147.

[Arlat 1990] J. Arlat et al. Fault Injection for Dependability Validation: A Methodology and Some Applications. IEEE Transactions on Software Engineering, 16(2), 1990.

[Arlat 1993] J. Arlat, A. Costes, Y. Crouzet, J. C. Laprie, and D. Powell, "Fault Injection and Dependability Evaluation of Fault Tolerant Systems," IEEE Transactions on Computers, vol. 42, 1993, pp. 919-923.

[Arning 1996] Arning, A., Agrawal, R., and Raghavan, P. 1996. A linear method for deviation detection in large databases. In Proceedings of the 2nd International Conference of Knowledge Discovery and Data Mining. 164–169.

[Avizienis 2004] A. Avizienis, J.C. Laprie, B. Randell, and C. Landwehr, Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 2004.

[Avresky 1996] D. Avresky, J. Arlat, J. C. Laprie, and Y. Crouzet, "Fault Injection for Formal Testing of Fault Tolerance," IEEE Transactions on Reliability, vol. 45, pp. 443-455, 1996.

[Baier 1999] C. Baier, J.-P. Katoen, and H. Hermanns. “Approximate symbolic model checking of continuous time Markov chains”. In Proceedings of CONCUR'99, volume 1664 of LNCS, pages 146—162, 1999.

[Balbo 2001] G. Balbo. Introduction to stochastic petri nets. In Lectures on Formal Methods and Performance Analysis, volume 2090 of Lecture Notes in Computer Science, pages 84–155. Springer Verlag, 2001.

[Ballarini 2000] S. Ballarini, S. Donatelli and G. Franceschinis, “Parametric Stochastic Well-Formed Nets and Compositional Modelling,” in 21st International Conference on Application and Theory of Petri Nets, Aarhus, Denmark, 2000, (Springer Verlag).

[Barbara 2001] Barbara, D., Couto, J., Jajodia, S., and Wu, N. 2001. Detecting novel network intrusions using Bayes estimators. In Proceedings of the 1st SIAM International Conference on Data Mining.

[Barbara 2003] Barbara, D., Li, Y., Couto, J., Lin, J.-L., and Jajodia, S. 2003. Bootstrapping a data mining intrusion detection system.

[Barnett 1994] Barnett, V. and Lewis, T. 1994. Outliers in Statistical Data. John Wiley.

[Basu 2004] Basu, S., Bilenko, M., and Mooney, R. J. 2004. A probabilistic framework for semi-supervised clustering. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 59–68.


[Bause 1998] F. Bause, P. Buchholz, and P. Kemper, “A Toolbox for Functional and Quantitative Analysis of DEDS,” in Computer Performance Evaluation: Modelling Techniques and Tools: Proc. of the 10th Int. Conf., Tools ‘98, Palma de Mallorca, Spain, Sept. 14-18, 1998, Lecture Notes in Computer Science No. 1469 (ed. by R. Puigjaner, N. N. Savino, and B. Serra), Berlin: Springer, 1998, pp. 356-359.

[Beccuti 2009] Beccuti, M., et al., Quantification of dependencies in electrical and information infrastructures: The CRUTIAL approach. 4th International Conference on Critical Infrastructures (CRIS), pp.1-8, 2009.

[Beckman 1983] Beckman, R. J. and Cook, R. D. 1983. Outlier...s. Technometrics 25, 2, 119–149.

[Béounes 1993] C. Béounes et al. Surf-2: A program for dependability evaluation of complex hardware and software systems. In Proc. of the 23rd Int. Symp. on Fault-Tolerant Computing, pages 668–673, Toulouse, France, 1993.

[Bhatia 2008] S. Bhatia, A. Kumar, M. E. Fiuczynski, and L. L. Peterson. Lightweight, high-resolution monitoring for troubleshooting production systems. In R. Draves and R. van Renesse, editors, OSDI, pages 103–116. USENIX Association, 2008.

[Birkhoff 1965] G. Birkhoff and C. R. de Boor, "Piecewise polynomial interpolation and approximation", Proc. of General Motors Symposium, 1965.

[Bishop 2001] G. Bishop and G. Welch “An introduction to the Kalman filter”, SIGGRAPH, 2001.

[Blueprint 2003] An architectural blueprint for autonomic computing, April 2003. http://www03.ibm.com/autonomic/blueprint.shtml

[Bobbio 1986] A. Bobbio and K. Trivedi, “An Aggregation Technique for the Transient Analysis of Stiff Markov Chains,” IEEE Transactions on Computers, vol. C-35, no. 9, pp. 803-814, August 1986.

[Bobbio 1998] A. Bobbio, A. Puliafito, M. Telek and K. S. Trivedi, “Recent Developments in Non-Markovian Stochastic Petri Nets”, Journal of Circuits, Systems, and Computers 8(1): 119-158 (1998)

[Bologna 2003] S. Bologna, C. Balducelli, G. Dipoppa, and G. Vicoli, "Dependability and Survivability of Large Complex Critical Infrastructures", Computer Safety, Reliability, and Security, Lecture Notes in Computer Science, volume 2788, pp. 342–353, September 2003.

[Bondavalli 2001a] A. Bondavalli, M. Nelli, L. Simoncini and G. Mongardi, “Hierarchical Modelling of Complex Control Systems: Dependability Analysis of a Railway Interlocking,” Journal of Computer Systems Science and Engineering, 16(4), pp. 249-261, 2001.

[Bondavalli 2001b] A. Bondavalli, M. Dal Cin, D. Latella, I. Majzik, A. Pataricza, and G. Savoia, "Dependability Analysis in the Early Phases of UML Based System Design". Int. Journal of Computer Systems - Science & Engineering, Vol. 16, No. 5, pp. 265-275, CRL Publishing Ltd, September 2001.

[Bondavalli 2004] A. Bondavalli, S. Chiaradonna, F. Di Giandomenico, and I. Mura. Dependability modeling and evaluation of multiple-phased systems using DEEM. IEEE Transactions on Reliability, 53(4):509-522, 2004

[Bondavalli 2009] A. Bondavalli, P. Lollini and L. Montecchi. QoS Perceived by Users of Ubiquitous UMTS: Compositional Models and Thorough Analysis. In Journal of Software, Special Issue: Selected Papers of The 6th IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS 2008), Volume 4, Issue 7, pp. 675-685, 2009.

[Bondavalli 2011] A. Bondavalli, O. Hamouda, M. Kaâniche, P. Lollini, I. Majzik, H.-P. Schwefel, "The HIDENETS Holistic Approach for the Analysis of Large Critical Mobile Systems", accepted for publication in IEEE Transactions on Mobile Computing, 2011.

[Boriah 2008] Boriah, S., Chandola, V., and Kumar, V. 2008. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 8th SIAM International Conference on Data Mining. 243–254.

[Boudali 2007] H. Boudali, P. Crouzen, M. Stoelinga: A Compositional Semantics for Dynamic Fault Trees in Terms of Interactive Markov Chains. ATVA 2007: 441-456.

[Breunig 1999] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 1999. OPTICS-OF: Identifying local outliers. In Proceedings of the 3rd European Conference on Principles of Data Mining and Knowledge Discovery. Springer-Verlag, 262–270.

[Breunig 2000] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 93–104.

[Brito 1997] Brito, M. R., Chavez, E. L., Quiroz, A. J., and Yukich, J. E. 1997. Connectivity of the mutual k-nearest neighbor graph in clustering and outlier detection. Statis. Prob. Lett. 35, 1, 33–42.

[Brown 2000] A. Brown and D. Patterson, "Towards Availability Benchmarks: A Case Study of Software RAID Systems", in Proceedings of the 2000 USENIX Annual Technical Conference, San Diego, California, USA, 2000, pp. 263-276.

[Brugger 2007] Brugger, S. & Chow, J, An assessment of the DARPA IDS Evaluation Data set using Snort, Technical report, 2007, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.

[Buchacker 2003] K. Buchacker and O. Tschaeche, “TPC Benchmark-c version 5.2 Dependability Benchmark Extensions”, http://www.faumachine.org/papers/tpcc-depend.pdf, 2003.

[Buchholz 1995] P. Buchholz, “A Notion of Equivalence for Stochastic Petri Nets,” 16th Int. Conf. on Application and Theory of Petri Nets, Torino, Italy, 1995, pp. 161-180.

[Buckley 1995] M. F. Buckley and D. P. Siewiorek. VAX/VMS event monitoring and analysis. In FTCS, pages 414–423, IEEE Computer Society, 1995.

[Carreira 1998] J. Carreira, H. Madeira, and J. G. Silva, "Xception: Software Fault Injection and Monitoring in Processor Functional Units," in IEEE Transactions on Software Engineering, vol. 24, 1998.

[Carrozza 2008] G. Carrozza, M. Cinque, D. Cotroneo, and R. Natella. Operating System Support to Detect Application Hangs. In International Workshop on Verification and Evaluation of Computer and Communication Systems, 2008.

[Carrozza 2010] G. Carrozza, D. Cotroneo, R. Natella, A. Pecchia, S. Russo, Memory Leak Analysis of Mission-Critical Middleware. Journal of Systems and Software vol. 83, no. 9, 2010.

[Castillo 1980] X. Castillo and D. P. Siewiorek. A Performance-Reliability Model for Computing Systems. Proceedings of the 10th IEEE Symposium on Fault Tolerant Computing (FTCS-10), October 1980.

[Ceccarelli 2009] A. Ceccarelli, A. Bondavalli, D. Iovino, “Trustworthy Evaluation of a Safe Driver Machine Interface through Software-Implemented Fault Injection”, IEEE Pacific Rim International Symposium on Dependable Computing – PRDC09, Shanghai, China, 16-18 November 2009.

[Chandola 2008] Chandola, V., Boriah, S., and Kumar, V. 2008. Understanding categorical similarity measures for outlier detection. Tech. rep. 08-008, University of Minnesota.

[Chen 2002] D. Chen, D. Selvamuthu, D. Chen, L. Li, R.R. Some, A.P. Nikora and K. Trivedi, “Reliability and Availability Analysis for the JPL Remote Exploration and Experimentation System,” in Proc. Int. Conf. Dependable Systems and Networks, pp. 337-344, June 2002.

[Cheung 2006] S. Cheung, B. Dutertre, M. Fong, A. Valdes, U. Lindqvist, K. Skinner, and M. Park, "Using Model-based Intrusion Detection for SCADA Networks", Science And Technology, 2006, pp. 1-12.

[Chiaradonna 2007] Chiaradonna, S., P. Lollini, and F. Di Giandomenico (2007, June). On a Modeling Framework for the Analysis of Interdependencies in Electric Power Systems. In Dependable Systems and Networks, 2007. DSN ’07. 37th Annual IEEE/IFIP International Conference on, pp. 185–195.

[Chiaradonna 2008] S. Chiaradonna, F. Di Giandomenico, and P. Lollini. Evaluation of Critical Infrastructures: Challenges and Viable Approaches. In Architecting Dependable Systems V, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, Volume 5135/2008, pp. 52-77, 2008.

[Chiaradonna 2011] S. Chiaradonna, F. Di Giandomenico, and P. Lollini. Definition, Implementation and Application of a Model-based Framework for the Analysis of Interdependencies in Electric Power Systems. To appear in International Journal of Critical Infrastructure Protection, Elsevier, April, 2011.

[Chillarege 1992] R. Chillarege et al. Orthogonal Defect Classification—A Concept for In-Process Measurements. IEEE Transactions on Software Engineering, 18(11), 1992.

[Chillarege 1993] R. Chillarege, R.K. Iyer, J.C. Laprie, and J.D. Musa. Field Failures and Reliability in Operation. Proc. of the 4th IEEE International Symposium on Software Reliability Engineering, November 1993.

[Chiola 1995] G. Chiola, G. Franceschinis, R. Gaeta, and M. Ribaudo. GreatSPN 1.7: Graphical Editor and Analyzer for Timed and Stochastic Petri Nets. Performance Evaluation, special issue on Performance Modeling Tools, 24(1&2):47--68, November 1995.

[Choi 1994] H. Choi, V. G. Kulkarni and K. S. Trivedi, “Markov Regenerative Stochastic Petri Nets”, Performance Evaluation, 20(1-3), pages 337-357, 1994.

[Christmansson 1996] J. Christmansson and R. Chillarege. Generation of an Error Set that Emulates Software Faults based on Field Data. In Proc. of Annual Symposium on Fault Tolerant Computing, 1996.

[Ciardo 1994] G. Ciardo, R. German and C. Lindemann, “A characterization of the stochastic process underlying a stochastic Petri net”, IEEE Transactions on Software Engineering, Volume 20, Issue 7, July 1994 Page(s):506–515.

[Ciardo 1996] G. Ciardo and A. S. Miner, “SMART: Simulation and Markovian Analyzer for Reliability and Timing,” in Proc. IEEE Int. Computer Performance and Dependability Symp. (IPDS'96), Urbana-Champaign, IL, USA, Sept. 1996, pp. 60.

[Ciardo 1999] G. Ciardo and A.S. Miner, “Efficient Reachability Set Generation and Storage Using Decision Diagrams,” in Proc. 20th Int. Conf. Application and Theory of Petri Nets, pp. 6-25, 1999.

[Cinque 2006] M. Cinque, D. Cotroneo, and S. Russo. Collecting and analyzing failure data of bluetooth personal area networks. In Proceedings 2006 International Conference on Dependable Systems and Networks (DSN 2006), pages 313–322, Philadelphia, Pennsylvania, USA, June 2006.

[Cinque 2007] M. Cinque, D. Cotroneo, C. Di Martino, S. Russo. Modeling and Assessing the Dependability of Wireless Sensor Networks. In Proceedings of the 26th IEEE International Symposium on Reliable Distributed Systems (SRDS’07), Beijing, China, October 2007, pp. 33-42.

[Cinque 2009a] M. Cinque, D. Cotroneo, and A. Pecchia. A logging approach for effective dependability evaluation of complex systems. In Proceedings of the 2009 Second International Conference on Dependability, Athens/Glyfada, Greece, 2009.

[Cinque 2009b] M. Cinque, D. Cotroneo, C. Di Martino, A. Testa, S. Russo. AVR-INJECT: a Tool for Injecting Faults in Wireless Sensor Networks. Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (IPDPS ’09). May 29, 2009, Rome, Italy, pp. 1-6. ISBN: 978-1-4244-3750-4.

[Cinque 2010] M. Cinque, D. Cotroneo, R. Natella, A. Pecchia. Assessing and Improving the Effectiveness of Logs for the Analysis of Software Faults. In International Conference on Dependable Systems and Networks (DSN 2010), Chicago, IL, June 2010.

[Coccoli 2002] A. Coccoli, P. Urbán and A. Bondavalli. “Performance Analysis of a Consensus Algorithm Combining Stochastic Activity Networks and Measurements,” in Proc. Int. Conf. on Dependable Systems and Networks (DSN-2002), pp. 551-560, IEEE CS Press, 2002.

[Cotroneo 2006] D. Cotroneo, S. Orlando, and S. Russo. Failure classification and analysis of the Java Virtual Machine. In Proc. of 26th Intl. Conf. on Distributed Computing Systems, 2006.

[Courtney 2009] T. Courtney, S. Gaonkar, K. Keefe, E. W. D. Rozier, and W. H. Sanders. Möbius 2.3: An Extensible Tool for Dependability, Security, and Performance Evaluation of Large and Complex System Models. In Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), Estoril, Lisbon, Portugal, June 29-July 2, 2009, pp. 353-358.

[Courtois 1977] P. J. Courtois, Decomposability - Queueing and Computer System Applications, New York: Academic Press, 1977.

[CRUTIAL] IST-FP6-027513 CRUTIAL - CRitical UTility InfrastructurAL resilience. (http://crutial.erse-web.it/default.asp)

[Daly 2001] D. Daly and W. H. Sanders, “A connection formalism for the solution of large and stiff models,” 34th Annual Simulation Symposium, 2001, pp. 258-265.

[DBench] DBench Project, Project funded by the European Community under the “Information Society Technology” Programme (1998-2002), http://www.dbench.org/.

[DBench 2002] DBench Project, BDEV1: Dependability Benchmark Definition: DBench prototypes. Technical report, Dependability Benchmarking, IST-2000-2542, Deliverable BDEV1, June 2002.

[Deavours 1998a] D. D. Deavours and W.H. Sanders, “An Efficient Disk-Based Tool for Solving Large Markov Models”, Performance Evaluation, vol. 33, pp. 67-84, 1998.

[Deavours 1998b] D. D. Deavours and W.H. Sanders, “‘On-the-Fly’ Solution Techniques for Stochastic Petri Nets and Extensions”, IEEE Trans. Software Eng., vol. 24, no. 10, pp. 889-902, Oct. 1998.

[Desforges 1998] Desforges, M., Jacob, P., and Cooper, J. 1998. Applications of probability density estimation to the detection of abnormal conditions in engineering. In Proceedings of the Institute of the Mechanical Engineers. vol. 212. 687–703.

[Di Martino 2009] C. Di Martino. Resiliency assessment of Wireless Sensor Networks: a Holistic Approach. PhD Thesis, University of Naples Federico II. October 2009

[Dingman 1995] C. P. Dingman, J. Marshall, and D. P. Siewiorek, "Measuring Robustness of a Fault Tolerant Aerospace System", in Proceedings of the 25th IEEE International Symposium on Fault Tolerant Computing - FTCS'95, Pasadena, CA, USA, 1995, pp. 522-527.

[Distefano 2008] S. Distefano and A. Puliafito, “Dependability Evaluation with Dynamic Reliability Block Diagrams and Dynamic Fault Trees”, IEEE Transactions on Dependable and Secure Computing, Vol. 5, No. 2, April-June 2008.

[Donatelli 1996] S. Donatelli and G. Franceschinis, “The PSR methodology: integrating hardware and software models,” 17th Int. Conf. on Application and Theory of Petri Nets, ICATPN '96, Osaka, Japan, 1996, (Springer-Verlag).

[Drebes 2009] R. J. Drebes, T. Nanya, “Zapmem: a Framework for Testing the Effect of Memory Corruption Errors on Operating System Kernel Reliability”, IEEE Pacific Rim International Symposium on Dependable Computing – PRDC09, Shanghai, China, 16-18 November 2009.

[Duda 2000] Duda, R. O., Hart, P. E., and Stork, D. G. 2000. Pattern Classification 2nd Ed. Wiley-Interscience.

[Dugan 1991] J. B. Dugan, “Automated Analysis of Phase-Mission Reliability,” IEEE Transactions on Reliability, vol. 40, pp. 45-52, 1991.

[Durães 2002] J. Durães and H. Madeira. Characterization of Operating Systems Behavior in the Presence of Faulty Drivers through Software Fault Emulation. In Proc. of the European Dependable Computing Conference, 2002.

[Durães 2003] J. Durães and H. Madeira, "Multidimensional Characterization of the Impact of Faulty Device Drivers on the Operating Systems Behavior," in IEICE Transactions on Information and Systems, Special Issue on Dependability Computing, vol. E86-D, 2003, pp. 2563-2570.

[Durães 2004] J. Durães and H. Madeira. Generic Faultloads Based on Software Faults for Dependability Benchmarking. In Proc. of the Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2004.

[Durães 2006] J. Durães and H. Madeira. Emulation of Software Faults: A Field Data Study and a Practical Approach. IEEE Transactions on Software Engineering, 32(11), 2006.

[Düssel 2010] P. Düssel, C. Gehl, P. Laskov, J.-U. Bußer, C. Störmann, and J. Kästner, "Cyber-Critical Infrastructure Protection Using Real-time Payload-based Anomaly Detection", Critical Information Infrastructures Security, 2010, pp. 85–97.

[Engle 1995] R. F. Engle and K. F. Kroner, “Multivariate Simultaneous Generalized ARCH”, Econometric Theory, 1995.

[Ertöz 2003] Ertöz, L., Steinbach, M., and Kumar, V. 2003. Finding topics in collections of documents: A shared nearest neighbor approach. In Clustering and Information Retrieval. 83–104.

[Eskin 2000] Eskin, E. 2000. Anomaly detection over noisy data using learned probability distributions. In Proceedings of the 17th International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., 255–262.

[Eskin 2002] Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. 2002. A geometric framework for unsupervised anomaly detection. In Proceedings of the Conference on Applications of Data Mining in Computer Security. Kluwer Academics, 78–100.

[Ester 1996] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. Fayyad, Eds. AAAI Press, 226–231.

[Eubank 1999] R. L. Eubank, “Nonparametric regression and spline smoothing”, Marcel Dekker, 1999.

[Fabre 1999] J.C. Fabre et al., MAFALDA: Microkernel assessment by fault injection and design aid, in Proc. European Dependable Computing Conference, 1999.

[Fabre 2000] J. C. Fabre, M. Rodríguez, J. Arlat, F. Salles, and J. M. Sizun, "Building Dependable COTS Microkernel-based Systems using MAFALDA", in Proceedings of the 2000 Pacific Rim International Symposium on Dependable Computing - PRDC'00, 2000, pp. 85-92.

[Fenton 2000] N. Fenton and N. Ohlsson. Quantitative Analysis of Faults and Failures in a Complex Software System. IEEE Transactions on Software Engineering, 26(8), 2000.

[Fishwick 1993] Paul A. Fishwick. Multimodeling as a unified modeling framework. WSC ’93: Proceedings of the 25th Conference on Winter Simulation, pages 580–581, 1993.

[Flammini 2009] Flammini, F., Vittorini, V., Mazzocca, N. & Pragliola, C., A Study on Multiformalism Modeling of Critical Infrastructures, Lecture Notes in Computer Science, Volume 5508, pp. 336-343, 2009.

[Fonseca 2009] J. Fonseca, M. Vieira, H. Madeira, Vulnerability & Attack Injection for Web Applications, in Proc. Dependable Systems & Networks, 2009, pp. 93 – 102.

[Fota 1999] N. Fota, M. Kaâniche and K. Kanoun, “Incremental Approach for Building Stochastic Petri Nets for Dependability Modeling,” in Statistical and Probabilistic Models in Reliability, (Ionescu and Limnios, Eds.), pp. 321-335, Birkhäuser, 1999.

[Fox 1972] Fox, A. J. 1972. Outliers in time series. J. Royal Statis. Soc. Series B 34, 3, 350–363.

[FreeMODBUS] C. Walter, FreeMODBUS library, http://www.freemodbus.org/

[Fricks 1997] R. Fricks, C. Hirel, S. Wells, and K. Trivedi, “The Development of an Integrated Modeling Environment,” in Proc. World Congress on Systems Simulation (WCSS '97), (2nd Joint Conf. of Int. Simulation Societies), Singapore, Sept. 1-3, 1997, pp. 471-476.

[Fu 1982] K. S. Fu “Syntactic pattern recognition and application”, Prentice Hall, 1982.

[Fu 2007] S. Fu and C.Z. Xu. Exploring Event Correlation for Failure Prediction in Coalitions of Clusters. Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 2007.

[Gadelrab 2006] Gadelrab, M. S. & El Kalam, A. A. (2006), Testing Intrusion Detection Systems: An Engineered Approach, Proceedings of the 10th IASTED International Conference on Software Engineering and Applications, Dallas, TX, USA, 270-275.

[Ganesh 2002] J. P. Ganesh and J. B. Dugan, “Automatic Synthesis of Dynamic Fault Trees from UML System Models”, in Proc. of the IEEE Int. Symposium on Software Reliability Engineering, (ISSRE), pp 243-256, 2002.

[German 1995] R. German, C. Kelling, A. Zimmermann, and G. Hommel. TimeNET: A toolkit for evaluating non-Markovian stochastic Petri nets. Performance Evaluation, 24:69–87, 1995.

[Ghosh 1998] A. K. Ghosh, M. Schmid, and V. Shah, "Testing the Robustness of Windows NT Software", in Proceedings of the 9th IEEE International Symposium on Software Reliability Engineering - ISSRE'98, 1998, pp. 231-236.

[Ghosh 1999] A.K. Ghosh, and J.M. Voas, Inoculating software for survivability, Communications of the ACM, 42(7), 1999.

[Ghoting 2006] Ghoting, A., Parthasarathy, S., and Otey, M. 2006. Fast mining of distance-based outliers in high dimensional datasets. In Proceedings of the SIAM International Conference on Data Mining.

[Gönczy 2006] L. Gönczy, S. Chiaradonna, F. Di Giandomenico, A. Pataricza, A. Bondavalli and T. Bartha, “Dependability evaluation of web service-based processes”, in Proc. of European Performance Engineering Workshop (EPEW 2006), LNCS 4054, pp. 166-180, Springer, 2006.

[Graps 1995] A. Graps, “An introduction to wavelets”, 1995.

[Gray 1985] J. Gray. Why Do Computers Stop and What Can Be Done About It? Technical Report TANDEM TR-85.7, 1985.

[Gray 1986] J. Gray. Why do computers stop and what can be done about it. In Proc. of Symp. on Reliability in Distributed Software and Database Systems, pages 3–12, 1986.

[Gray 1990] J. Gray. A Census of Tandem System Availability Between 1985 and 1990. IEEE Transactions on Reliability, 39(4):409–418, October 1990.

[Gribaudo 2002] Marco Gribaudo and Andras Horvath. Fluid stochastic Petri nets augmented with flush-out arcs: A transient analysis technique. IEEE Trans. Softw. Eng., 28(10):944–955, 2002.

[Grottke 2010] M. Grottke, A. P. Nikora, K. S. Trivedi, “An Empirical Investigation of Fault Types in Space Mission System Software”, in Proceedings of the IEEE International Symposium on Dependable Systems and Networks - DSN'10, June 2010.

[Guha 2000] Guha, S., Rastogi, R., and Shim, K. 2000. ROCK: A robust clustering algorithm for categorical attributes. Inform. Syst. 25, 5, 345–366.

[Gursesli 2003] Gursesli, O. & Desrochers, A. A., Modelling infrastructure interdependencies using Petri nets. IEEE International Conference on Systems, Man and Cybernetics, vol.2, pp. 1506-1512, 2003.

[Haddad 2004] S. Haddad and P. Moreaux, “Approximate Analysis of Non-Markovian Stochastic Systems with Multiple Time Scale Delays,” 12th Annual Meeting of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Volendam, NL, 2004.

[Han 1995] S. Han, H. A. Rosenberg, and K. G. Shin, "DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment", in Proceedings of the IEEE International Computer Performance and Dependability Symposium - IPDS'95, Erlangen, Germany, 1995, pp. 204-213.

[Hansen 1992] J. P. Hansen and D. P. Siewiorek. Models for time coalescence in event logs. In FTCS, pages 221–227, 1992.

[Harrell 2001] F. E. Harrell, “Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis”, Springer, 2001.

[Hautamaki 2004] Hautamaki, V., Karkkainen, I., and Franti, P. 2004. Outlier detection using k-nearest neighbour graph. In Proceedings of the 17th International Conference on Pattern Recognition. vol. 3. IEEE Computer Society, 430–433.

[Haverkort 1996] B. Haverkort and I. G. Niemegeers, “Performability Modeling Tools and Techniques”, Performance Evaluation, 25(1), pages 17-40, Elsevier, 1996.

[Haverkort 2000] B. Haverkort, H. Hermanns, and J.-P. Katoen. On the use of model checking techniques for dependability evaluation. In Proc. 19th IEEE Symposium on Reliable Distributed Systems (SRDS’00), pages 228–237, Erlangen, Germany, October 2000.

[Hawkins 1980] Hawkins, D. 1980. Identification of Outliers. Chapman and Hall, London and New York.

[He 2003] He, Z., Xu, X., and Deng, S. 2003. Discovering cluster-based local outliers. Pattern Recog. Lett. 24, 9–10, 1641–1650.

[He 2005] He, Z., Xu, X., and Deng, S. 2005. An optimization model for outlier detection in categorical data. In Proceedings of the International Conference on Intelligent Computing. Lecture Notes in Computer Science, vol. 3644. Springer.

[He 2006] He, Z., Deng, S., Xu, X., and Huang, J. Z. 2006. A fast greedy algorithm for outlier mining. In Proceedings of the 10th Pacific-Asia Conference on Knowledge and Data Discovery. 567–576.

[HIDENETS 2006] HIDENETS - HIghly DEpendable ip-based NETworks and Services (Project IST-FP6-STREP-26979). http://www.hidenets.aau.dk/, 2006.

[HIDENETS D4.1.2] P. Lollini, A. Bondavalli et al. “Evaluation methodologies, techniques and tools (final version)”. EU FP6 IST project HIDENETS, deliverable D4.1.2, December 2007.

[Hillston 2005] J. Hillston. Fluid Flow Approximation of PEPA models. In Proceedings of the Second international Conference on the Quantitative Evaluation of Systems (September 19 - 22, 2005). QEST. IEEE Computer Society, Washington, DC, 2005.

[Horton 1998] G. Horton, V. G. Kulkarni, D. M. Nicol, and K. S. Trivedi. Fluid stochastic Petri nets: Theory, applications, and solution techniques. European Journal of Operational Research, Volume 105, Issue 1, 16 February 1998, Pages 184-201.

[IBM] IBM. Common event infrastructure. http://www-01.ibm.com/software/tivoli/features/cei.

[IEC61850] International Electrotechnical Commission, IEC 61850: Communication Networks and Systems in Substations, part 1 through 9, 2004.

[Ipser 1990] Edward A. Ipser, David S. Wile, and Dean Jacobs. A multi-formalism specification environment. SIGSOFT Softw. Eng. Notes, 15(6):94–106, 1990.

[Iyer 1982] R.K. Iyer, S.E. Butner, and E.J. McCluskey. A Statistical Failure/Load Relationship: Results of a Multicomputer Study. IEEE Transactions on Computers, C-31(7):697–706, July 1982.

[Jain 1988] Jain, A. K. and Dubes, R. C. 1988. Algorithms for Clustering Data. Prentice-Hall, Inc.

[Jaquith 2007] Andrew Jaquith, “Security Metrics”. Addison Wesley, 2007. ISBN-13: 978-0-321-34998-9.

[Jarboui 2002] T. Jarboui et al. Analysis of the Effects of Real and Injected Software Faults: Linux as a Case Study. In Proc. of the European Dependable Computing Conference, 2002.

[Jiang 2001] Jiang, M. F., Tseng, S. S., and Su, C. M. 2001. Two-phase clustering process for outliers detection. Patt. Recog. Lett. 22, 6-7, 691–700.

[Jin 2006] X. Jin, J. Bigham, J. Rodaway, D. Gamez, and C. Phillips. "Anomaly detection in electricity cyber infrastructures". In Proceedings of the International Workshop on Complex Networks and Infrastructure Protection (CNIP-06), 2006.

[Kaâniche 2003] M. Kaâniche, K. Kanoun and M. Rabah, “Multi-level modelling approach for the availability assessment of e-business applications,” Software: Practice and Experience, vol. 33, no. 14, pp. 1323-1341, 2003.

[Kaâniche 2008] M. Kaâniche, P. Lollini, A. Bondavalli, and K. Kanoun. Modeling the Resilience of Large and Evolving Systems. In International Journal of Performability Engineering (editor-in-chief: Dr. Krishna B. Misra), Volume 4, Number 2, pp. 153-168, April, 2008.

[Kalakech 2004] A. Kalakech, T. Jarboui, J. Arlat, Y. Crouzet, and K. Kanoun, "Benchmarking Operating System Dependability: Windows 2000 as a Case Study", in Proc. of the IEEE Pacific Rim International Conference on Dependable Computing - PRDC'04, Tahiti, 2004.

[Kalyanakrishnam 1999a] M. Kalyanakrishnam, Z. Kalbarczyk and R.K. Iyer, “Failure Data Analysis of LAN of Windows NT Based Computers”, in Proc. 18th Int. Symposium on Reliable Distributed Systems (SRDS’99), pp. 178-187, Lausanne, Switzerland, IEEE CS Press, 1999.

[Kalyanakrishnam 1999b] M. Kalyanakrishnam, Z. Kalbarczyk, and R. K. Iyer. Failure data analysis of a LAN of windows NT based computers. In Proceedings of the Eighteenth Symposium on Reliable Distributed Systems (18th SRDS’99), pages 178–187, Lausanne, Switzerland, October 1999.

[Kanawati 1995] G. A. Kanawati, N. A. Kanawati, and J. A. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System," IEEE Transactions on Computers, vol. 44, pp. 248-260, 1995.

[Kanoun 2000] K. Kanoun and M. Borrel, “Fault-Tolerant System Dependability — Explicit Modeling of Hardware and Software Component-Interactions,” IEEE Transactions on Reliability, vol. 49, no. 4, pp. 363-376, December 2000.

[Kanoun 2004] K. Kanoun et al., http://www.laas.fr/DBench, Project Reports section, project full final report, 2004.

[Kanoun 2005] C. Constantinescu, K. Kanoun, H. Madeira, B. Murphy, I. Pramanick, A.B. Brown, Dependability Benchmarking of Computing Systems, in Proc. of International Conference on Dependable Systems and Networks (DSN'05), 2005, pp. 400-410.

[Kao 1993] W.-L. Kao, R.K. Iyer, and D. Tang, FINE: A fault injection and monitoring environment for tracing the UNIX system behavior under faults, IEEE Transactions on Software Engineering, 19(11), 1993.

[Kao 1995] W.-L. Kao and R. K. Iyer, "DEFINE: A Distributed Fault Injection and Monitoring Environment", in Proceedings of the Workshop on Fault-Tolerant Parallel and Distributed Systems 1994, 1995, pp. 252-259.

[Kemeney 1960] J. G. Kemeny and J. L. Snell, “Finite Markov Chains”, D. Van Nostrand Company, Inc., 1960.

[Keogh 2004] Keogh, E., Lonardi, S., and Ratanamahatana, C. A. 2004. Towards parameter-free data mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 206–215.

[Kiciman 2005] E. Kiciman and A. Fox. Detecting application-level failures in component-based internet services. IEEE Transactions on Neural Networks, 16(5):1027–1041, September 2005.

[Kiezun 2009] A. Kiezun, P. J. Guo, K. Jayaraman, M. D. Ernst, Automatic Creation of SQL Injection and Cross-Site Scripting Attacks, in Proceedings of the 31st International Conference on Software Engineering, 2009.

[Killijian 2009] M.-O. Killijian, M. Roy, G. Séverac, and C. Zanon. “Data Backup for Mobile Nodes: A cooperative middleware and experimentation platform”. Workshop on Architecting Dependable Systems, Supplemental Volume of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-2009), Portugal, 2009.

[Klemm 2001] A. Klemm, C. Lindemann, and M. Lohmann. Traffic modeling and characterization for UMTS networks. In Global Telecommunications Conference, 2001. GLOBECOM '01. IEEE, vol. 3, pp. 1741–1746, 2001.

[Knorr 1997] Knorr, E. M. and Ng, R. T. 1997. A unified approach for mining outliers. In Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research. IBM Press, 11.

[Knorr 1998] Knorr, E. M. and Ng, R. T. 1998. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 392–403.

[Knorr 1999] Knorr, E. M. and Ng, R. T. 1999. Finding intensional knowledge of distance-based outliers. VLDB J. 211–222.

[Knorr 2000] Knorr, E. M., Ng, R. T., and Tucakov, V. 2000. Distance-based outliers: Algorithms and applications. VLDB J. 8, 3-4, 237–253.

[Koopman 2000] P. Koopman and J. DeVale, "The Exception Handling Effectiveness of POSIX Operating Systems," IEEE Transactions on Software Engineering, vol. 26, pp. 837-848, 2000.

[Kou 2006] Kou, Y., Lu, C.-T., and Chen, D. 2006. Spatial weighted outlier detection. In Proceedings of the SIAM Conference on Data Mining.

[Kovács 2008] M. Kovács, P. Lollini, I. Majzik and A. Bondavalli. An integrated framework for the dependability evaluation of distributed mobile applications. In Proc. of the RISE/EFTS Joint International Workshop on Software Engineering for REsilieNt systEms (SERENE 2008), pages 29-38, Newcastle upon Tyne, UK, November 17-19, 2008.

[Krings 2003] Krings, A. and P. Oman (2003). A Simple GSPN for Modeling Common Mode Failures in Critical Infrastructures. Hawaii International Conference on System Sciences 9, 334.

[Kropp 1998] N.P. Kropp, and P.J. Koopman, and D.P. Siewiorek, Automated robustness testing of off-the-shelf software components, in Proc. Annual International Symposium on Fault-Tolerant Computing, 1998.

[Laplace 1999] J.-C. Laplace and M. Brun. Critical software for nuclear reactors: 11 years of field experience analysis. In Proceedings of the Ninth International Symposium on Software Reliability Engineering, pages 364–368, Paderborn, Germany, November 1999.

[Lee 1995] I. Lee and R. K. Iyer, "Software Dependability in the Tandem GUARDIAN System," in IEEE Transactions on Software Engineering, vol. 21, 1995, pp. 455-467.

[Lee 2001] Lee, W. and Xiang, D. 2001. Information-theoretic measures for anomaly detection. In Proceedings of the IEEE Symposium on Security and Privacy. IEEE Computer Society, 130.

[Lenglet 2004] R. Lenglet, T. Coupaye, E. Bruneton. Composing Transformations of Compiled Java Programs with Jabyce. ComSIS Vol. 1, No. 2, November 2004.

[Li 1993] Li, M. and Vitányi, P. M. B. 1993. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag.

[Liang 2006] Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. K. Sahoo. BlueGene/L failure analysis and prediction models. In Proceedings 2006 International Conference on Dependable Systems and Networks (DSN 2006), pages 425–434, Philadelphia, Pennsylvania, USA, June 2006.

[Lie 2007] D. Lie and M. Satyanarayanan, The Strength of Security Systems, in Proc. of the 2nd USENIX workshop on Hot topics in security, 2007, pp. 41-47.

[Lilja 2000] D. J. Lilja, “Measuring computer performance. A practitioner’s guide”, Cambridge University Press, 2000.

[Lim 2008] C. Lim, N. Singh, and S. Yajnik. A log mining approach to failure analysis of enterprise telephony systems. In International Conference on Dependable Systems and Networks (DSN 2008), Anchorage, Alaska, June 2008.

[Lindemann 1999] C. Lindemann, A. Reuys, and A. Thümmler. The DSPNexpress 2.000 performance and dependability modeling environment. In Proc. of the 29th Annual Int. Symp. on Fault-Tolerant Computing, pages 228-231, Madison, Wisconsin, USA, June 1999.

[Ling 1997] S. Ling and W. K. Li “On fractionally integrated autoregressive moving-average time series models with conditional heteroskedasticity”, Journal of the American Statistical Association, 1997.

[Lippmann 2000] Lippmann, Haines, Fried, Korba and Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks, 2000.

[Lollini 2005] P. Lollini, A. Bondavalli and F. Di Giandomenico, “A modeling methodology for hierarchical control systems and its application,” Journal of the Brazilian Computer Society, vol. 10, no. 3, pp. 57-69, 2005.

[Lollini 2009] P. Lollini, A. Bondavalli and F. di Giandomenico, "A Decomposition-Based Modeling Framework for Complex Systems," Reliability, IEEE Transactions on, vol.58, no.1, pp. 20-33, March 2009.

[Lonvick 2001] C. Lonvick. The BSD syslog protocol. Request for Comments 3164, The Internet Society, Network Working Group, RFC3164, August 2001.

[Lu 2002] N. Lu, J.H. Chow, A.A. Desrochers, “A multi-layer Petri net model for deregulated electric power systems,” in Proceedings of the 2002 American Control Conference, vol. 1, pp. 513-518, 2002.

[Madeira 2000] H. Madeira, D. Costa, and M. Vieira. On the Emulation of Software Faults by Software Fault Injection. In Proc. of the Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2000.

[Madeira 2002] H. Madeira, R. Some, F. Moreira, D. Costa, and D. Rennels, "Experimental Evaluation of a COTS System for Space Applications", in Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-02, Bethesda, Maryland, USA, 2002.

[Magyar 2009] M. Magyar and I. Majzik, “Modular Construction of Dependability Models from System Architecture Models: A Tool-Supported Approach”, QEST 2009, 6th Int. Conf. on the Quantitative Evaluation of Systems, Budapest, Hungary, IEEE CS, Los Alamitos, pp 95-96, 2009.

[Mahoney 2003] Mahoney, M. V., Chan, P. K., and Arshad, M. H. 2003. A machine learning approach to anomaly detection. Tech. rep. CS–2003–06, Department of Computer Science, Florida Institute of Technology Melbourne.

[Mainkar 1996] V. Mainkar and K. Trivedi, “Sufficient Conditions for Existence of a Fixed Point in Stochastic Reward Net-Based Iterative Models”, IEEE Trans. Software Eng., vol. 22, no. 9, pp. 640-653, Sept. 1996.

[Martins 2002] E. Martins, C. M. F. Rubira, and N. G. M. Leme, "Jaca: A reflective fault injection tool based on patterns", in Proceedings of the IEEE International Dependable Systems and Networks - DSN'02, Bethesda, USA, 2002.

[Massie 2004] M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing, (7), July 2004.

[McCallum 2000] McCallum, A., Nigam, K., and Ungar, L. H. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 169–178.

[McHugh 2000] John McHugh, Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory, ACM Transactions on Information and System Security (TISSEC), 3(4), 262-294, 2000.

[Meyer 1993] J. F. Meyer and W. H. Sanders, “Specification and Construction of Performability Models,” in Int. Workshop on Performability Modeling of Computer and Communication Systems, Mont Saint Michel, France, 1993, pp. 1-32.

[Milner 1989] R. Milner, Communication and Concurrency, Prentice Hall, 1989.

[Montecchi 2011a] L. Montecchi, P. Lollini, A. Bondavalli, “Dependability Concerns in Model-Driven Engineering”, to appear in WORNUS 2011: 2nd IEEE International Workshop on Object/component/service-oriented Real-time Networked Ultra-dependable Systems, March 2011.

[Montecchi 2011b] L. Montecchi, P. Lollini and A. Bondavalli. Towards a MDE Transformation Workflow for Dependability Analysis. To appear in Proc. of the 16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011), Las Vegas, USA, 27-29 April 2011.

[Moraes 2006] M. Moraes et al. Injection of Faults at Component Interfaces and Inside the Component Code: Are They Equivalent? In Proc. of the European Dependable Computing Conference, 2006.

[Moraes 2007] M. Moraes et al. Experimental Risk Assessment and Comparison Using Software Fault Injection. In Proc. of the Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007.

[Mura 1999] I. Mura, A. Bondavalli, X. Zang and K. Trivedi, “Dependability Modelling and Evaluation of Phased Mission Systems: a DSPN Approach,” 7th IFIP Int. Conference on Dependable Computing for Critical Applications (DCCA-7), San Jose, CA, USA, 1999, (IEEE Computer Society).

[Mura 2001] I. Mura and A. Bondavalli, “Markov Regenerative Stochastic Petri Nets to Model and Evaluate the Dependability of Phased Missions,” IEEE Transactions on Computers, vol. 50, no. 12, pp. 1337-1351, 2001.

[Murphy 2000] B. Murphy and B. Levidow. Windows 2000 Dependability. MSR-TR-2000-56, Microsoft Research, Microsoft Corporation, Redmond, WA, June 2000.

[Nassu 2008] B. T. Nassu, K. Uehara, T. Nanya, “Injecting Inconsistent Values Caused by Interaction Faults for Experimental Dependability Evaluation” IEEE Pacific Rim International Symposium on Dependable Computing – PRDC08, 15-17 December 2008.

[Nelli 1996] M. Nelli, A. Bondavalli and L. Simoncini, “Dependability Modeling and Analysis of Complex Control Systems: an Application to Railway Interlocking”, EDCC-2 European Dependable Computing Conference, Lecture Notes in Computer Science N. 1150. Taormina, Italy, Springer-Verlag: 93-110, 1996.

[Neves 2006] Neves, N., Antunes, J., Correia, M., Veríssimo, P., Neves, R., “Using Attack Injection to Discover New Vulnerabilities”, IEEE/IFIP International Conference on Dependable Systems and Networks, 2006.

[Ng 1996] W. T. Ng, C. M. Aycock, and P. M. Chen, "Comparing Disk and Memory's Resistance to Operating System Crashes", in Proceedings of the 7th IEEE International Symposium on Software Reliability Engineering, ISSRE'96, New York, NY, USA, 1996.

[Ng 1999a] Ng, R. T. and Han, J. 1994. Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 144–155.

[Ng 1999b] W. T. Ng and P. M. Chen, "Systematic Improvement of Fault Tolerance in the RIO File Cache", in Proceedings of the 29th IEEE International Fault Tolerant Computing Symposium, FTCS-29, Madison, WI, USA, 1999.

[Ng 2001] W. T. Ng and P. M. Chen, "The Design and Verification of the Rio File Cache," IEEE Transactions on Computers, vol. 50, pp. 322-332, 2001.

[Nicol 2004] D. M. Nicol, W. H. Sanders and K. S. Trivedi, “Model-based Evaluation: From Dependability to Security”, IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp 48-65, 2004.

[Obal 1998] W. D. Obal, Measure-Adaptive State-Space Construction Methods, PhD, University of Arizona, 1998.

[Odin 2000] Odin, T. and Addison, D. 2000. Novelty detection using neural network technology. In Proceedings of the COMADEM Conference.

[Oehlert 2005] Peter Oehlert, “Violating Assumptions with Fuzzing”, IEEE Security & Privacy, pp. 58-62, March/April 2005.

[Oliner 2007] A. J. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2007), pages 575–584. IEEE Computer Society, 2007.

[Oman 2007] Paul Oman, Matthew Phillips, Intrusion Detection and Event Monitoring in SCADA Networks, book chapter of Critical Infrastructure Protection, Pages 161-173, Springer Boston, 2007.

[OMG 2008] Object Management Group, “UML for Modeling Quality of Service and Fault Tolerance Characteristics and Mechanisms, v1.1”, http://www.omg.org/spec/QFTP/1.1, 2008.

[Oppenheimer 2003] D. Oppenheimer, and A. Ganapathi, and D.A. Patterson, Why do Internet services fail, and what can be done about it?, in Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems, 2003.

[Otey 2003] Otey, M., Parthasarathy, S., Ghoting, A., Li, G., Narravula, S., and Panda, D. 2003. Towards NIC-based intrusion detection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 723–728.

[Otey 2006] Otey, M. E., Ghoting, A., and Parthasarathy, S. 2006. Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Disc. 12, 2-3, 203–228.

[Palshikar 2005] Palshikar, G. K. 2005. Distance-based outliers in sequences. Lecture Notes in Computer Science, vol. 3816, 547–552.

[Papadimitriou 2002] Papadimitriou, S., Kitagawa, H., Gibbons, P. B., and Faloutsos, C. 2002. LOCI: Fast outlier detection using the local correlation integral. Tech. rep. IRP-TR-02-09, Intel Research Laboratory.

[Park 2006] K. Park and V. S. Pai. CoMon: a mostly-scalable monitoring system for PlanetLab. SIGOPS OSR, 40(1), 2006.

[Parzen 1962] Parzen, E. 1962. On the estimation of a probability density function and mode. Annals Math. Stat. 33, 1065–1076.

[Paxson 1998] V. Paxson. Bro: a system for detecting network intruders in real-time. In Proc. of USENIX Security Symposium, pages 31–51, 1998.

[Percival 2000] D. B. Percival and A. T. Walden “Wavelet methods for time series analysis”, Cambridge University Press, 2000.

[Perry 1985] D.E. Perry, and W.M. Evangelist, An empirical study of software interface faults, in Proc. of the International Symposium on New Directions in Computing, 1985.

[Perry 1993] D.E. Perry, and C. Stieg, Software faults in evolving a large, real-time system: a case study, Software Engineering — ESEC '93, Lecture Notes in Computer Science, 717, 1993.

[Pires 2005] Pires, A. and Santos-Pereira, C. 2005. Using clustering and robust estimators to detect outliers in multivariate data. In Proceedings of the International Conference on Robust Statistics.

[Poirier 1973] D. J. Poirier, “Piecewise Regression Using Cubic Spline”, Journal of the American Statistical Association, 1973.

[Pokrajac 2007] Pokrajac, D., Lazarevic, A., and Latecki, L. J. 2007. Incremental local outlier detection for data streams. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining.

[Pooley 1991] R. J. Pooley, “The Integrated Modelling Support Environment: A New Generation of Performance Modelling Tools,” in Computer Performance Evaluation: Modelling Techniques and Tools: Proc. of the Fifth Int. Conf. on Modelling Techniques and Tools for Computer Performance Evaluation, Torino, Italy, February 13-15, 1991 (ed. by G. Balbo and G. Serazzi), Amsterdam: Elsevier, 1992, pp. 1-15.

[Powell 2003] Powell, D., Stroud, R., “Conceptual Model and Architecture of MAFTIA”, Project MAFTIA, deliverable D21, 2003.

[Praehofer 1990] Herbert Praehofer and Bernard P. Zeigler. Modelling and simulation of nonhomogeneous models. EUROCAST ’89: Selection of Papers from the International Workshop on Computer Aided Systems Theory, pages 200–211, 1990.

[PRIN-DOTS-LCCI-D1.1 2010] C. Esposito, S. Russo, M. Platania, R. Baldoni, P. Lollini, A. Bondavalli, M. Ficco, L. Romano, A. Bovenzi, R. Lancellotti, M. Marchetti and M. Colajanni. “Requirements Analysis for LCCI”. PRIN research project DOTS-LCCI: Dependable Off-The-Shelf based middleware systems for Large-scale Complex Critical Infrastructures, Deliverable D1.1, Nov. 2010.

[Ramaswamy 2000] Ramaswamy, S., Rastogi, R., and Shim, K. 2000. Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 427–438.

[Ratsch 2002] Rätsch, G., Mika, S., Schölkopf, B., and Müller, K.-R. 2002. Constructing boosting algorithms from SVMs: An application to one-class classification. IEEE Trans. Patt. Anal. Mach. Intel. 24, 9, 1184–1199.

[Reibman 1991] A. Reibman and M. Veeraraghavan, “Reliability Modeling: An Overview for System Designers”, IEEE Computer, April 1991, pp. 49-57.

[Rojas 1996] I. Rojas, “Compositional Construction of SWN Models,” The Computer Journal, vol. 38, no. 7, pp. 612-621, 1996.

[Roosta 2008] T. Roosta, D.K. Nilsson, U. Lindqvist, and A. Valdes, An intrusion detection system for wireless process control systems, IEEE, 2008.

[Roth 2004] Roth, V. 2004. Outlier detection with one-class kernel Fisher discriminants. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS).

[Roth 2006] Roth, V. 2006. Kernel fisher discriminants for outlier detection. Neural Comput. 18, 4, 942–960.

[Rouillard 2004] J. P. Rouillard. Real-time log file analysis using the Simple Event Correlator (SEC). USENIX Systems Administration (LISA XVIII) Conference Proceedings, Nov. 2004.

[Rousseeuw 1987] Rousseeuw, P. J. and Leroy, A. M. 1987. Robust Regression and Outlier Detection. John Wiley & Sons, Inc.

[Rrushi 2008] Julian Rrushi and Roy Campbell, Detecting Attacks in Power Plant Interfacing Substations through Probabilistic Validation of Attack Effect Bindings, in Proceeding of S4: SCADA Security Scientific Symposium, Miami, FL, January 2008

[Rugina 2007] A. E. Rugina, K. Kanoun and M. Kaâniche, “A System Dependability Modeling Framework using AADL and GSPNs”, in Architecting Dependable Systems IV, (R. d. L. e. al., Ed.) vol. LNCS 4615, pp. 14-38, Springer-Verlag, 2007.

[SAE 2006] SAE-AS5506/1, “Architecture Analysis and Design Language (AADL) Annex Volume 1, Annex E: Error Model Annex”, http://standards.sae.org/as5506/1/, Society of Automotive Engineers, 2006

[Sahner 1996] R.A. Sahner, K.S. Trivedi and A. Puliafito, “Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package”, Kluwer Academic Publishers, 1996.

[Sahoo 2004] R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure data analysis of a large-scale heterogeneous server environment. In Proceedings 2004 International Conference on Dependable Systems and Networks (DSN 2004), pages 772–, Florence, Italy, June-July 2004.

[Salfner 2004] F. Salfner, S. Tschirpke, and M. Malek. Comprehensive logfiles for autonomic systems. Proc. of the IEEE Parallel and Distributed Processing Symposium, 2004, April 2004.

[Sanders 1995] W. H. Sanders, W. D. Obal II, M. A. Qureshi, and F. K. Widjanarko. The UltraSAN modeling environment. Performance Evaluation, 24(1):89–115, 1995

[Sanders 1999] W. H. Sanders, “Integrated frameworks for multi-level and multi-formalism modeling”, in Proc. the 8th International Workshop on Petri Nets and Performance Models, pages 2-9, September 1999.

[Sanders 2001] W. H. Sanders and J. F. Meyer, “Stochastic activity networks: Formal definitions and concepts”, in Lecture Notes in Computer Science, pages 315-343. Springer-Verlag, 2001.

[Schölkopf 2001] Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. 2001. Estimating the support of a high-dimensional distribution. Neural Comput. 13, 7, 1443–1471.

[Schmidt 2006] D. C. Schmidt, “Model-Driven Engineering”, IEEE Computer 39 (2), February 2006.

[Schroeder 2006] B. Schroeder and G. A. Gibson. A large-scale study of failures in high-performance computing systems. In DSN, pages 249–258. IEEE Computer Society, 2006.

[Segall 1988] Z. Segall, D. Vrsalovic, D. Siewiorek, J. Kownacki, J. Barton, R. Dancey, A. Robinson, and T. Lin, "FIAT - Fault Injection Based Automated Testing Environment", in Proceedings of the 18th IEEE International Symposium on Fault Tolerant Computing - FTCS'88, 1988, pp. 102-107.

[Sheikholeslami 1998] Sheikholeslami, G., Chatterjee, S., and Zhang, A. 1998. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the 24th International Conference on Very Large Databases. Morgan Kaufmann Publishers Inc., 428–439.

[SIEM] www.arcsight.com

[Silva 2008] L. Silva. Comparing Error Detection Techniques for Web Applications: An Experimental Study. 7th IEEE Intl. Symp. on Network Computing and Applications, 2008.

[Simache 2005] C. Simache and M. Kaaniche. Availability assessment of SunOS/Solaris Unix systems based on syslogd and wtmpx log files: A case study. In PRDC, pages 49–56. IEEE Computer Society, 2005.

[Skarin 2008] D. Skarin, J. Karlsson, “Software Implemented Detection and Recovery of Soft Errors in a Brake-by-Wire System”, Proc. of the Seventh European Dependable Computing Conference – EDCC-7, 7-9 May 2008.

[Skarin 2010] D. Skarin, R. Barbosa, J. Karlsson, “GOOFI-2: A Tool for Experimental Dependability Assessment”. in Proceedings of the IEEE International Symposium on Dependable Systems and Networks - DSN'10, June 2010.

[Smith 2002] Smith, R., Bivens, A., Embrechts, M., Palagiri, C., and Szymanski, B. 2002. Clustering approaches for anomaly-based intrusion detection. In Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks. ASME Press, 579–584.

[Snort] www.snort.com

[Somani 1994] A. Somani and K. Trivedi, “Phased-mission System Analysis using Boolean Algebraic Methods,” in 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, Nashville, Tennessee, USA, 1994, pp. 98-107.

[Sommers 2005] Sommers, J., Yegneswaran, V. & Barford, P. (2005), Toward Comprehensive Traffic Generation for Online IDS Evaluation, Technical report, Department of Computer Science, University of Wisconsin, Madison.

[Song 2007] Song, X., Wu, M., Jermain, C., and Ranka, S. 2007. Conditional anomaly detection. IEEE Trans. Knowl. Data Eng. 19, 5, 631–645.

[Spreen 1979] T. H. Spreen, R. E. Mayer, J. R. Simpson and J. T. McClave, “Forecasting Monthly Slaughter Cow Prices with a Subset Autoregressive Model”, Southern Journal of Agricultural Economics, 1979.

[Stefano 2000] De Stefano, C., Sansone, C., and Vento, M. 2000. To reject or not to reject: that is the question: An answer in the case of neural classifiers. IEEE Trans. Syst. Man Cybern. 30, 1, 84–94.

[Sullivan 1991] M. Sullivan and R. Chillarege. Software Defects and their Impact on System Availability—A Study of Field Failures in Operating Systems. In Proc. of Annual Symposium on Fault Tolerant Computing, 1991.

[Sullivan 1992] M. Sullivan and R. Chillarege, "A comparison of Software Defects in Database Management Systems and Operating Systems", in Proceedings of the IEEE 22nd International Symposium on Fault Tolerant Computing - FTCS'92, 1992, pp. 475-484.

[Sun 2004] Sun, P. and Chawla, S. 2004. On local spatial outliers. In Proceedings of the 4th IEEE International Conference on Data Mining. 209–216.

[Sun 2006] Sun, P. and Chawla, S. 2006. SLOM: A new measure for local spatial outliers. Knowl. Inform. Syst. 9, 4, 412–429.

[Tan 2005] Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley.

[Tang 1993] D. Tang and R.K. Iyer. MEASURE+ - A Measurement-Based Dependability Analysis Package. Proc. of the ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, 1993.

[Tang 1998] D. Tang, M. Hecht, J. Miller, and J. Handal. MEADEP: A dependability evaluation tool for engineers. IEEE Transactions on Reliability, vol. 47, no. 4, pp. 443–450, December 1998.

[Tang 2002] Tang, J., Chen, Z., Fu, A. W.-C., and Cheung, D. W. 2002. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 535–548.

[Tao 2006] Tao, Y., Xiao, X., and Zhou, S. 2006. Mining distance-based outliers from large databases in any metric space. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 394–403.

[Ten 2008] Ten, C.-W., C.-C. Liu, and G. Manimaran (2008). Vulnerability Assessment of Cybersecurity for SCADA Systems. IEEE Transactions on Power Systems 23 (4), 1836.

[Teng 1990] Teng, H., Chen, K., and Lu, S. 1990. Adaptive real-time anomaly detection using inductively generated sequential patterns. In Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy. IEEE Computer Society Press, 278–284.

[Thakur 1996] A. Thakur and R. K. Iyer. Analyze-NOW: an environment for collection and analysis of failures in a network of workstations. IEEE Transactions on Reliability, vol. 45, no. 4, pp. 560–570, 1996.

[Thomas 2008] T. E. Hart, M. Chechik, D. Lie, Security Benchmarking using Partial Verification, in Proc. of the 3rd conference on Hot topics in security (HotSec'08), 2008, pp. 1-6.

[Torr 1993] Torr, P. and Murray, D. 1993. Outlier detection and motion segmentation. In Proceedings of the SPIE. Sensor Fusion VI, S. Schenker, Ed. vol. 2059. 432–443.

[Tran 2004] N. Tran, “Automatic ARIMA Time Series Modeling for Adaptive I/O Prefetching”, IEEE Trans. on Parallel and Distributed Systems, 2004.

[Trivedi 1996] K. S. Trivedi, S. Hunter, S. Garg and R. Fricks, “Reliability Analysis Techniques Explored Through a Communication Network Example”, Technical Report TR-96/32, Duke University, Dep. of Electrical and Computer Eng., USA, 1996.

[Trivedi 2001] K. S. Trivedi, “Probability and Statistics with Reliability, Queuing, and Computer Science Applications”, John Wiley and Sons, New York, 2001.

[Tsai 1995] T. K. Tsai and R. K. Iyer, "Measuring Fault Tolerance with the FTAPE Fault Injection Tool", in Proceedings of the 8th International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, Heidelberg, Germany, 1995, pp. 26-40.

[Tsai 1996] T. K. Tsai, R. K. Iyer, and D. Jewitt, "An Approach to Benchmarking of Fault-Tolerant Commercial Systems", in Proceedings of the 26th IEEE International Fault Tolerant Computing Symposium, FTCS-26, Sendai, Japan, 1996, pp. 314-323.

[Tsai 2000] T. Tsai and N. Singh, "Reliability Testing of Applications on Windows NT", in Proceedings of the IEEE International Symposium on Dependable Systems and Networks - DSN'00, New York, NY, USA, 2000, pp. 427-436.

[uClinux] K. Albanowski, and D.J. Dionne, Embedded Linux Microcontroller Project, http://www.uclinux.org

[Vaarandi 2002] R. Vaarandi. SEC - a lightweight event correlation tool. In IEEE IPOM’02 Proceedings, 2002.

[Valdes 2000] A. Valdes and K. Skinner. Adaptive, model-based monitoring for cyber attack detection. In H. Debar, L. Me, and F. Wu, editors, Recent Advances in Intrusion Detection (RAID 2000), LNCS, Toulouse, France, Oct. 2000.

[Valdes 2006] A. Valdes, M. Fong, and K. Skinner. Data cube indexing of large infosec repositories. In AusCERT Asia Pacific Information Technology Security Conference, May 2006.

[Valdes 2009a] A. Valdes and S. Cheung, "Communication pattern anomaly detection in process control systems", Technologies for Homeland Security, 2009. HST’09. IEEE Conference on, IEEE, 2009, pp. 22–29.

[Valdes 2009b] A. Valdes and S. Cheung, "Intrusion monitoring in process control systems", System Sciences, 2009. HICSS'09. 42nd Hawaii International Conference on, IEEE, 2009, pp. 1–7.

[van Moorsel 1998] A. P. A. van Moorsel and Y. Huang, “Reusable Software Components for Performability Tools and Their Utilization for Web-based Configurable Tools,” in Computer Performance Evaluation: Lecture Notes in Computer Science No. 1469, Berlin: Springer, 1998, pp. 37-50.

[Vapnik 1995] Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.

[Vieira 2003] M. Vieira and H. Madeira. A Dependability Benchmark for OLTP Application Environments. In Proc. of the International Conference on Very Large Data Bases, 2003.

[Vittorini 2002] V. Vittorini, G. Franceschinis, M. Gribaudo, M. Iacono, N. Mazzocca, DrawNet++: Model Objects to Support Performance Analysis and Simulation of Complex Systems, In: Lecture Notes in Computer Science (LNCS), vol. 2324, Computer Performance Evaluation - Modelling Techniques and Tools, pp. 233-238, Springer-Verlag, 2002.

[Wei 2003] Wei, L., Qian, W., Zhou, A., and Jin, W. 2003. Hot: Hypergraph-based outlier test for categorical data. In Proceedings of the 7th Pacific-Asia Conference on Knowledge and Data Discovery. 399–410.

[Winer 1964] N. Wiener, “Extrapolation, Interpolation, and Smoothing of Stationary Time Series”, 1964.

[WirelessHART] WirelessHART Foundation. www.hartcomm2.org, 2007.

[Wolberg 1999] G. Wolberg and I. Alfy, “Monotonic Cubic Spline Interpolation”, Proc. of International Conference on Computer Graphics, 1999.

[Yu 2002] Yu, D., Sheikholeslami, G., and Zhang, A. 2002. FindOut: Finding outliers in very large datasets. Knowl. Inform. Syst. 4, 4, 387–412.

[Yu 2006] Yu, J. X., Qian, W., Lu, H., and Zhou, A. 2006. Finding centric local outliers in categorical/numerical spaces. Knowl. Inform. Syst. 9, 3, 309–338.

[Zhang 2006] Zhang, J. and Wang, H. 2006. Detecting outlying subspaces for high-dimensional data: The new task, algorithms, and performance. Knowl. Inform. Syst. 10, 3, 333–355.

[Zhu 2008] Bonnie Zhu, Anthony Joseph and Shankar Sastry, Taxonomy of Cyber Attacks on SCADA Systems, 2008.