[ieee communication technologies, research, innovation, and vision for the future (rivf) - hanoi,...

Securing a Critical Infrastructure

Loïc Baud, Patrick Bellot

Institut Telecom, Télécom ParisTech & LTCI CNRS

46 rue Barrault, 75634 Paris Cedex 13, France

{baud,bellot}@telecom-paristech.fr

Abstract—In this article we present a mechanism for the protection and reconfiguration of a critical infrastructure. In order to react in real time to the various threats and challenges that occur, this mechanism is distributed and fully automated: no human intervention is required. The mechanism can thus be seen as the immune system of the critical infrastructure on which it is deployed. The immune system has agents that collect data on the various entities of the infrastructure, review and analyze them, deduce the Security Assurance values of these entities, and carry out an immune response (reconfiguration, isolation of the compromised parts, etc.) upon detection of anomalies or threats. This immune system is also able to react to and tolerate failures or attacks, since its agents move over a resilient overlay network.

Index Terms—Critical Infrastructure Security, Immune System, Security Assurance, Overlay Network

I. INTRODUCTION

The constant and ever-increasing computerization of today’s world has led to the emergence of critical infrastructures (CI). A CI is an autonomous interconnection of network devices and services that reconfigures itself according to its needs and to the actual context in which it evolves. From this point of view, a large and complex CI can be seen as a living organism: the physical devices, such as the Ethernet links and the routing devices, compose the skeleton of the CI; the application services offered by the CI compose its body and muscles; the servers, or any network device with storage capacity, are the memory of the CI; and the inner working services and protocols are its intelligence. Like any living organism, a CI can be subject to attack. For example, an application service can be hit by a DDoS attack and made unavailable, which can be likened to a broken limb, and the devices of the CI can be infected by a virus that uses the CI resources for its own ends. Moreover, a CI is interconnected and interacts with other CIs, so the immune system of the CI must also take these interactions into account.

In the past, the defense of a CI was entrusted to human system administrators who defined a set of exceptions and security policies. Unfortunately, this approach can no longer be applied. CIs are too large and composed of too many different kinds of devices and services to be wholly understandable by any human. The immune system of the CI must be a smart, accurate, very reactive and autonomous service that becomes a part of the CI itself. As a part of the CI, this immune system must also protect itself, and it must be the last to fall when the CI is attacked or malfunctions.

In this paper we propose a distributed network security monitoring tool that aims to be used as the immune system of a CI. The proposed immune system is composed of two components:

• the distributed monitoring component, based on a set of sensor agents, computation agents and security agents;

• a resilient overlay network that acts as a resistant platform for supporting the monitoring component and that routes the messages exchanged by the different agents of the immune system.

II. OVERVIEW

The network security monitoring tool consists of a distributed set of multi-technology sensor agents, a set of computation agents and a set of detection mechanisms to detect attacks, failures or service bugs that can occur in the system. When such an incident is detected, the immune system must respond quickly and appropriately according to a security policy. The security policy may imply the reconfiguration of the system and of its services. The immune system has to perform three tasks:

• Modelling. This task consists in planning and defining the optimal operational configuration of the CI.

• Detection and Prevention. To obtain the current operational mode of the CI, sensors must measure the characteristics of the CI. The sensors must detect the forerunners of a failure or an attack.

• Reaction. Computer-aided and automated countermeasures have to be taken when a failure or an attack is detected. These responses must be quick and appropriate to the kind of incident detected.

To tolerate failures and attacks, the sensor agents must send the collected data to the computation agents, and the computation agents must communicate, through a resilient network so that they remain able to compute the Security Assurance (SA) values of the different entities of the CI. For this reason, a resilient overlay network called ROSA was chosen to support the immune system.

A. Description

A set of dedicated sensor agents is located on every measurable entity of the network. The sensors may measure the load of the processor, the disk usage, the use of network interfaces, the output of anti-virus software, the errors in the service log files and so on. The computation agents of the immune system use the output data of the sensor agents to compute Security Assurance (SA) values, as described in Section IV.

Once the SA values are computed, the Security Policy has to be applied by the security agents. Currently, the Security Policy consists of two rules:

978-1-4244-8075-3/10/$26.00 ©2010 IEEE

• If the SA value of a routing entity (a router or a gateway) falls below a given threshold, then the computation agents try to find an alternative routing entity for all the subnetworks that depend on the entity with the low SA value. If no such alternative routing entity can be found, the subnetwork is isolated.

• If the SA value of a subnetwork falls below a given threshold, this subnetwork is isolated. This means that all the entities of the network must discard the data packets emanating from the subnetwork with the low SA value.

Fig. 1: The immune system

The thresholds of the SA values below which the Security Policy requires a reaction have to be determined experimentally, by calibrating the sensors of the monitored CI during a period in which no attacks or failures are encountered. Figure 1 illustrates the application of the rules of the Security Policy according to the SA values computed by the agents.
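As an illustration, the two rules of the Security Policy could be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the deployed implementation: the `Subnetwork` record, the function `apply_security_policy`, the action names and the choice of the alternative with the highest SA value are all hypothetical, and the thresholds are supposed to come from the calibration described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subnetwork:
    """Hypothetical record describing one monitored subnetwork."""
    name: str
    sa_value: float
    gateway: str                                            # routing entity currently used
    alternatives: List[str] = field(default_factory=list)   # other usable routing entities


def apply_security_policy(subnets, router_sa, router_threshold, subnet_threshold):
    """Return the list of (action, subnetwork, detail) decisions for the two rules."""
    actions = []
    for net in subnets:
        # Second rule: a subnetwork with a low SA value is isolated.
        if net.sa_value < subnet_threshold:
            actions.append(("isolate", net.name, None))
            continue
        # First rule: the routing entity the subnetwork depends on has a low SA value.
        if router_sa[net.gateway] < router_threshold:
            healthy = [r for r in net.alternatives
                       if router_sa.get(r, 0.0) >= router_threshold]
            if healthy:
                # Assumed tie-break: prefer the alternative with the highest SA value.
                best = max(healthy, key=lambda r: router_sa[r])
                actions.append(("reroute", net.name, best))
            else:
                # No alternative routing entity: the subnetwork is isolated.
                actions.append(("isolate", net.name, None))
    return actions
```

In this sketch a subnetwork whose own SA value and whose routing entity are both above their thresholds produces no action.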

B. Experimentation on the Telecom ParisTech network

The immune system has been implemented and deployed on the network of the Computer Sciences and Networking department at TELECOM ParisTech. This network has around 50 workstations and several CISCO routers, and is divided into several subnetworks. In this experiment, we used real sensors except for the Virus Sensor, since we do not want to actually propagate viruses or worms on this operational network. We are able to simulate the existence of a worm attack or the failure of a network element.

Each workstation and server of the monitored network acts as a node of ROSA and as a sensor agent of the monitoring component; some of them are also elected as computation and security agents. All the communications between the agents transit through ROSA (see Section III). In order to complete the monitoring tool and allow some demonstrations, we added a security cockpit that consists of a Java applet that displays a representation of the monitored network (see Figure 2).

Fig. 2: The security cockpit

III. ROSA

A. Overlay networks and resiliency

Many overlay networks have already dealt with the problem of resiliency. The most famous is undoubtedly RON (Resilient Overlay Network) [1]. However, RON lacks scalability: a RON cannot exceed about one hundred nodes.

To improve the scalability of RON, the solution chosen by DG-RON [2] consists in splitting the network into logical zones. However, this solution does not take into account the specificities of the topology of the underlying network.

A solution proposed in [3] consists in exploring the underlying network topology so that the nodes choose their neighbors efficiently. Nevertheless, the proposed algorithm can only be used to construct a static network.

B. ROSA principle

From these observations, we decided to make the nodes of ROSA dynamically reorganize their neighbor sets according to:

• the topology of the network on which ROSA is deployed;

• the failures occurring on the elements of the links between two nodes.

This way, despite the failures occurring on the underlying network layer, there is a high probability that a path exists in the ROSA layer.

In ROSA, nodes are organized in clusters called lumps. A lump is a set of fully connected nodes. ROSA can be represented by an entanglement of lumps. Each node of ROSA belongs to at least one lump and, in order to be scalable, both the size of a lump and the number of lumps to which a node can belong are bounded.

Each lump of ROSA is associated with a metric called density. Let l be a lump; the density of l quantifies its capacity to maintain a path between all the nodes that compose it, despite the presence of virtual link failures. The density is defined as follows:

The density is the minimal number of failures on the elements of the virtual links of the lump that are necessary to isolate a node of the lump.

This way, if the number of failures is less than the density, we can affirm that there exists a path between any two nodes of the lump and that the nodes of the lump are still able to communicate with each other.
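To make the definition concrete, the density of a small lump can be computed by brute force. The sketch below is only an illustration and not the ROSA algorithm: it assumes that each virtual link is described by the set of underlying elements (routers, physical links) it traverses, that a virtual link breaks as soon as one of its elements fails, and it reads "isolate a node" as "some node can no longer reach every other node of the lump", which is the reading under which the guarantee stated above holds. The names `is_connected`, `lump_density` and `link_elements` are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, links):
    """Check whether all nodes can reach each other over the given undirected links."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def lump_density(nodes, link_elements):
    """Smallest number of element failures that disconnects the lump.

    link_elements maps each virtual link (a, b) to the set of underlying
    elements it traverses (assumed representation).
    """
    elements = sorted(set().union(*link_elements.values()))
    for k in range(1, len(elements) + 1):
        for failed in combinations(elements, k):
            failed = set(failed)
            # A virtual link survives only if none of its elements failed.
            alive = [link for link, used in link_elements.items()
                     if not (used & failed)]
            if not is_connected(nodes, alive):
                return k
    return None  # no set of element failures disconnects the lump
```

The search is exponential in the number of elements, which is tolerable here only because the size of a lump is bounded. Note how shared elements lower the density: if two virtual links of the same node traverse the same router, a single failure can cut both.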

ROSA is endowed with two mechanisms. The first one is a failure detection mechanism. Each node of ROSA must periodically send to its neighbors the lump with the lowest density to which it belongs. These messages have a dual role:

• they propagate the knowledge about the lumps with low densities;

• they allow any node to detect that a virtual link is broken.

A node considers that a virtual link is broken if it does not receive such a message from one of its neighbors. The node then reorganizes the ROSA set of lumps according to this failure. This is illustrated in Figure 3.

Fig. 3: The detection of a broken link
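The detection side of this mechanism can be sketched as a simple timeout on the periodic messages. The code below is a hedged illustration, not the ROSA protocol itself: the class `LinkMonitor`, the sending period and the number of missed messages tolerated before a link is declared broken are assumptions, since the article does not give these parameters. Timestamps are passed explicitly so that the behaviour is deterministic.

```python
class LinkMonitor:
    """Track periodic announcements from neighbors and flag the silent ones."""

    def __init__(self, neighbors, period, missed_allowed=3):
        # Assumed tolerance: a link is declared broken after `missed_allowed` periods of silence.
        self.timeout = period * missed_allowed
        self.last_seen = {n: 0.0 for n in neighbors}
        self.lowest_density = {}  # neighbor -> (lump id, density) last announced

    def on_announcement(self, neighbor, lump_id, density, now):
        """Record an announcement; receiving it also proves the virtual link is alive."""
        self.last_seen[neighbor] = now
        self.lowest_density[neighbor] = (lump_id, density)

    def broken_links(self, now):
        """Neighbors whose announcements have been missing for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

A node would call `broken_links` periodically and trigger the reorganization of its lumps for every neighbor returned.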

The second mechanism of ROSA consists in the enhancement of the global density: each node tries to increase the density of a lump by joining it, leaving another lump if necessary. A node does this only if its local density, i.e. the minimal density of the lumps to which the node belongs, is increased. A node joins a lump only if this does not imply that it has to leave a lump with a density less than or equal to the density of the lump to join.

With this mechanism, ROSA reorganizes its topology in a scalable way and enhances its global density, and consequently its failure tolerance.

C. Routing over ROSA

ROSA is an unstructured network: the routing of messages is based on path discovery with reliable flooding or random-walk algorithms. However, to obtain a smarter routing of the messages, it is possible to organize the entanglement of lumps into a DHT. This DHT is called the chain of lumps.

This DHT possesses resiliency properties and uses a small-world phenomenon in order to reduce the number of hops needed to route messages between the nodes. Readers interested in ROSA can refer to our previous works [4][5][6][7].

Fig. 4: The transformation of the entanglement of lumps into a chain of lumps.
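The flooding-based path discovery mentioned at the beginning of this subsection can be illustrated with a centralized simulation. This is a sketch under assumptions, not the reliable flooding used by ROSA: the overlay is given as a hypothetical adjacency dictionary, each node forwards a discovery request only once, and the first path that reaches the target is returned.

```python
from collections import deque


def discover_path(overlay, source, target):
    """Flood a discovery request hop by hop and return the first path found."""
    visited = {source}          # nodes that have already forwarded the request
    queue = deque([[source]])   # each entry is the path travelled so far
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in overlay.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the target cannot be reached in the overlay
```

Because the request spreads breadth-first, the returned path has a minimal number of overlay hops in this simulation; a random-walk variant would trade this property for fewer messages.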

IV. SECURITY ASSURANCE VALUE

The immune system must quickly detect attacks, failures or service bugs, which implies that the SA values of the different entities must be continuously recomputed by the computation agents.

The direct measurement of the SA value of a network entity is desirable, but not always possible. In some cases, the network entity has to be decomposed into network subentities. The SA value of the network entity is then computed from the SA values of the subentities obtained by this decomposition, using aggregation methods. An aggregation method combines the SA values of the subentities, taking into account the relations between these subentities, to obtain the desired SA value. Consequently, the computation of the SA value of an entire network consists of five steps:

• Modelling. It consists in decomposing the network into irreducible entities.

• Metrics assignment. It consists in determining which metrics have to be observed to compute the SA values of each kind of entity.

• Measurement. It consists in measuring the SA values of the irreducible entities.

• Aggregation. It consists in computing the SA values of the non-irreducible entities using the aggregation methods and the already computed SA values.

• Evaluation and Interpretation. It consists in computing the SA value of the entire network and interpreting the resulting values to determine the overall assurance level of the network.
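The measurement and aggregation steps above amount to a bottom-up evaluation of the decomposition tree. The following Python sketch illustrates the idea under assumptions: the `Entity` structure, the use of `min` as default aggregation method and the example values are hypothetical and are not taken from the deployed tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Entity:
    """Hypothetical node of the decomposition tree produced by the modelling step."""
    name: str
    measured_sa: Optional[float] = None                 # set only for irreducible entities
    children: List["Entity"] = field(default_factory=list)
    aggregate: Callable[[List[float]], float] = min     # assumed default aggregation method


def sa_value(entity):
    """Compute the SA value of an entity from its subentities, bottom-up."""
    if not entity.children:
        # Irreducible entity: its SA value comes directly from the measurement step.
        if entity.measured_sa is None:
            raise ValueError(f"no measurement for irreducible entity {entity.name}")
        return entity.measured_sa
    # Reducible entity: aggregate the SA values of its subentities.
    return entity.aggregate([sa_value(child) for child in entity.children])
```

Each reducible entity can carry its own aggregation method, which is how the relations between subentities (average of the workstations, weakest link between groups, and so on) would be expressed in this sketch.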

A. SA value of the network components

In this section we present how the SA value (SAV) of an irreducible entity, the workstation, and of a reducible one, the subnetwork, are computed.

Each workstation computes its own SAV according to the formula:

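$$
\begin{aligned}
\mathrm{SAV}_{\mathrm{local}} = {} & \min(\text{CPU utilization},\ \text{Used disk space},\ \text{Allocated memory}) \\
& \times \min(\text{Number of processes},\ \text{Number of users})
\end{aligned}
$$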

and the formula:

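$$
\begin{aligned}
\mathrm{SAV}_{\mathrm{network}} = {} & \min(\text{Number of input packets},\ \text{Input packet error rate},\ \text{Number of output packets}, \\
& \qquad \text{Output packet error rate},\ \text{Packet collision rate},\ \text{Number of possible routes},\ \text{Number of links to server}) \\
& \times \min(\text{Number of TCP connections},\ \text{Number of UDP connections}, \\
& \qquad \text{Number of discarded datagrams},\ \text{Number of UDP datagrams})
\end{aligned}
$$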
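The two workstation formulas can be sketched in Python as follows. This is an illustration under a strong assumption: the article does not say how raw measurements are turned into comparable values, so the sketch supposes that every metric has already been converted into a score between 0 and 1, where 1 corresponds to nominal behaviour. The dictionary keys and function names are hypothetical.

```python
# Assumption: every metric is already a score in [0, 1], 1 meaning nominal behaviour.
LOCAL_RESOURCE_KEYS = ("cpu_utilization", "used_disk_space", "allocated_memory")
LOCAL_ACTIVITY_KEYS = ("number_of_processes", "number_of_users")
NETWORK_TRAFFIC_KEYS = (
    "number_of_input_packets", "input_packet_error_rate",
    "number_of_output_packets", "output_packet_error_rate",
    "packet_collision_rate", "number_of_possible_routes",
    "number_of_links_to_server",
)
NETWORK_TRANSPORT_KEYS = (
    "number_of_tcp_connections", "number_of_udp_connections",
    "number_of_discarded_datagrams", "number_of_udp_datagrams",
)


def weakest(scores, keys):
    """Return the lowest score among the selected metrics."""
    return min(scores[key] for key in keys)


def sav_local(scores):
    """Product of the weakest resource score and the weakest activity score."""
    return weakest(scores, LOCAL_RESOURCE_KEYS) * weakest(scores, LOCAL_ACTIVITY_KEYS)


def sav_network(scores):
    """Product of the weakest traffic score and the weakest transport score."""
    return weakest(scores, NETWORK_TRAFFIC_KEYS) * weakest(scores, NETWORK_TRANSPORT_KEYS)
```

With scores in this range, taking the minimum inside each group means that a single degraded metric is enough to pull the whole SAV down, and the product lets the two groups compound each other.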

The SAV of a subnetwork is computed according to the formula:

$$
\mathrm{SAV}_{\mathrm{subnet}} = \min(\text{Average workstations},\ \text{Average perimeter servers and router},\ \mathrm{SAV}_{\mathrm{network}})
$$

where Average workstations is the average of the SA values of all the workstations, and Average perimeter servers and router is the average of the SA values of all the servers and routers belonging to the subnetwork. For a subnetwork, SAVnetwork is computed as follows:

$$
\mathrm{SAV}_{\mathrm{network}} = \min(\text{Input packet rate},\ \text{Input packet error rate},\ \text{Output packet rate},\ \text{Output packet error rate},\ \text{Packet collision rate})
$$

In order to compute the SAV of a subnetwork, a workstation (or a server) needs to have the SA values of all the workstations and servers that belong to the subnetwork.
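The subnetwork formula can be sketched in the same spirit. The code is a minimal illustration and relies on assumptions: the metrics are supposed to be scores between 0 and 1 (the normalization is not specified in the article), the SA values of the workstations and perimeter devices are supposed to have been collected beforehand, and the names `sav_subnet` and `SUBNET_NETWORK_KEYS` are hypothetical.

```python
from statistics import mean

# Assumption: every network metric is already a score in [0, 1], 1 meaning nominal behaviour.
SUBNET_NETWORK_KEYS = (
    "input_packet_rate", "input_packet_error_rate",
    "output_packet_rate", "output_packet_error_rate",
    "packet_collision_rate",
)


def sav_subnet(workstation_savs, perimeter_savs, network_scores):
    """Minimum of the two averages and of the subnetwork-level network SAV."""
    sav_net = min(network_scores[key] for key in SUBNET_NETWORK_KEYS)
    return min(mean(workstation_savs), mean(perimeter_savs), sav_net)
```

The function makes the data dependency explicit: the agent that evaluates a subnetwork needs the full list of SA values of its workstations, servers and routers before it can produce a result.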

Readers interested in the computation of the SAV of the other entities may read [8].

V. CONCLUSION

To our knowledge, there is no system for ensuring the defense of a critical infrastructure similar to the immune system proposed in this article.

The existing network monitoring tools and network intrusion detection systems (NIDS) are not suitable for protecting a critical infrastructure, for several reasons. First, these tools do not provide an automated response to the detected threats. Second, these tools are not able to resist attacks or failures, because they are not deployed on a resilient platform. Even DOMINO [9], which is a NIDS deployed over an overlay network, does not take advantage of the potential of overlay networks to ensure its own security.

But the major reason is that the existing tools only take into account the network aspect, neglecting the other important aspects (service failures, strange or malicious behavior, bugs, etc.). That is why we think that only systems based on Security Assurance values can ensure the defense of a critical infrastructure; today, there is no implemented tool able to compute and use SA values for such a purpose.

To conclude, the immune system proposed in this article fills the gaps of the existing and implemented tools by:

• using a distributed and resilient platform that protects the immune system from outer and inner attacks;

• being fully automated and, in this way, able to react very quickly to threats;

• using Security Assurance values in real time for detecting the threats occurring on the critical infrastructure.

Our immune system has also been tested and validated on a real network.

REFERENCES

[1] David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek and Robert Morris, Resilient Overlay Networks, Symposium on Operating Systems Principles, 2001, 131–145

[2] Sameer Qazi and Tim Moors, Scalable resilient overlay networks using destination-guided detouring, Proceedings of the IEEE International Conference on Communications (ICC), 2007, 428–434

[3] Akihiro Nakao, Larry Peterson and Andy Bavier, Scalable routing overlay networks, SIGOPS Oper. Syst. Rev., 40, 2006, 49–61

[4] L. Baud, N. Pham and P. Bellot, Robust Overlay Network with Self-Adaptive, The 2008 IEEE International Conference on Research, Innovation and Vision for the Future (RIVF 2008), Ho Chi Minh City, Viet Nam, 2008

[5] L. Baud, Robust Overlay Network with Self-Adaptive Topology: The Reliable File Storage Layer, The 2009 IEEE - RIVF International Conference on Computing and Communication Technologies, Da Nang, Viet Nam, 2009

[6] L. Baud and P. Bellot, Robust Overlay Network with Self-Adaptive Topology: The chain of lumps structure, The 2009 International Workshop on Peer-To-Peer Networking, St. Petersburg, Russia, 2009

[7] L. Baud and P. Bellot, The ROSA Protocol Adapted to Aeronautical Mobile Ad-Hoc Network, The 8th Innovative Research Workshop & Exhibition (INO2009), Brétigny sur Orge, France, 2009

[8] Nguyen Pham and Michel Riguidel, Security Assurance Aggregation for IT Infrastructures, ICSNC ’07: Proceedings of the Second International Conference on Systems and Networks Communications, 2007, 72

[9] V. Yegneswaran, P. Barford and S. Jha, Global Intrusion Detection in the DOMINO Overlay System, NDSS ’04: Proceedings of Network and Distributed System Security Symposium, 2004
